DeepMind’s StarCraft 2 AI builds its strategy statistics with a hand-crafted compilation method; no learning is involved.

example

From each replay, we extract a statistic z that encodes each player’s build order, defined as the first 20 constructed buildings and units, and cumulative statistics, defined as the units, buildings, effects, and upgrades that were present during a game.
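The extraction described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(kind, name)` event format, the `StrategyStatistic` class, and the `extract_statistic` function are hypothetical names made up here, not DeepMind’s actual replay parser or data layout.

```python
from dataclasses import dataclass

# "first 20 constructed buildings and units", per the quoted text
BUILD_ORDER_LENGTH = 20


@dataclass(frozen=True)
class StrategyStatistic:
    """Hypothetical container for the statistic z of one player."""
    build_order: tuple     # first 20 constructed buildings/units, in order
    cumulative: frozenset  # everything that was present at some point


def extract_statistic(events):
    """Compile z from one player's replay events, with no learning involved.

    `events` is an assumed format: an iterable of (kind, name) pairs in game
    order, where kind is "unit", "building", "effect" or "upgrade".
    """
    build_order = []
    cumulative = set()
    for kind, name in events:
        # Cumulative statistics: record presence, ignoring order and counts.
        cumulative.add(name)
        # Build order: only units and buildings, only the first 20.
        if kind in ("unit", "building") and len(build_order) < BUILD_ORDER_LENGTH:
            build_order.append(name)
    return StrategyStatistic(tuple(build_order), frozenset(cumulative))
```

For instance, calling `extract_statistic` on a replay that opens with a building, a worker, and then thirty copies of the same unit yields a build order of exactly 20 entries, while the cumulative set lists each distinct name once.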

During supervised learning

During reinforcement learning

When a new agent is created, it is supervised against that statistic throughout its run, i.e. the agent is penalized for not following the statistic.

One example is build order. If the agent ends up choosing a build order that differs from the strategy statistic’s build order, it is penalized with a hand-crafted loss.
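A minimal sketch of such a build-order penalty follows. It assumes, for illustration, that the hand-crafted loss is the Levenshtein edit distance between the agent’s build order and the statistic’s build order; the text above does not specify the exact form, and the function names `edit_distance` and `build_order_penalty` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution_cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,                      # delete x from a
                curr[j - 1] + 1,                  # insert y into a
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def build_order_penalty(agent_build_order, target_build_order):
    """Assumed penalty: zero when the orders match, more negative as they diverge."""
    return float(-edit_distance(agent_build_order, target_build_order))
```

With this choice, an agent that follows the target build order exactly receives a penalty of 0, and one that swaps two adjacent items receives -2, since the swap counts as two substitutions. Edit distance is used here because it tolerates small shifts in ordering better than a position-by-position comparison would.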