Uses in AlphaStar

Given a learning agent A, sample a frozen opponent B from the set of candidates C with probability

$$\frac{f(P[A \ \text{beats} \ B])}{\sum_{D \in C} f(P[A \ \text{beats} \ D])}$$

where $f: [0,1] \rightarrow [0,\infty)$ is some weighting function.
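A minimal sketch of this sampling rule in Python (the function name, NumPy interface, and the idea of passing pre-estimated win rates are my own assumptions, not AlphaStar's actual implementation):

```python
import numpy as np

def pfsp_sample(win_probs, weighting, rng=None):
    """Sample the index of a frozen opponent under PFSP.

    win_probs[i] is the estimated probability that the learning agent A
    beats candidate opponent i; `weighting` is the function f above.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.array([weighting(p) for p in win_probs], dtype=float)
    total = weights.sum()
    if total == 0.0:
        # Degenerate case (e.g. A beats every candidate with probability 1
        # under f_hard): fall back to a uniform draw over candidates.
        probs = np.full(len(win_probs), 1.0 / len(win_probs))
    else:
        probs = weights / total
    return rng.choice(len(win_probs), p=probs)
```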

Choosing $f_{\text{hard}}(x) = (1-x)^p$ makes PFSP focus on the hardest players, where $p \in \mathbb{R}$ scales how sharply the distribution concentrates on them.

Choosing $f_{\text{var}}(x) = x(1-x)$ means the agent prefers opponents around its own level, which works best for agents that are still much weaker than the strongest agents.
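The two weighting functions written out explicitly (the default p = 2 below is just an illustrative choice, not a value from the paper):

```python
def f_hard(x, p=2.0):
    # Focus on the hardest opponents: weight grows as the win rate x shrinks.
    # Larger p concentrates more mass on the opponents A rarely beats.
    return (1.0 - x) ** p

def f_var(x):
    # Prefer opponents of similar strength: weight peaks at x = 0.5 and
    # vanishes for opponents A always beats or always loses to.
    return x * (1.0 - x)

# Example: estimated win rates [0.9, 0.5, 0.1] against three frozen opponents.
# f_hard (p=2) -> weights [0.01, 0.25, 0.81]: mass goes to the opponent
#                 A almost never beats.
# f_var        -> weights [0.09, 0.25, 0.09]: mass goes to the opponent
#                 of roughly equal strength.
```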

Criticism

In the Transitive Cyclic Decomposition paper, this method is criticized because it contracts the space of agents rather than expanding it. More precisely, it lowers the Effective diversity of the strategies.

In particular, in the disc game, this method contracts the space of solutions instead of growing it (this method is version C in the diagram below).

(Figure: disc game response diagram.)

They instead propose Response to Rectified Nash to grow the landscape.