[Interactive figure. Legend: Uniform; Nash Equilibrium; SHOR-PSRO (AlphaEvolve); Policy (size = weight); Best Response; Meta-Strategy Centroid.]
[Panel: Annealing Schedule. Lambda decay over PSRO iterations. Current Mix Ratio: ORM, Softmax.]
[Panel: RPSLS Payoff Matrix. Options: Uniform, Nash, PRD, SHOR-PSRO.]