🎯 Reinforcement Learning

An agent learns a good policy through trial and error. It uses Q-learning to navigate a grid world, maximizing cumulative reward while avoiding penalties.

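Q-Learning:
• Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)]
s, a: Current state and action; r: reward received; s': next state
α: Learning rate (how far each update moves Q(s,a) toward the new estimate)
γ: Discount factor for future rewards
ε: Probability of taking a random action (exploration) instead of the best-known one (exploitation)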
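
The update rule and the three parameters above can be sketched as tabular Q-learning in Python. This is a minimal illustration, not the simulation's actual code: the 4x4 grid, the goal and pit positions, the reward values, and the names step, choose_action, q_update, and train are all assumptions made for the example. It also assumes that a terminal step uses the reward alone as its target, with no future term.

```python
import random

# Hypothetical action set: up, down, left, right as (row, col) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size=4, goal=(3, 3), pit=(1, 1)):
    """Assumed grid world: returns (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    # Moves that would leave the grid keep the agent at the edge.
    row = min(max(state[0] + dr, 0), size - 1)
    col = min(max(state[1] + dc, 0), size - 1)
    nxt = (row, col)
    if nxt == goal:
        return nxt, 1.0, True    # reward for reaching the goal
    if nxt == pit:
        return nxt, -1.0, True   # penalty for falling into the pit
    return nxt, -0.01, False     # small cost per step


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    return values.index(max(values))


def q_update(q, s, a, r, s_next, done, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    # Assumption: no future reward is added after a terminal step.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1,
          size=4, max_steps=100, seed=0):
    """Run Q-learning from the top-left corner and return the Q-table."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(size) for c in range(size)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action, size)
            q_update(q, state, action, reward, nxt, done, alpha, gamma)
            state = nxt
            if done:
                break
    return q
```

Calling train() returns a dictionary that maps each grid cell to four action values. Raising epsilon makes the agent explore more, and lowering gamma makes it weigh immediate rewards more heavily than distant ones.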