RL Gridworld

Algorithm:

Discount (γ): 0.9

Learning Rate (α): 0.1

Iterations: 0

Max ΔV: -

Converged: No
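
The panel does not say which algorithm is selected, so the following is only a sketch of how the two parameters above might enter a learning update. It assumes tabular Q-learning, one common choice for a gridworld. The function name `q_update` and the dictionary-based table are hypothetical and not taken from the demo.

```python
GAMMA = 0.9  # discount (γ) shown in the panel
ALPHA = 0.1  # learning rate (α) shown in the panel


def q_update(q, state, action, reward, next_state, actions, terminal=False):
    """Apply one tabular Q-learning step in place and return |change|.

    q maps (state, action) -> estimated value; missing entries count as 0.
    This is an assumed update rule, not necessarily the one the demo uses.
    """
    if terminal:
        best_next = 0.0  # no future value after entering a goal or pit
    else:
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + GAMMA * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)
    return abs(q[(state, action)] - old)
```

With α = 0.1 each step moves the estimate one tenth of the way toward the target, so a first visit to a transition that enters the goal (+10) raises that entry from 0 to 1.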

Reinforcement Learning

🟢 Goal (+10 reward)
🔴 Pit (-10 reward)
⬛ Wall (blocked)

Arrows show the optimal policy.
Colors show the state values.
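
As a rough illustration of how the arrows and values in such a display could be computed, here is a minimal value-iteration sketch. It uses the discount 0.9 and the +10 / -10 rewards from the legend. Everything else is an assumption: the 3x4 layout, deterministic moves, the rule that bumping into a wall or the border leaves the agent in place, the threshold `THETA`, and all function names.

```python
GAMMA = 0.9    # discount (γ) shown in the panel
THETA = 1e-6   # assumed convergence threshold on max ΔV

# Hypothetical layout: G = goal, P = pit, # = wall, . = free cell
GRID = ["...G",
        ".#.P",
        "...."]
REWARD = {"G": 10.0, "P": -10.0}  # reward received on entering the cell
MOVES = {"↑": (-1, 0), "↓": (1, 0), "←": (0, -1), "→": (0, 1)}


def step(r, c, dr, dc):
    """Deterministic move; hitting a wall or the border keeps the agent in place."""
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
    if not inside or GRID[nr][nc] == "#":
        nr, nc = r, c
    return nr, nc, REWARD.get(GRID[nr][nc], 0.0)


def backup(V, r, c):
    """Return (best value, best arrow) for one free cell under V."""
    best = None
    for arrow, (dr, dc) in MOVES.items():
        nr, nc, reward = step(r, c, dr, dc)
        value = reward + GAMMA * V[nr][nc]
        if best is None or value > best[0]:
            best = (value, arrow)
    return best


def value_iteration():
    """Sweep all free cells until the largest change (max ΔV) drops below THETA."""
    V = [[0.0] * len(GRID[0]) for _ in GRID]  # terminal and wall cells stay 0
    iterations = 0
    while True:
        new_V = [row[:] for row in V]
        max_delta = 0.0
        for r, row in enumerate(GRID):
            for c, cell in enumerate(row):
                if cell != ".":
                    continue
                new_V[r][c], _ = backup(V, r, c)
                max_delta = max(max_delta, abs(new_V[r][c] - V[r][c]))
        V = new_V
        iterations += 1
        if max_delta < THETA:
            break
    # Greedy arrows for free cells; other cells keep their layout symbol.
    policy = []
    for r, row in enumerate(GRID):
        policy.append([backup(V, r, c)[1] if cell == "." else cell
                       for c, cell in enumerate(row)])
    return V, policy, iterations


if __name__ == "__main__":
    V, policy, iterations = value_iteration()
    for row in policy:
        print(" ".join(row))
    print("iterations:", iterations)
```

In this sketch the `iterations` counter and `max_delta` play the roles of the "Iterations" and "Max ΔV" readouts, and the loop exit corresponds to "Converged: Yes". The real demo may use stochastic transitions or a different stopping rule.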