RL Gridworld
Algorithm: Value Iteration, Policy Iteration, Q-Learning
Discount (γ): 0.9
Learning Rate (α): 0.1
Reset
Iteration Step
Run to Convergence
Run Episode (Q)
Iterations: 0
Max ΔV: -
Converged: No
Reinforcement Learning
🟢 Goal (+10 reward)
🔴 Pit (-10 reward)
⬛ Wall (blocked)
Arrows show the greedy (optimal once converged) policy.
Colors show state values.
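
For readers who want to see how the arrows and colors could be computed, here is a minimal value iteration sketch in Python. It is an illustration under stated assumptions, not the page's actual implementation: the 3x4 layout in GRID is hypothetical, moves are assumed deterministic, bumping into a wall or edge leaves the agent in place, the +10 / -10 reward is assumed to be paid on entering the goal or pit, and those two cells are treated as terminal. The names step, q_value, value_iteration and greedy_policy are made up for this example. The discount matches the default γ = 0.9 above, and delta plays the role of the "Max ΔV" readout.

```python
import numpy as np

# Hypothetical layout (assumption): 'G' goal, 'P' pit, '#' wall, '.' free cell.
GRID = ["...G",
        ".#.P",
        "...."]
GAMMA = 0.9                              # discount, the page's default
REWARD = {"G": 10.0, "P": -10.0}         # reward for entering a terminal cell
ACTIONS = {"↑": (-1, 0), "↓": (1, 0), "←": (0, -1), "→": (0, 1)}


def step(r, c, dr, dc):
    """Deterministic move; hitting a wall or the grid edge keeps the agent in place."""
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
        return nr, nc
    return r, c


def q_value(V, r, c, move):
    """One-step lookahead: immediate reward plus discounted value of the next cell."""
    nr, nc = step(r, c, *move)
    return REWARD.get(GRID[nr][nc], 0.0) + GAMMA * V[nr, nc]


def value_iteration(tol=1e-6, max_iters=1000):
    """Sweep all free cells until the largest value change (Max ΔV) drops below tol."""
    V = np.zeros((len(GRID), len(GRID[0])))
    for iteration in range(1, max_iters + 1):
        delta = 0.0
        for r in range(len(GRID)):
            for c in range(len(GRID[0])):
                if GRID[r][c] != ".":
                    continue             # walls and terminal cells keep value 0
                best = max(q_value(V, r, c, m) for m in ACTIONS.values())
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best           # in-place update
        if delta < tol:
            break
    return V, iteration


def greedy_policy(V):
    """Pick, for every free cell, the arrow with the highest one-step lookahead value."""
    return {
        (r, c): max(ACTIONS, key=lambda a: q_value(V, r, c, ACTIONS[a]))
        for r in range(len(GRID))
        for c in range(len(GRID[0]))
        if GRID[r][c] == "."
    }


V, iterations = value_iteration()
policy = greedy_policy(V)
print(np.round(V, 2))
print(policy)
```

In this sketch the array V corresponds to the cell colors and the policy dictionary to the arrows. Policy Iteration and Q-Learning would reach the same kind of output by different update rules, with Q-Learning additionally using the learning rate α.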