Explore how a "Mini-GPT" model attends to different parts of the input sequence.
Select a head to see its specific attention pattern. Different heads often learn different syntactic or semantic relationships.
Hover over a token to see which previous tokens it attends to. Brighter lines indicate stronger attention weights.
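The pattern a head displays is a row-stochastic matrix from masked scaled dot-product attention: row i gives token i's weights over tokens 0..i, which is why only previous tokens light up. A minimal NumPy sketch of that computation (shapes and names are illustrative, not the demo's actual code):

```python
import numpy as np

def causal_attention_weights(q, k):
    """Attention weights for one head with a causal mask.

    q, k: (seq_len, d_head) query/key matrices (illustrative shapes).
    Returns a (seq_len, seq_len) matrix whose row i is token i's
    attention distribution over tokens 0..i.
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)
    # Causal mask: token i may only attend to tokens j <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax; these weights set the line brightness.
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
w = causal_attention_weights(q, k)
```

Each row sums to 1, and all entries above the diagonal are exactly zero, so the first token can only attend to itself.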