Explore how a "Mini-GPT" model attends to different parts of the input sequence.
Select a head to see its specific attention pattern. Different heads often learn different syntactic or semantic relationships.
Hover over a token to see which previous tokens it attends to. Brighter lines indicate stronger attention weights.
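The pattern a head displays is a row-stochastic matrix from masked scaled dot-product attention: row i gives token i's weights over tokens 0..i, which is why only previous tokens light up. A minimal NumPy sketch of that computation (shapes and names are illustrative, not the demo's actual code):

```python
import numpy as np

def causal_attention_weights(q, k):
    """Attention weights for one head with a causal mask.

    q, k: (seq_len, d_head) query/key matrices (illustrative shapes).
    Returns a (seq_len, seq_len) matrix whose row i is token i's
    attention distribution over tokens 0..i.
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)
    # Causal mask: token i may only attend to tokens j <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax; these weights set the line brightness.
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))
k = rng.standard_normal((5, 8))
w = causal_attention_weights(q, k)
```

Each row sums to 1, and all entries above the diagonal are exactly zero, so the first token can only attend to itself.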