# Attention mechanism

**Source:** https://promtable.com/glossary/attention-mechanism

> The attention mechanism is the transformer building block that lets each token in an input weight the importance of every other token when computing its representation — the core technique that made modern LLMs possible.

---

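The transformer form of attention was introduced in "Attention Is All You Need" (Vaswani et al., 2017). For each position, attention compares that token's query vector against the key vectors of the other tokens using a scaled dot product, normalizes the scores with a softmax, and uses the resulting weights to take a weighted sum of the tokens' value vectors. Multi-head attention runs several such computations in parallel, each with its own learned projections, and combines the results. Variants relevant in 2026: grouped-query attention (groups of query heads share key/value heads, which reduces KV-cache memory), multi-head latent attention (DeepSeek; compresses keys and values into a smaller latent representation), sliding-window attention (Mistral; each token attends only to a fixed window of recent tokens), and Mixture-of-Depths (Google DeepMind), which routes only a subset of tokens through a block's attention and MLP computation while the rest skip it. Nearly all modern LLMs are built on attention; understanding it is the prerequisite for any serious work on efficiency, long context, or interpretability.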
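
The weighted sum described above can be made concrete with a small sketch. The following is a minimal, single-head illustration in plain Python, not a production implementation: the names `softmax`, `attention`, and `window` are hypothetical, the toy input reuses the same vectors as queries, keys, and values (a real transformer would first apply learned projections), and the optional `window` argument is only a rough stand-in for the idea behind sliding-window attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, window=None):
    """Single-head scaled dot-product attention over lists of vectors.

    window=None: every position attends to every position.
    window=w:    position i attends only to positions i-w+1 .. i
                 (an assumed, simplified causal sliding window).
    Returns (outputs, weights), one row per query position.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            visible = window is None or (i - window < j <= i)
            if visible:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))  # scale by sqrt(d_k)
            else:
                scores.append(float("-inf"))  # masked out: weight becomes 0
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Toy input: 3 tokens with 2-dimensional embeddings (illustrative values).
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, weights = attention(x, x, x)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much that token draws from every other token. Passing `window=2` zeroes the weights outside the two most recent positions, which is the basic reason windowed variants need less memory for long inputs.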

## Common mistakes

- Treating attention as a black box. Work on long context, fast inference, and interpretability all depends on understanding how it computes its weights.

## Related terms

- [context-window](https://promtable.com/glossary/context-window)
- [kv-cache](https://promtable.com/glossary/kv-cache)
- [mixture-of-experts](https://promtable.com/glossary/mixture-of-experts)

## Sources

- [Attention Is All You Need (arXiv)](https://arxiv.org/abs/1706.03762)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/attention-mechanism
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/attention-mechanism".
Contact: info@vibecodingturkey.com.