# Token

**Source:** https://promtable.com/glossary/token

> A token is the smallest unit a language model reads or writes — typically a sub-word fragment, with one English word averaging about 1.3 tokens.

---

Language models do not see words; they see tokens produced by a tokenizer, usually one based on byte-pair encoding (BPE) and implemented by a library such as SentencePiece or tiktoken. For OpenAI's cl100k_base encoding, 1,000 tokens ≈ 750 English words. Tokens are billed asymmetrically: many APIs charge less for input tokens than for output tokens. Languages other than English (Chinese, Arabic, Turkish, Hindi) often need more tokens for the same content: Turkish text can run to 1.7–2× the tokens per word of English, which is why localization budgets blow up. Count tokens with a tokenizer tool (e.g. our [token counter](https://promtable.com/tools/token-counter)) before committing to long prompts.
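The arithmetic above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a tokenizer: it applies the 1.3 tokens-per-word rule of thumb to a whitespace word count, and the function names and prices are hypothetical, chosen only to illustrate asymmetric billing. For real budgets, count with the tokenizer your model actually uses.

```python
# Rule of thumb from the text: roughly 1.3 tokens per English word
# (equivalently, 1,000 tokens is about 750 words). Not a real tokenizer.
TOKENS_PER_ENGLISH_WORD = 1.3


def estimate_tokens(text: str, tokens_per_word: float = TOKENS_PER_ENGLISH_WORD) -> int:
    """Estimate a token count from a whitespace-delimited word count."""
    word_count = len(text.split())
    return round(word_count * tokens_per_word)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Price a request when input and output tokens are billed at different rates."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


if __name__ == "__main__":
    prompt = "Summarize the attached report in three short bullet points"
    prompt_tokens = estimate_tokens(prompt)
    # Hypothetical prices (USD per 1,000 tokens), chosen only for illustration.
    cost = estimate_cost(prompt_tokens, 200,
                         input_price_per_1k=0.50, output_price_per_1k=1.50)
    print(prompt_tokens, round(cost, 4))
```

Keeping the input and output prices as separate parameters makes the asymmetry explicit: a short prompt that triggers a long completion can cost more than a long prompt with a brief answer.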

## Common mistakes

- Comparing prompt cost in words rather than tokens.
- Ignoring that Turkish, Arabic, and Hindi text can cost up to roughly twice as many tokens as the equivalent English.
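To make the second mistake concrete, here is a minimal sketch that scales an English token budget for a Turkish localization. Only the 1.7–2× range comes from the text; treating it as a flat multiplier on an English token count is a simplifying assumption, and the names are hypothetical. Measured counts from a real tokenizer should replace this estimate whenever they are available.

```python
# The 1.7-2x range for Turkish comes from the text; applying it as a flat
# multiplier on an English token count is a simplifying assumption.
TURKISH_MULTIPLIER_RANGE = (1.7, 2.0)


def localized_token_range(
    english_tokens: int,
    multiplier_range: tuple[float, float] = TURKISH_MULTIPLIER_RANGE,
) -> tuple[int, int]:
    """Return a (low, high) token estimate for the localized text."""
    low, high = multiplier_range
    return round(english_tokens * low), round(english_tokens * high)


if __name__ == "__main__":
    # Budget check for a prompt measured at 1,000 tokens in English.
    low, high = localized_token_range(1000)
    print(f"Plan for {low} to {high} tokens after localization")
```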

## Related terms

- [context-window](https://promtable.com/glossary/context-window)
- [tokenizer](https://promtable.com/glossary/tokenizer)

*Last updated: 2026-06-01*

---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/token".
Contact: info@vibecodingturkey.com.