concept

Token

A token is the smallest unit a language model reads or writes — typically a sub-word fragment, with one English word averaging about 1.3 tokens.

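Language models do not see words; they see tokens produced by a tokenizer, usually built on an algorithm such as byte-pair encoding (BPE) and shipped in libraries such as SentencePiece or tiktoken. For OpenAI's cl100k_base encoding, 1,000 tokens ≈ 750 English words. Tokens are billed asymmetrically: many APIs charge less for input tokens than for output tokens. Languages other than English usually need more tokens for the same text. This applies to non-Latin scripts (Chinese, Arabic, Hindi) and also to Latin-script languages with rich morphology such as Turkish, which can run 1.7–2× the tokens per word of English. That multiplier is why localization budgets blow up. Use a tokenizer tool (e.g. our [token counter](https://promtable.com/tools/token-counter)) to measure long prompts before committing to them.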
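The rules of thumb above can be turned into a rough budget estimate. The following is a minimal sketch, not a substitute for a real tokenizer: the names `estimate_tokens`, `estimate_cost`, and `Pricing` are hypothetical, the prices in the usage lines are made-up placeholders, and the Turkish multiplier is simply the midpoint of the 1.7–2× range quoted above.

```python
from dataclasses import dataclass

# Rule of thumb from the text: 1,000 tokens ≈ 750 English words.
TOKENS_PER_ENGLISH_WORD = 1000 / 750

# Multipliers relative to English. Only the Turkish range (1.7-2x) comes
# from the text; the midpoint 1.85 is an assumption made for this sketch.
LANGUAGE_MULTIPLIER = {"en": 1.0, "tr": 1.85}


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not a real price list)."""
    input_per_million: float
    output_per_million: float


def estimate_tokens(word_count: int, language: str = "en") -> int:
    """Rough token estimate from a word count; a real tokenizer is authoritative."""
    multiplier = LANGUAGE_MULTIPLIER[language]
    return round(word_count * TOKENS_PER_ENGLISH_WORD * multiplier)


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Price input and output tokens separately (asymmetric billing)."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Usage: the same 750-word prompt budgeted in English and in Turkish.
placeholder_prices = Pricing(input_per_million=1.0, output_per_million=4.0)
english_tokens = estimate_tokens(750, "en")
turkish_tokens = estimate_tokens(750, "tr")
english_cost = estimate_cost(english_tokens, 500, placeholder_prices)
turkish_cost = estimate_cost(turkish_tokens, 500, placeholder_prices)
```

Treat the result as a planning figure only. Actual counts depend on the specific tokenizer and text, so measure real prompts with a tokenizer tool before fixing a budget.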

Common mistakes

Estimating prompt cost in words rather than tokens. Ignoring that Turkish, Arabic, and Hindi text can roughly double your token bill compared with English.

FAQ

What is a token?

A token is the smallest unit a language model reads or writes — typically a sub-word fragment, with one English word averaging about 1.3 tokens.

What are the most common mistakes with tokens?

Estimating prompt cost in words rather than tokens. Ignoring that Turkish, Arabic, and Hindi text can roughly double your token bill compared with English.

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/token.md.