
Citation extraction

Citation extraction is the technique of attaching source URLs to LLM-generated claims. It is critical for AI search products, because it makes them trustworthy and gives users a way to verify generated content.

Generative AI without citations is a black box: users can't verify claims, can't follow up on sources, and can't trust the output. Citation extraction makes the model emit source URLs alongside generated text, either inline (`The capital is Paris [^1]`) or as a structured `citations` field. There are three common implementations: (1) constrained decoding that forces citation tokens, (2) post-hoc matching of generated claims to retrieved chunks (BM25, vector similarity, or NLI), and (3) tool-use protocols in which the search tool returns sources the model must reference. Production leaders such as Perplexity, ChatGPT Search, Claude with web search, and Google's Gemini-powered AI Overviews all attach citations. Quality matters: a citation that doesn't support the claim is worse than no citation, because it signals a verification that never happened. [[grounding]] is the broader practice this fits into.
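As a rough illustration of approach (2), the sketch below matches each generated sentence to the retrieved chunk it overlaps with most and emits both inline `[^n]` markers and a structured `citations` field. The names `Chunk`, `overlap`, and `attach_citations`, the `example.com` URLs, and the `min_score` threshold are all assumptions made for this example. Plain token overlap stands in for BM25, vector similarity, or NLI, so treat it as a toy scorer rather than a production one.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage and the URL it came from (hypothetical shape)."""
    url: str
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for BM25 or embedding features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(claim: str, chunk_text: str) -> float:
    """Fraction of the claim's tokens that also appear in the chunk."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(chunk_text)) / len(claim_tokens)


def attach_citations(answer: str, chunks: list[Chunk], min_score: float = 0.6) -> dict:
    """Cite the best-matching chunk per sentence, or leave the sentence uncited."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    urls: list[str] = []   # position in this list + 1 is the footnote number
    cited: list[str] = []
    for sentence in sentences:
        best = max(chunks, key=lambda c: overlap(sentence, c.text), default=None)
        if best is None or overlap(sentence, best.text) < min_score:
            cited.append(sentence)  # no chunk supports it well enough
            continue
        if best.url not in urls:
            urls.append(best.url)
        marker = urls.index(best.url) + 1
        # Place the marker before the closing punctuation, if there is any.
        if sentence[-1] in ".!?":
            cited.append(f"{sentence[:-1]} [^{marker}]{sentence[-1]}")
        else:
            cited.append(f"{sentence} [^{marker}]")
    citations = [{"id": i + 1, "url": url} for i, url in enumerate(urls)]
    return {"text": " ".join(cited), "citations": citations}


chunks = [
    Chunk("https://example.com/france", "Paris is the capital of France and its largest city."),
    Chunk("https://example.com/rivers", "The Seine is a river that flows through Paris."),
]
result = attach_citations("The capital of France is Paris. It has great weather.", chunks)
print(result["text"])
print(result["citations"])
```

The second sentence stays uncited because no chunk clears the threshold, which reflects the point above that a weak citation is worse than none.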

When to use citation extraction

Common mistakes

FAQ

What is citation extraction?

Citation extraction is the technique of attaching source URLs to LLM-generated claims. It is critical for AI search products, because it makes them trustworthy and gives users a way to verify generated content.

When should I use citation extraction?

Use it in AI search products, and in any product where the user needs to verify claims.

What are the most common mistakes with citation extraction?

The first is attaching the search tool's first result regardless of relevance, which produces citations that don't support the claims. The second is skipping citations in B2C 'just give the answer' products, where trust collapses on the first hallucination.
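One way to catch the first mistake is to audit citations after generation. The sketch below walks each sentence, reads its inline `[^n]` markers, and flags markers that point at a missing source or at a source that does not appear to support the sentence. The function names, the `sources` mapping, and the `min_score` threshold are hypothetical. The token-overlap score is only a placeholder for a real entailment (NLI) check, and the code assumes markers sit inside the sentence they support, before its final punctuation.

```python
import re

MARKER = re.compile(r"\s*\[\^(\d+)\]")  # inline footnote marker such as " [^1]"


def words(text: str) -> set[str]:
    """Lowercased word set used by the toy support score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim: str, source_text: str) -> float:
    """Share of claim words found in the source (assumption: swap in NLI)."""
    claim_words = words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & words(source_text)) / len(claim_words)


def audit_citations(text: str, sources: dict[int, str], min_score: float = 0.6) -> list[dict]:
    """Return one finding per inline marker: ok, unsupported, or missing_source."""
    findings = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        claim = MARKER.sub("", sentence).strip()  # the sentence without its markers
        for source_id in map(int, MARKER.findall(sentence)):
            source_text = sources.get(source_id)
            if source_text is None:
                status = "missing_source"
            elif support_score(claim, source_text) < min_score:
                status = "unsupported"
            else:
                status = "ok"
            findings.append({"claim": claim, "source_id": source_id, "status": status})
    return findings


sources = {1: "Paris is the capital of France and its largest city."}
answer = (
    "Paris is the capital of France [^1]. "
    "The Eiffel Tower opened in 1889 [^1]. "
    "Berlin is in Germany [^3]."
)
for finding in audit_citations(answer, sources):
    print(finding["status"], "-", finding["claim"])
```

A pipeline could drop or re-retrieve any citation whose status is not `ok` instead of shipping it to the user.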

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/citation-extraction.md.