Vector RAG
Vector RAG is the classic retrieval-augmented generation pattern: embed documents, store them in a vector DB, retrieve by query-embedding similarity, and inject the top-K results into the prompt. It contrasts with graph RAG, in-context RAG, and hybrid RAG.
Vector RAG is the default RAG pattern in production as of 2026. The pipeline: chunk the documents, embed each chunk with a strong model (text-embedding-3-large, Voyage, Cohere embed-v3, BGE-M3), and store the vectors in a vector DB (Pinecone, Weaviate, Qdrant, pgvector, Chroma). At runtime, embed the user query, retrieve the top-K chunks by cosine similarity, re-rank them with a cross-encoder, and inject the survivors into the prompt with explicit document IDs. In practice the retrieval step is usually hybrid (vector + BM25), because pure vector search misses exact-match queries such as identifiers or error codes. Variants: graph RAG (for example Microsoft GraphRAG), contextual retrieval (adding chunk-specific context to each chunk before indexing), in-context RAG (pasting the corpus into a long-context model), and speculative RAG.
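The pipeline above can be sketched end to end in a few lines. This is a minimal illustration, not a production implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector DB, the cross-encoder re-ranking step is omitted, and every function name and sample document here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk and embed every document; the list plays the role of the vector DB."""
    index = []
    for doc_id, text in docs.items():
        for n, piece in enumerate(chunk(text)):
            index.append((f"{doc_id}#{n}", piece, embed(piece)))
    return index


def retrieve(index, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Embed the query and return the top-k chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(chunk_id, piece) for chunk_id, piece, _ in ranked[:k]]


def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Inject retrieved chunks into the prompt with explicit document IDs."""
    context = "\n".join(f"[{chunk_id}] {piece}" for chunk_id, piece in hits)
    return (
        "Answer using only the documents below and cite their IDs.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


docs = {
    "billing": "Invoices are issued on the first day of each month. Refunds take five business days.",
    "security": "API keys rotate every ninety days. Keys can be revoked from the dashboard.",
}
index = build_index(docs)
hits = retrieve(index, "How long do refunds take?", k=1)
prompt = build_prompt("How long do refunds take?", hits)
print(prompt)
```

In a real system, `embed` would call an embedding model and `retrieve` would query the vector DB, with a cross-encoder re-scoring the top-K before `build_prompt` runs.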
When to use vector RAG
- RAG over corpora past ~50K tokens.
- Multi-tenant systems where corpora differ per user.
- Production retrieval where update frequency is high.
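The multi-tenant and high-update cases share one property: each chunk is an independent record that can be scoped to a tenant and replaced or deleted without rebuilding anything else. The sketch below illustrates that idea with a hypothetical in-memory `TenantStore`. It is not any vector DB's API, and it assumes vectors are already unit-normalized so that a dot product equals cosine similarity.

```python
from collections import defaultdict


class TenantStore:
    """Hypothetical in-memory store: one namespace of chunk vectors per tenant."""

    def __init__(self):
        # tenant_id -> {chunk_id: vector}
        self._spaces = defaultdict(dict)

    def upsert(self, tenant_id, chunk_id, vector):
        """Insert or overwrite a single chunk; no other chunk is touched."""
        self._spaces[tenant_id][chunk_id] = vector

    def delete(self, tenant_id, chunk_id):
        """Remove a single chunk if it exists."""
        self._spaces[tenant_id].pop(chunk_id, None)

    def search(self, tenant_id, query_vector, k=3):
        """Rank only the requesting tenant's chunks by dot product."""
        space = self._spaces.get(tenant_id, {})

        def score(chunk_id):
            return sum(x * y for x, y in zip(space[chunk_id], query_vector))

        return sorted(space, key=score, reverse=True)[:k]


store = TenantStore()
store.upsert("acme", "faq#0", [1.0, 0.0])
store.upsert("acme", "faq#1", [0.0, 1.0])
store.upsert("globex", "handbook#0", [1.0, 0.0])

acme_hits = store.search("acme", [1.0, 0.0], k=1)

# A document changes: drop the stale chunk without re-indexing the rest.
store.delete("acme", "faq#0")
after_delete = store.search("acme", [1.0, 0.0], k=1)
```

Production vector DBs typically expose the same idea through namespaces, collections, or metadata filters; the exact mechanism varies by product.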
Common mistakes
- Pure vector without BM25 — misses exact-match queries.
- Skipping a cross-encoder reranker — first-stage retrieval is noisy.
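One way to address the first mistake is to run vector and BM25 retrieval separately and merge the two ranked lists. The sketch below uses reciprocal rank fusion (RRF), one common fusion method chosen here as an assumption; the doc IDs and the two input rankings are made up, and `k=60` is the conventional RRF constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical first-stage results for the same query.
vector_hits = ["doc_a", "doc_b", "doc_c"]  # from embedding similarity
bm25_hits = ["doc_d", "doc_a", "doc_b"]    # from keyword (BM25) search

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

The fused list is still a first-stage result; a cross-encoder reranker would then re-score its top entries against the query before anything reaches the prompt.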
FAQ
What is vector RAG?
Vector RAG is the classic retrieval-augmented generation pattern: embed documents, store them in a vector DB, retrieve by query-embedding similarity, and inject the top-K results into the prompt. It contrasts with graph RAG, in-context RAG, and hybrid RAG.
When should I use vector RAG?
Use it for RAG over corpora past ~50K tokens, for multi-tenant systems where corpora differ per user, and for production retrieval where update frequency is high.
What are the most common mistakes with vector RAG?
The two most common are running pure vector search without BM25, which misses exact-match queries, and skipping a cross-encoder reranker, even though first-stage retrieval is noisy.
Related terms
- Retrieval-augmented generation (RAG) — Retrieval-augmented generation (RAG) injects relevant documents into the prompt at query time so the model answers from your data instead of its training memory.
- Embeddings — Embeddings are dense numeric vectors that represent the meaning of text, images, or other data, allowing similarity to be measured as vector distance.
- Vector database — A vector database stores embeddings and performs approximate nearest-neighbor search at scale, the persistence layer behind RAG and semantic search.
- Hybrid search (retrieval) — Hybrid search combines dense vector retrieval with sparse keyword (BM25) retrieval, then fuses the two ranked lists — the production retrieval default for RAG in 2026.
- Graph RAG — Graph RAG builds a knowledge graph from the corpus during ingestion — entities, relationships, facts — and retrieves via graph traversal alongside vector search, improving recall on relational queries.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/vector-rag.md.