technique

Vector RAG

Vector RAG is the classic retrieval-augmented generation pattern: embed documents, store them in a vector DB, retrieve the chunks whose embeddings are most similar to the query embedding, and inject the top-K into the prompt. It is usually contrasted with graph RAG, in-context RAG, and hybrid RAG.

Vector RAG is the default RAG pattern in 2026 production. The pipeline: chunk documents, embed each chunk with a strong embedding model (text-embedding-3-large, Voyage, Cohere embed-v3, BGE-M3), and store the vectors in a vector DB (Pinecone, Weaviate, Qdrant, pgvector, Chroma). At query time, embed the user query, retrieve the top-K chunks by cosine similarity, re-rank them with a cross-encoder, and inject the survivors into the prompt with explicit document IDs so the model can cite them.

Hybrid retrieval (vector + BM25) is the production default in 2026 because pure vector search misses exact-match queries such as error codes, SKUs, and rare names, which lexical BM25 scoring catches.

Variants: graph RAG (e.g. Microsoft GraphRAG), contextual retrieval (prepending chunk-specific context to each chunk before indexing), in-context RAG (pasting the whole corpus into a long-context model), and speculative RAG.
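The pipeline above can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions, not a production implementation: `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, `VectorIndex` is a hypothetical in-memory stand-in for a vector DB, and the chunk size and overlap are illustrative. Because the vectors are L2-normalised, the dot product equals cosine similarity. The cross-encoder re-ranking step is omitted.

```python
import math
import re
import zlib

DIM = 256  # illustrative vector size for the toy embedding


def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalised.
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def chunk(doc_id: str, text: str, size: int = 50, overlap: int = 10) -> list[tuple[str, str]]:
    # Fixed-size word windows with overlap; each chunk gets an ID like "doc#0".
    words = text.split()
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [(f"{doc_id}#{i}", " ".join(words[s:s + size])) for i, s in enumerate(starts)]


class VectorIndex:
    # Hypothetical in-memory stand-in for a vector DB.
    def __init__(self) -> None:
        self.rows: list[tuple[str, str, list[float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        for chunk_id, chunk_text in chunk(doc_id, text):
            self.rows.append((chunk_id, chunk_text, embed(chunk_text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), cid, text) for cid, text, vec in self.rows]
        scored.sort(reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str, str]]) -> str:
    # Inject retrieved chunks with explicit IDs so the model can cite them.
    context = "\n".join(f"[{cid}] {text}" for _, cid, text in hits)
    return f"Answer using only the sources below and cite their IDs.\n\n{context}\n\nQuestion: {question}"


index = VectorIndex()
index.add("billing", "Invoices are emailed on the first day of each month.")
index.add("security", "Passwords must be rotated every ninety days.")
hits = index.search("When are invoices emailed each month?", k=1)
print(build_prompt("When are invoices emailed each month?", hits))
```

Swapping `embed` for a real model and `VectorIndex` for a real vector DB client changes the components, not the shape of the pipeline.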

FAQ

What is vector RAG?

Vector RAG is the classic retrieval-augmented generation pattern: embed documents, store them in a vector DB, retrieve the chunks whose embeddings are most similar to the query embedding, and inject the top-K into the prompt. It is usually contrasted with graph RAG, in-context RAG, and hybrid RAG.

When should I use vector RAG?

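Use it for RAG over corpora larger than about 50K tokens, where pasting everything into the prompt becomes costly; for multi-tenant systems where each user has a different corpus; and for production retrieval with frequent updates, since changed chunks can be re-embedded and upserted individually.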
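Per-user corpora and frequent updates both come down to how the index is keyed. The sketch below is hypothetical: `TenantIndex`, its methods, and the toy `embed` function are illustrative stand-ins, not any vector DB's API (many vector DBs offer comparable namespaces or metadata filters plus upsert by ID, but check your DB's documentation). It shows that an edit re-embeds only the changed chunk and that a query only scans the caller's namespace.

```python
import math
import re
import zlib
from collections import defaultdict

DIM = 256


def embed(text: str) -> list[float]:
    # Hypothetical toy embedding (hashed bag of words), standing in for a real model.
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class TenantIndex:
    # Hypothetical multi-tenant index: one namespace per tenant, rows keyed by chunk ID.
    def __init__(self) -> None:
        self.namespaces: dict[str, dict[str, tuple[str, list[float]]]] = defaultdict(dict)

    def upsert(self, tenant: str, chunk_id: str, text: str) -> None:
        # Only this chunk is (re-)embedded; the rest of the index is untouched.
        self.namespaces[tenant][chunk_id] = (text, embed(text))

    def delete(self, tenant: str, chunk_id: str) -> None:
        self.namespaces[tenant].pop(chunk_id, None)

    def search(self, tenant: str, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Scan only the caller's namespace, so tenants never see each other's chunks.
        q = embed(query)
        rows = self.namespaces.get(tenant, {})
        scored = [(sum(a * b for a, b in zip(q, vec)), cid) for cid, (_, vec) in rows.items()]
        return sorted(scored, reverse=True)[:k]


idx = TenantIndex()
idx.upsert("acme", "refunds#0", "Refunds are processed within five business days.")
idx.upsert("globex", "refunds#0", "Refunds are not available on sale items.")
# The acme document is edited: re-embed and overwrite just that chunk.
idx.upsert("acme", "refunds#0", "Refunds are processed within three business days.")
print(idx.search("acme", "How long do refunds take?"))
```

Both tenants use the same chunk ID, yet the edit and the search stay inside the `acme` namespace.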

What are the most common mistakes with vector RAG?

Running pure vector search without BM25, which misses exact-match queries, and skipping a cross-encoder reranker, even though first-stage retrieval is noisy.
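Both mistakes sit in the first retrieval stage. Here is a minimal sketch of the hybrid fix, under stated assumptions: BM25 is a small Okapi BM25 implementation with the commonly used defaults `k1=1.5` and `b=0.75`, the vector ranking is hard-coded as if it came from a vector index, and the two rankings are merged with reciprocal rank fusion (RRF, `k=60`), which is one common fusion choice rather than the only one. The cross-encoder re-ranking that would follow is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    # Okapi BM25 over a small in-memory corpus; returns doc IDs, best first.
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n
    df = Counter(tok for toks in tokenized.values() for tok in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking contributes 1 / (k + rank) per document; higher fused score wins.
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = {
    "kb-1": "Error ERR-4021 means the API key has expired.",
    "kb-2": "Rotate API keys from the account settings page.",
    "kb-3": "Billing errors are listed on the invoices page.",
}
query = "ERR-4021"
# Hypothetical vector-search ranking for the same query (semantic neighbours first).
vector_ranking = ["kb-2", "kb-3", "kb-1"]
lexical_ranking = bm25_rank(query, docs)
fused = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
print(fused)
```

In this toy run the document containing the exact string `ERR-4021` ranks first lexically and climbs from third to second after fusion; a cross-encoder would then re-score the fused candidates against the query.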

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/vector-rag.md.