# Vector quantization

**Source:** https://promtable.com/glossary/vector-quantization

> Vector quantization compresses embedding vectors (for example, float32 → int8 or binary) to cut memory, disk, and bandwidth use by 4-32× at a small cost in recall, which makes it essential for billion-vector workloads in 2026.

---

A 1536-dimensional float32 embedding takes 6 KB (1536 × 4 bytes); at 1B vectors that is about 6 TB of RAM. Quantization shrinks this, with approximate size and recall trade-offs that vary by embedding model and dataset:

- Scalar quantization: float32 → int8, 4× smaller, ~99% recall.
- Product quantization (PQ): codebook-based, 8-32× smaller, ~95% recall.
- Binary quantization: 1 bit per dimension, 32× smaller, ~85% recall, but viable with reranking.

Modern vector DBs (Qdrant, Pinecone, Milvus, LanceDB) expose quantization as a configuration option. The usual production pattern is to store quantized vectors for the ANN search, retrieve the top-K candidates, then rerank them with the full-precision vectors or with a [reranker](https://promtable.com/glossary/reranker). This pipeline can reach < 5 ms p99 query latency on billion-vector workloads with 99%+ recall@10, which is impractical to serve from RAM at that scale without quantization.
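The quantize, search, rerank pattern described above can be sketched in plain Python. This is a minimal illustration, not any particular database's implementation: the function names, the fixed value range `[lo, hi]`, the `oversample` factor, and the brute-force scan standing in for an ANN index are all assumptions made for the example.

```python
import random

def quantize_int8(vec, lo, hi):
    """Map floats in [lo, hi] to int8 codes in [-128, 127] (values outside are clamped)."""
    scale = (hi - lo) / 255.0
    return [max(-128, min(127, round((x - lo) / scale) - 128)) for x in vec]

def dequantize_int8(codes, lo, hi):
    """Approximate the original floats from their int8 codes."""
    scale = (hi - lo) / 255.0
    return [(c + 128) * scale + lo for c in codes]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, full_vectors, codes, lo, hi, k=10, oversample=4):
    # Stage 1: rank every vector using only its quantized form.
    # A real system would use an ANN index here; brute force keeps the sketch short.
    approx = sorted(
        range(len(codes)),
        key=lambda i: dot(query, dequantize_int8(codes[i], lo, hi)),
        reverse=True,
    )
    candidates = approx[: k * oversample]
    # Stage 2: rerank only the candidates with the full-precision vectors.
    candidates.sort(key=lambda i: dot(query, full_vectors[i]), reverse=True)
    return candidates[:k]

# Hypothetical toy data: 500 random 32-dimensional vectors with values in [-1, 1].
random.seed(0)
dims, n = 32, 500
full_vectors = [[random.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
codes = [quantize_int8(v, -1.0, 1.0) for v in full_vectors]
query = [random.uniform(-1.0, 1.0) for _ in range(dims)]

top = two_stage_search(query, full_vectors, codes, -1.0, 1.0)
print(top)
```

Fetching more candidates than needed (`k * oversample`) before reranking is what lets the full-precision pass recover neighbours that the quantized scores ranked slightly too low. Only the small candidate set ever touches the full-precision vectors, so those can live on disk while the codes stay in RAM.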

## When to use

- Vector DBs above 10M vectors.
- Memory-constrained / disk-constrained deployments.
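To judge whether a deployment is memory-constrained, it helps to estimate raw vector storage per format. The sketch below reproduces the 1B × 1536-dimension arithmetic from this entry. The helper name and the PQ setting (384 one-byte codes per vector) are hypothetical choices for illustration, and the figures exclude index structures, PQ codebooks, and metadata.

```python
def footprint_tb(num_vectors, bytes_per_vector):
    """Raw vector storage in terabytes (10**12 bytes), excluding index overhead."""
    return num_vectors * bytes_per_vector / 1e12

num_vectors, dims = 1_000_000_000, 1536
pq_codes_per_vector = 384  # hypothetical PQ setting: 384 one-byte codes per vector

bytes_per_vector = {
    "float32": dims * 4,         # 4 bytes per dimension
    "int8": dims,                # 1 byte per dimension
    "pq": pq_codes_per_vector,   # 1 byte per sub-vector code
    "binary": dims // 8,         # 1 bit per dimension
}

baseline = bytes_per_vector["float32"]
for name, size in bytes_per_vector.items():
    ratio = baseline / size
    print(f"{name:8s} {size:5d} B/vector  {footprint_tb(num_vectors, size):.3f} TB  {ratio:.0f}x smaller")
```

Under these assumptions, float32 comes to about 6.1 TB, int8 to about 1.5 TB, and binary to about 0.19 TB for the same collection.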

## Common mistakes

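- Quantizing without reranking: the recall lost to quantization carries straight through to the final results, with no full-precision pass to recover it.
- Picking binary quantization for low-dimensional vectors: with few dimensions, 1 bit per dimension keeps too little information, and the small absolute savings don't justify the recall hit.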
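The first mistake can be measured on toy data. The following sketch, which assumes sign-based binary codes, Hamming distance for the first pass, and random unit vectors as stand-in embeddings, compares recall@10 for binary-only search against binary search followed by a full-precision rerank. All names and parameters are hypothetical, and the printed numbers depend on the data.

```python
import math
import random

def binarize(vec):
    """Pack the sign of each dimension into one Python int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recall_at_k(found, truth):
    return len(set(found) & set(truth)) / len(truth)

# Hypothetical toy data: 2000 random unit vectors in 64 dimensions.
random.seed(1)
dims, n, k, oversample = 64, 2000, 10, 10
vectors = [normalize([random.gauss(0, 1) for _ in range(dims)]) for _ in range(n)]
query = normalize([random.gauss(0, 1) for _ in range(dims)])
codes = [binarize(v) for v in vectors]
query_code = binarize(query)

# Ground truth: exact top-k by full-precision similarity.
truth = sorted(range(n), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]

# Binary only: take the k closest codes by Hamming distance.
by_hamming = sorted(range(n), key=lambda i: hamming(query_code, codes[i]))
binary_only = by_hamming[:k]

# Binary plus rerank: oversample candidates, then rescore with full precision.
reranked = sorted(
    by_hamming[: k * oversample],
    key=lambda i: dot(query, vectors[i]),
    reverse=True,
)[:k]

print("recall@10, binary only:", recall_at_k(binary_only, truth))
print("recall@10, with rerank:", recall_at_k(reranked, truth))
```

Because the reranked candidates include the binary-only top-k, reranking cannot lower recall here; how much it helps depends on the oversampling factor and on how well the codes preserve the ordering.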

## Related terms

- [ann-index](https://promtable.com/glossary/ann-index)
- [embeddings](https://promtable.com/glossary/embeddings)
- [reranker](https://promtable.com/glossary/reranker)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/vector-quantization
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/vector-quantization".
Contact: info@vibecodingturkey.com.