# Embedding clustering

**Source:** https://promtable.com/glossary/embedding-clustering

> Embedding clustering groups documents, queries, or users by embedding similarity — used for topic discovery, deduplication, semantic indexing, and personalisation.

---

Once you have embeddings for a corpus, clustering them with an algorithm such as k-means, HDBSCAN, or agglomerative clustering can reveal topic structure without labelled data. Common applications in 2026 include discovering FAQ clusters in support tickets, deduplicating near-identical documents before ingestion, building topic taxonomies for content sites, segmenting users for personalisation, and routing queries to specialised models or agents. Paired with dimensionality reduction (UMAP, t-SNE) for visualisation, embedding clustering is a common first exploratory step when working with an embedding-based system.
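
To make the idea concrete, here is a minimal, dependency-free sketch of k-means over a handful of toy vectors. The three-dimensional `toy_embeddings` are invented for illustration (real embeddings have hundreds or thousands of dimensions), and the farthest-point initialisation is a simplifying assumption chosen to keep the run deterministic. In practice you would more likely reach for an established library implementation than hand-roll the loop.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length so distance reflects direction only."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first vector, then repeatedly add the point farthest
    # from the centroids chosen so far (a simple stand-in for k-means++).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Hypothetical toy data: two loose groups pointing in different directions.
toy_embeddings = [
    [0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1],
    [0.0, 0.9, 1.0], [0.1, 1.0, 0.9], [0.0, 0.8, 1.1],
]
labels = kmeans([l2_normalise(v) for v in toy_embeddings], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster ids themselves are arbitrary; what matters is which items share an id.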

## When to use

- Topic discovery in unlabelled corpora.
- Document deduplication.
- User / query segmentation for routing.

## Common mistakes

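- Clustering without L2-normalising the embeddings. Euclidean distance, which k-means minimises, then depends on vector magnitude as well as direction, so vectors with similar meaning but different lengths can land in different clusters. On unit-length vectors, Euclidean distance ranks pairs the same way cosine similarity does.
- Using k-means when HDBSCAN would fit the topic shape better. k-means needs the number of clusters chosen up front and favours compact, roughly spherical clusters of similar size; HDBSCAN infers the number of clusters, copes with varying density, and labels outliers as noise.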
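
The normalisation point can be checked numerically. The sketch below uses three made-up two-dimensional vectors (assumptions for illustration only) to show that raw Euclidean distance can rank a same-direction pair as farther apart than an orthogonal pair, and that after L2-normalisation the squared distance equals `2 - 2 * cosine`.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical vectors: same direction at different magnitudes, plus an
# orthogonal one.
short_vec = [1.0, 1.0]
long_vec = [10.0, 10.0]
other_vec = [1.0, -1.0]

# Raw distances: the same-direction pair looks *farther* apart than the
# orthogonal pair, purely because of magnitude.
raw_same_dir = euclidean(short_vec, long_vec)
raw_orthogonal = euclidean(short_vec, other_vec)

# After normalisation, the same-direction pair collapses to distance ~0.
unit_same_dir = euclidean(l2_normalise(short_vec), l2_normalise(long_vec))
unit_orthogonal = euclidean(l2_normalise(short_vec), l2_normalise(other_vec))

print(raw_same_dir > raw_orthogonal)    # True
print(unit_same_dir < unit_orthogonal)  # True

# Identity for unit vectors: squared distance = 2 - 2 * cosine similarity.
identity_gap = abs(unit_orthogonal ** 2 - (2 - 2 * cosine(short_vec, other_vec)))
```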

## Related terms

- [embeddings](https://promtable.com/glossary/embeddings)
- [semantic-search](https://promtable.com/glossary/semantic-search)
- [semantic-routing](https://promtable.com/glossary/semantic-routing)

*Last updated: 2026-06-01*
---

Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/embedding-clustering".
Contact: info@vibecodingturkey.com.