Speaker diarisation
Speaker diarisation is the technique of segmenting an audio recording by who spoke when. Combined with a transcript, it answers "who said what" rather than just "what was said". It is used heavily in meeting transcription, podcasts, and call analytics.
Diarisation segments audio into per-speaker turns, for example "Speaker 1: hello", "Speaker 2: hi". Production STT platforms (Deepgram, AssemblyAI, Google STT) ship diarisation as a configurable option. Quality varies: clean two-speaker calls are handled well, while messy multi-speaker meetings with overlapping speech remain hard. Diarisation on its own yields anonymous labels; pair it with speaker identification (matching a diarised speaker to a known voice from a sample) to get named speakers. It is used in meeting summaries (Otter, Read.ai, Granola), call centre analytics, podcast transcription, and forensic audio analysis.
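To make the idea of per-speaker turns concrete, here is a minimal sketch of how word-level speaker labels might be merged into turns. The `Word` and `Turn` types and the function names are hypothetical, invented for illustration; real platforms return their own response shapes, so treat this as an assumption about the general pattern rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the recording
    end: float
    speaker: int   # index assigned by the diariser, not an identity


@dataclass
class Turn:
    speaker: int
    start: float
    end: float
    text: str


def words_to_turns(words: list[Word]) -> list[Turn]:
    """Merge consecutive words that share a speaker label into one turn."""
    turns: list[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].end = w.end
            turns[-1].text += " " + w.text
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


def format_turns(turns: list[Turn]) -> list[str]:
    """Render turns in the "Speaker N: text" style."""
    return [f"Speaker {t.speaker}: {t.text}" for t in turns]


# Made-up sample data, assuming the diariser tags every word.
words = [
    Word("hello", 0.0, 0.4, 1),
    Word("there", 0.4, 0.8, 1),
    Word("hi", 1.0, 1.2, 2),
]
lines = format_turns(words_to_turns(words))
```

This sketch assumes each word carries exactly one speaker label, which is one reason overlapping speech is hard to represent: two people talking at once do not fit a single label per word.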
When to use speaker diarisation
- Meeting transcription.
- Podcast / interview transcription.
- Call centre analytics.
Common mistakes
- Expecting accurate diarisation on heavily overlapping speech; current models still struggle when several people talk at once.
- Not pairing with speaker identification when names matter; anonymous labels such as "Speaker 3" tell a reader little about who actually spoke.
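As a rough illustration of pairing diarisation with speaker identification, the sketch below matches each diarised label to the closest enrolled voice by cosine similarity. It assumes you already have one voice embedding per diarised speaker and per known person; the tiny three-dimensional vectors, the `name_speakers` helper, and the 0.7 threshold are all hypothetical values chosen for the example, not settings from any real system.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def name_speakers(
    diarised: dict[str, list[float]],
    enrolled: dict[str, list[float]],
    threshold: float = 0.7,  # assumed cut-off; tune on your own data
) -> dict[str, str]:
    """Map each diarised label to the best-matching enrolled name.

    Labels with no match at or above the threshold keep their
    anonymous label instead of being given a wrong name.
    """
    names: dict[str, str] = {}
    for label, emb in diarised.items():
        best_name, best_score = None, threshold
        for name, ref in enrolled.items():
            score = cosine(emb, ref)
            if score >= best_score:
                best_name, best_score = name, score
        names[label] = best_name if best_name is not None else label
    return names


# Toy embeddings for illustration only; real ones have hundreds of dimensions.
enrolled = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
diarised = {
    "Speaker 1": [0.9, 0.1, 0.0],
    "Speaker 2": [0.1, 0.95, 0.05],
    "Speaker 3": [0.0, 0.0, 1.0],
}
labels = name_speakers(diarised, enrolled)
```

Falling back to the anonymous label when nothing clears the threshold is a deliberate choice here: an unknown participant is usually less harmful in a transcript than a confidently wrong name.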
FAQ
What is speaker diarisation?
Speaker diarisation is the technique of segmenting an audio recording by who spoke when. Combined with a transcript, it answers "who said what" rather than just "what was said". It is used heavily in meeting transcription, podcasts, and call analytics.
When should I use speaker diarisation?
Use it for meeting transcription, podcast or interview transcription, and call centre analytics.
What are the most common mistakes with speaker diarisation?
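The first is expecting accurate diarisation on heavily overlapping speech; current models still struggle when several people talk at once. The second is not pairing diarisation with speaker identification when names matter; anonymous labels such as "Speaker 3" tell a reader little about who actually spoke.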
Related terms
- Voice (LLM apps) — Voice in LLM apps refers to the full speech pipeline — speech-to-text (STT), language model, text-to-speech (TTS) — that lets users converse with an AI assistant in spoken language.
- Streaming STT — Streaming STT (speech-to-text) emits partial transcriptions as the user speaks — instead of waiting for end-of-utterance — enabling sub-second response from a voice assistant.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/diarisation.md.