technique

Speaker diarisation

Speaker diarisation is the technique of segmenting an audio recording by who spoke when. Combined with a transcript, it answers "who said what" rather than just "what was said". It is used heavily in meeting transcription, podcasts, and call analytics.

Diarisation segments audio into per-speaker turns, for example "Speaker 1: hello", "Speaker 2: hi". Production speech-to-text (STT) platforms such as Deepgram, AssemblyAI, and Google Cloud Speech-to-Text ship diarisation as a configurable option. Quality varies: clean two-speaker calls are generally handled well, while messy multi-speaker meetings with overlapping speech remain hard. Diarisation alone produces anonymous labels, so pair it with speaker identification (matching a diarised speaker to a known voice from a sample) to get named speakers. It is used in meeting summaries (Otter, Read.ai, Granola), call centre analytics, podcast transcription, and forensic audio analysis.
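
As a minimal sketch of what per-speaker turns look like in code, the Python below groups word-level results into turns. It assumes each word arrives with a start time, an end time, and an anonymous speaker index. The `Word` and `Turn` types and the `words_to_turns` helper are hypothetical names chosen for illustration, not any platform's API, and real response formats differ by provider.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    """One transcribed word (hypothetical shape, not a real API response)."""
    text: str
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: int   # anonymous speaker index assigned by the diariser


@dataclass
class Turn:
    """A run of consecutive words attributed to the same speaker."""
    speaker: int
    start: float
    end: float
    text: str


def words_to_turns(words: Iterable[Word]) -> List[Turn]:
    """Merge consecutive same-speaker words into turns, ordered by start time."""
    turns: List[Turn] = []
    for word in sorted(words, key=lambda w: w.start):
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            last.end = max(last.end, word.end)
            last.text += " " + word.text
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


def format_turn(turn: Turn) -> str:
    """Render a turn in the "Speaker N: text" style used above."""
    return f"Speaker {turn.speaker}: {turn.text}"


if __name__ == "__main__":
    sample = [
        Word("hello", 0.0, 0.4, 1),
        Word("there", 0.5, 0.9, 1),
        Word("hi", 1.2, 1.4, 2),
    ]
    for turn in words_to_turns(sample):
        print(format_turn(turn))
    # Speaker 1: hello there
    # Speaker 2: hi
```

Sorting by start time keeps the sketch simple, but it also flattens overlapping speech into a single sequence, which is one reason overlap-heavy recordings are harder to represent well.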

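When to use speaker diarisation

Meeting transcription, podcast and interview transcription, and call centre analytics: anywhere a transcript needs to show who spoke.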

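Common mistakes

Expecting accurate diarisation on heavily overlapping speech, which current models still struggle with. Not pairing diarisation with speaker identification, which leaves readers with anonymous labels such as "Speaker 3" instead of names.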

FAQ

What is speaker diarisation?

Speaker diarisation is the technique of segmenting an audio recording by who spoke when. Combined with a transcript, it answers "who said what" rather than just "what was said". It is used heavily in meeting transcription, podcasts, and call analytics.

When should I use speaker diarisation?

Use it whenever a transcript needs to attribute speech to speakers: meeting transcription, podcast and interview transcription, and call centre analytics.

What are the most common mistakes with speaker diarisation?

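Two stand out. The first is expecting accurate diarisation on heavily overlapping speech, which current models still struggle with. The second is not pairing diarisation with speaker identification: anonymous labels such as "Speaker 3" are of limited use to readers until they are mapped to names.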
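
The second point can be sketched as a small matching step. The Python below is illustrative only: it assumes you already have one voice embedding (a numeric vector) per diarised label and one per enrolled, known speaker, produced by some speaker-embedding model that is not shown here. The `name_speakers` function, the 0.7 threshold, and the toy two-dimensional vectors are all hypothetical choices, and cosine similarity is just one common way to compare embeddings.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_speakers(
    diarised: Dict[str, Sequence[float]],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # assumed cut-off; tune on your own data
) -> Dict[str, str]:
    """Map each anonymous label to the best-matching enrolled name.

    A label keeps its anonymous form when no enrolled voice reaches the threshold.
    """
    names: Dict[str, str] = {}
    for label, embedding in diarised.items():
        best_name, best_score = label, threshold
        for name, reference in enrolled.items():
            score = cosine(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        names[label] = best_name
    return names


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real voice embeddings.
    diarised = {
        "Speaker 1": [1.0, 0.0],
        "Speaker 2": [0.0, 1.0],
        "Speaker 3": [-1.0, 0.0],
    }
    enrolled = {"Alice": [0.9, 0.1], "Bob": [0.1, 0.9]}
    print(name_speakers(diarised, enrolled))
    # {'Speaker 1': 'Alice', 'Speaker 2': 'Bob', 'Speaker 3': 'Speaker 3'}
```

Falling back to the anonymous label when no match clears the threshold avoids confidently assigning the wrong name to an unknown voice.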

Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/diarisation.md.