Voice cloning
Voice cloning takes a sample of someone speaking — sometimes as little as 30 seconds — and produces a model that can synthesise new speech in that voice.
As of 2026, voice cloning quality (for example ElevenLabs Instant and Professional cloning, Cartesia voice cloning, and Resemble AI) is high enough to fool casual listeners on short clips. Commercial uses include audiobook narration with permissioned cloning, brand voice consistency, accessibility (cloning a user's voice for assistive tech), and localised dubbing. Legal and ethical guardrails matter: most providers require an explicit consent statement from the voice owner, embed inaudible watermarks, and document chain of custody. Unauthorised cloning is increasingly criminalised across jurisdictions, and the category will likely see further regulation in 2026-2027.
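To make "document chain of custody" concrete, here is a minimal sketch of an append-only custody log in which each entry commits to the hash of the previous one, so later tampering is detectable. It is an illustration only, not how any named provider implements it: `CustodyLog`, its field names, and the action labels are hypothetical, and a real system would also record timestamps and sign entries.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(payload: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


@dataclass
class CustodyLog:
    """Hypothetical append-only log; each entry includes the previous entry's hash."""

    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, sample_sha256: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {
            "actor": actor,
            "action": action,
            "sample_sha256": sample_sha256,
            "prev_hash": prev_hash,
        }
        entry = {**body, "entry_hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that each entry links to the one before it.
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True
```

A typical flow would append one entry when consent is recorded and another when the clone is trained, both referencing the SHA-256 of the voice sample; `verify()` then fails if any earlier entry is edited after the fact.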
When to use voice cloning
- Audiobook narration with consent.
- Brand voice consistency.
- Localised dubbing.
Common mistakes
- Skipping explicit consent collection — legal exposure is rising globally.
- Treating watermarks as proof of authenticity — they prove origin, not safety.
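One way to avoid the first mistake is to make consent a hard precondition in code, checked before any cloning job starts. The sketch below assumes a hypothetical `ConsentRecord` shape and `require_consent` helper; the fields (permitted uses, expiry) are illustrative choices, not a legal standard or any provider's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of what the voice owner agreed to."""

    voice_owner: str
    statement_recorded: bool          # an explicit consent statement is on file
    permitted_uses: frozenset         # e.g. frozenset({"audiobook", "dubbing"})
    expires: Optional[date] = None    # None means no expiry was agreed


class ConsentError(Exception):
    """Raised when a cloning request is not covered by recorded consent."""


def require_consent(record: Optional[ConsentRecord], intended_use: str, today: date) -> None:
    # Fail closed: anything missing or out of scope blocks the job.
    if record is None or not record.statement_recorded:
        raise ConsentError("no explicit consent statement on file")
    if intended_use not in record.permitted_uses:
        raise ConsentError(f"consent does not cover use: {intended_use}")
    if record.expires is not None and today > record.expires:
        raise ConsentError("consent has expired")
```

Calling `require_consent(record, "audiobook", date.today())` before submitting a sample keeps the check separate from watermarking: a watermark on the output says nothing about whether this gate was ever passed.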
FAQ
What is voice cloning?
Voice cloning takes a sample of someone speaking — sometimes as little as 30 seconds — and produces a model that can synthesise new speech in that voice.
When should I use voice cloning?
Use it for audiobook narration with consent, brand voice consistency, and localised dubbing.
What are the most common mistakes with voice cloning?
The two most common mistakes are skipping explicit consent collection, since legal exposure is rising globally, and treating watermarks as proof of authenticity: they prove origin, not safety.
Related terms
- Voice (LLM apps) — Voice in LLM apps refers to the full speech pipeline — speech-to-text (STT), language model, text-to-speech (TTS) — that lets users converse with an AI assistant in spoken language.
- AI watermarking — AI watermarking embeds invisible-to-humans signals in model output (text, image, audio, video) so the content can later be detected as AI-generated.
- Deepfake — A deepfake is synthetic media — image, audio, or video — that depicts a real person doing or saying something they did not actually do, produced by AI generation or face/voice swap.
- Content provenance — Content provenance is cryptographic metadata attached to media that records how it was created, by whom or what model, and what edits it has been through.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/voice-cloning.md.