Neural TTS
Neural TTS is the modern (2016+) generation of text-to-speech built on neural networks (WaveNet, Tacotron, FastSpeech, VITS, GPT-SoVITS). It is the foundation of every consumer TTS that doesn't sound robotic, and by around 2020 it had replaced concatenative and parametric TTS.
Pre-neural TTS sounded robotic. Concatenative systems spliced together recorded speech fragments, and parametric systems generated audio from predicted acoustic features through a signal-processing vocoder. Neural TTS uses deep learning to map text to mel spectrograms or directly to audio waveforms, which yields natural prosody, expression, and pacing.
Key architectures: Tacotron 2 + WaveNet (early; a spectrogram predictor followed by a neural vocoder), FastSpeech 2 (faster, because it is non-autoregressive and generates all spectrogram frames in parallel), VITS (end-to-end, text to waveform in one model), GPT-SoVITS / XTTS (zero-shot voice cloning), and Voicebox (Meta; controllable generation).
Production TTS APIs in 2026 (Google Chirp 3, Azure Neural, ElevenLabs, Cartesia Sonic, OpenAI TTS, Edge TTS) all use neural architectures. The user-visible benefits are natural prosody, multilingual coverage, voice cloning, emotion and style control, and sub-second streaming latency.
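To make the text-to-spectrogram step concrete, here is a minimal sketch of a FastSpeech-style length regulator. Each phoneme's encoder vector is repeated once per predicted mel frame, so the decoder receives the whole frame sequence at once and can generate it in parallel instead of frame by frame. The frame settings (22,050 Hz audio, 256-sample hop) are assumptions borrowed from common open-source recipes, and the function names are hypothetical. Real models operate on tensors and predict the durations with a learned network.

```python
from typing import List, Sequence

# Assumed frame settings (not from this page): 22,050 Hz audio and a
# 256-sample hop between mel frames, as in several open-source TTS recipes.
SAMPLE_RATE = 22_050
HOP_LENGTH = 256


def length_regulate(
    phoneme_states: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Expand phoneme-level vectors into frame-level vectors (hypothetical helper).

    Phoneme i's vector is repeated durations[i] times, giving one vector per
    output mel frame. Because the full frame sequence is known up front, a
    non-autoregressive decoder can predict every frame in parallel.
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for state, n_frames in zip(phoneme_states, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for each frame so frames never alias one another.
        frames.extend(list(state) for _ in range(n_frames))
    return frames


def frames_to_seconds(
    n_frames: int,
    hop_length: int = HOP_LENGTH,
    sample_rate: int = SAMPLE_RATE,
) -> float:
    """Audio duration covered by n_frames mel frames under the assumed hop."""
    return n_frames * hop_length / sample_rate


if __name__ == "__main__":
    # Toy 2-dimensional encodings for three phonemes; real ones are learned.
    states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    durations = [2, 0, 3]  # predicted mel frames per phoneme
    frames = length_regulate(states, durations)
    print(len(frames))                      # 5
    print(round(frames_to_seconds(86), 3))  # 0.998
```

Under these assumed settings, about 86 mel frames cover one second of audio. In the FastSpeech paper the same mechanism also controls speaking rate: scaling every duration by a factor before expansion makes speech faster or slower without retraining.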
When to use neural TTS
- Any production TTS today.
Common mistakes
- Using non-neural TTS in 2026: it sounds dated and limits voice cloning and style control.
FAQ
What is neural TTS?
Neural TTS is the modern (2016+) generation of text-to-speech built on neural networks (WaveNet, Tacotron, FastSpeech, VITS, GPT-SoVITS). It is the foundation of every consumer TTS that doesn't sound robotic, and by around 2020 it had replaced concatenative and parametric TTS.
When should I use neural TTS?
Any production TTS today.
What are the most common mistakes with neural TTS?
Using non-neural TTS in 2026: it sounds dated and limits voice cloning and style control.
Related terms
- Voice cloning — Voice cloning takes a sample of someone speaking — sometimes as little as 30 seconds — and produces a model that can synthesise new speech in that voice.
- Voice design — Voice design is the TTS feature where users describe a desired voice in prose ('warm, deep, 40-year-old male, slight British accent') and the system generates a synthetic voice matching the description — ElevenLabs Voice Design, OpenAI's instruction-guided TTS are 2026 examples.
- Voice marketplace — A voice marketplace is the curated library of synthetic voices a TTS platform offers — community-uploaded, vendor-curated, or licensed-from-actors. ElevenLabs, PlayHT, Resemble, Cartesia all maintain voice marketplaces in 2026.
Last updated: 2026-06-01. Raw markdown: https://promtable.com/glossary/neural-tts.md.