# Neural TTS

**Source:** https://promtable.com/glossary/neural-tts

> Neural TTS is the modern generation of text-to-speech (roughly 2017 onward), built on neural networks such as Tacotron, WaveNet, FastSpeech, VITS, and GPT-SoVITS. It underlies the consumer TTS products that no longer sound robotic, and it had largely replaced concatenative and parametric TTS by around 2020.

---

Pre-neural TTS tended to sound robotic: concatenative systems splice together recorded speech fragments, while parametric systems synthesize audio from statistical acoustic features. Neural TTS instead uses deep learning to map text to mel spectrograms (which a vocoder then converts to audio) or directly to audio waveforms, which yields more natural prosody, expression, and pacing.

Representative architectures include Tacotron with a WaveNet vocoder (early), FastSpeech 2 (faster, non-autoregressive), VITS (end-to-end), GPT-SoVITS and XTTS (zero-shot voice cloning), and Voicebox (Meta, controllable). Production TTS APIs as of 2026 (Google Chirp 3, Azure Neural, ElevenLabs, Cartesia Sonic, OpenAI TTS, Edge TTS) all use neural architectures. The user-visible benefits are natural prosody, multilingual coverage, voice cloning, emotion and style control, and sub-second streaming latency.
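To make the "mel spectrogram" target a little more concrete, here is a minimal sketch of the mel scale that such spectrograms are built on, plus a rough frame count for a clip. It uses the common HTK-style mel formula. The settings in the demo (80 mel bands, 0 to 8000 Hz, 22050 Hz sample rate, hop length 256) are values assumed for illustration, not taken from any specific model or API above, and the function names are hypothetical helpers.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, fmin_hz: float, fmax_hz: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def mel_frame_count(duration_s: float, sample_rate: int, hop_length: int) -> int:
    """Approximate number of spectrogram frames, assuming one frame per hop
    with centered framing (an assumption; framing conventions vary)."""
    n_samples = int(round(duration_s * sample_rate))
    return 1 + n_samples // hop_length


if __name__ == "__main__":
    # Illustrative settings only (assumed, not tied to a specific system).
    centers = mel_band_centers(n_mels=80, fmin_hz=0.0, fmax_hz=8000.0)
    frames = mel_frame_count(duration_s=2.0, sample_rate=22050, hop_length=256)
    print(f"lowest band center:  {centers[0]:.1f} Hz")
    print(f"highest band center: {centers[-1]:.1f} Hz")
    print(f"frames for a 2 s clip: {frames} (each frame has {len(centers)} mel values)")
```

The bands are narrow at low frequencies and wide at high ones, which roughly mirrors how pitch is perceived. An acoustic model in a two-stage pipeline predicts one such vector of mel values per frame, and the vocoder turns the resulting sequence into a waveform.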

## When to use

- Any production TTS today.

## Common mistakes

- Using non-neural TTS in 2026: the output sounds dated, and cloning and style features are limited.

## Related terms

- [voice-cloning](https://promtable.com/glossary/voice-cloning)
- [voice-design](https://promtable.com/glossary/voice-design)
- [voice-marketplace](https://promtable.com/glossary/voice-marketplace)

*Last updated: 2026-06-01*

---

Original page: https://promtable.com/glossary/neural-tts
Maintained by Promtable (https://promtable.com). Content: CC BY 4.0. Cite as "Promtable — https://promtable.com/glossary/neural-tts".
Contact: info@vibecodingturkey.com.