Text-to-Speech
Voice synthesis providers that turn the character's reply into spoken audio.
Voxta supports three kinds of TTS: cloud services (best quality), zero-setup local engines (Voxta installs the models automatically), and external engines reached through the generic HTTP API.
Several LLM providers, including OpenAI and NovelAI, also offer TTS. Because the same API key serves both, their TTS is documented on the LLM page: see OpenAI, NovelAI, and Voxta Cloud.
Cloud-Based
ElevenLabs
Industry-leading voice synthesis. Multilingual, expressive.
Azure Speech Service
Microsoft Azure Cognitive Services TTS.
Self-Hosted: Zero-Setup
Voxta installs the Python runtime and model weights automatically on first use.
Orpheus
LLM-based TTS with emotions and disfluencies.
Chatterbox TTS
Diffusion Transformer-based TTS.
XTTS (Coqui)
Multilingual, voice-cloning capable.
Echo-TTS
Local TTS engine.
F5-TTS
Diffusion Transformer with ConvNeXt V2.
Kitten TTS
Lightweight 15M-parameter open-source TTS.
Kokoro
Frontier TTS at 82M parameters.
Sesame CSM
Sesame's conversational speech model.
Windows Speech
Built-in Windows TTS.