F5-TTS
Local TTS built on a Diffusion Transformer with ConvNeXt V2, offering good quality at reasonable speed.
F5-TTS is an open-source text-to-speech model based on a Diffusion Transformer architecture with ConvNeXt V2. It trains and runs inference faster than older diffusion-based TTS models.
Setup
Add the service
Manage Services → + Add Services → F5-TTS → Add. Voxta installs the Python runtime and model weights automatically on first use.
Pick a voice
In the F5-TTS configuration, browse the available voices. You can also upload a reference audio clip to clone a voice.