Voxta docs

F5-TTS

Local TTS using a Diffusion Transformer with ConvNeXt V2, with good quality and reasonable speed.

F5-TTS is an open-source text-to-speech model built on a Diffusion Transformer architecture with ConvNeXt V2. It trains and runs inference faster than older diffusion-based TTS models.

Setup

Add the service

Go to Manage Services → + Add Services → F5-TTS → Add. Voxta installs the Python runtime and model weights automatically the first time the service is used.

Pick a voice

In the F5-TTS configuration, browse the available voices and select one. To clone a voice instead, upload a reference clip.
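
Before uploading a reference clip, it can help to check its length and format. The sketch below is not part of Voxta or F5-TTS. It uses only Python's standard `wave` module, so it reads uncompressed WAV files only. The function names and the 12-second limit are assumptions chosen for illustration, not documented Voxta settings.

```python
import wave

# Assumed upper bound for a reference clip, in seconds.
# This is an illustrative value, not a documented Voxta limit.
MAX_REF_SECONDS = 12.0


def describe_wav(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def is_suitable_reference(path, max_seconds=MAX_REF_SECONDS):
    """Return True if the clip is non-empty and no longer than max_seconds."""
    duration, _rate, _channels = describe_wav(path)
    return 0 < duration <= max_seconds
```

For example, `is_suitable_reference("my_voice.wav")` (a hypothetical file name) returns `False` for a clip longer than the assumed limit, which suggests trimming it before upload.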
