F5-TTS

Local TTS using a Diffusion Transformer with ConvNeXt V2, offering good quality and reasonable speed.

F5-TTS is an open-source text-to-speech model built on a Diffusion Transformer architecture with ConvNeXt V2. It trains and runs inference faster than older diffusion-based TTS models.

Setup

Add the service

Manage Services → + Add Services → F5-TTS → Add. Voxta installs the Python runtime and model weights automatically on first use.

Pick a voice

In the F5-TTS config, browse available voices. You can also upload a reference clip for voice cloning.
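Before uploading a reference clip, it can help to confirm what the file actually contains. The following is a minimal sketch using only the Python standard library; it assumes the clip is an uncompressed WAV file. The helper names `describe_clip` and `is_short_enough` and the `max_seconds` default are illustrative assumptions, not Voxta or F5-TTS requirements.

```python
import wave


def describe_clip(path):
    """Return basic properties of a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def is_short_enough(info, max_seconds=15.0):
    """Check the clip length against a limit.

    max_seconds is an illustrative assumption, not a documented limit.
    """
    return info["duration_s"] <= max_seconds
```

Calling `describe_clip("my_voice.wav")` (a hypothetical file name) returns the channel count, sample rate, and duration, which can be checked by eye before uploading the clip in the F5-TTS config.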

