Services Overview

Service Recommendations

Balanced: Simple to set up, not too expensive

Use NovelAI for Text Gen and Text To Speech, and Deepgram for Speech To Text. You only need two accounts, and the pricing is reasonable. Optionally, use OpenAI for Action Inference and Summarization.

Free: Host everything yourself

Use Text Generation Web UI for Text Gen, Vosk for Speech To Text, and Silero for Text To Speech.

Best quality: Expensive, but worth it

Use Azure Speech Service for Speech To Text, ElevenLabs for Text To Speech, and run a large model using Text Generation Web UI on RunPod.
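
The three recommendations above can be summarized as a small lookup table. This is only an illustrative sketch: the `Preset` class, the `PRESETS` dictionary, and the `distinct_services` helper are hypothetical names invented for this example and are not part of Voxta or its configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """One recommended combination of services (hypothetical structure)."""
    text_gen: str
    text_to_speech: str
    speech_to_text: str


# Mirrors the recommendations in this section; not a real Voxta config file.
PRESETS = {
    "balanced": Preset("NovelAI", "NovelAI", "Deepgram"),
    "free": Preset("Text Generation Web UI", "Silero", "Vosk"),
    "best_quality": Preset(
        "Text Generation Web UI (on RunPod)",
        "ElevenLabs",
        "Azure Speech Service",
    ),
}


def distinct_services(preset):
    """Return the set of distinct services a preset relies on."""
    return {preset.text_gen, preset.text_to_speech, preset.speech_to_text}


if __name__ == "__main__":
    for name, preset in PRESETS.items():
        print(name, sorted(distinct_services(preset)))
```

Counting the distinct services in the balanced preset gives two, which matches the "two accounts" remark above.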

All Supported Services

| Service | Text Gen | Text To Speech | Speech To Text | Price | Notes |
|---|---|---|---|---|---|
| Voxta Cloud | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | 💲 | Voxta’s own AI backend, built for Voxta. |
| Deepgram | 🚫 | 🚫 | ⭐⭐ | 💲 | A good online speech to text service. |
| ExLlamaV2 | ⭐⭐⭐ | 🚫 | 🚫 | 💻 | Fast inference library for running large language models (LLMs) locally. |
| Coqui/XTTS | 🚫 | ⭐⭐⭐ | 🚫 | 💻 | An excellent library for advanced text to speech generation. |
| ElevenLabs | 🚫 | ⭐⭐⭐ | 🚫 | 💲💲💲 | The very best voice synthesizer available. Multilingual support. Expensive. |
| Azure Speech Service | 🚫 | ⭐⭐ | ⭐⭐⭐ | 💲💲💲 | The very best speech transcription; the voice synthesizer is fair. Multilingual support. Free tier available. |
| NovelAI | ⭐⭐ | | 🚫 | 💲 | Amazing text to speech and large language model. Paid. Supports English and Japanese. |
| OpenAI | ⭐⭐⭐ | 🚫 | 🚫 | 💲💲 | The reference for large language models. Supports most languages. Paid. NSFW content is not allowed. |
| OpenAI Compatible | ? | 🚫 | 🚫 | ? | Use any OpenAI-compatible service. |
| Text Generation Web UI | ⭐⭐⭐ | 🚫 | 🚫 | 💻 | One of the most popular ways to run your own local large language models. |
| WhisperLive | 🚫 | 🚫 | ⭐⭐⭐ | 💻 | Local speech transcription. You can download models for your language. Excellent quality. |
| Vosk | 🚫 | 🚫 | | 💻 | Local speech transcription. You can download models for your language. Fair quality. |
| Text To Speech HTTP API | 🚫 | ⭐⭐ | 🚫 | 💻 | Use any text to speech service. You need to configure it yourself. |
| Silero | 🚫 | ⭐⭐ | 🚫 | 💻 | Local speech synthesis. Fair quality. |
| OpenRouter | ⭐⭐ | 🚫 | 🚫 | 💲💲 | Gateway to multiple private and open source models. Paid. NSFW content is allowed. |
| Text Generation Inference | | 🚫 | 🚫 | 💻 | Hugging Face’s open source host for local large language models. It did not give great results in our tests, but our implementation may be at fault. |
| KoboldAI | ⭐⭐⭐ | 🚫 | 🚫 | 💻 | One of the most popular ways to run your own local large language models. |
| Windows Speech | 🚫 | | | 💻 | Fair-quality speech transcription and synthesis. Supports your installed languages. Censored. |