Services Overview

Service Recommendations

Balanced: Simple to set up, not too expensive

Use NovelAI for Text Gen and Text To Speech, and Deepgram for Speech To Text. You only need two accounts, and the pricing is reasonable. Optionally, use OpenAI for Action Inference and Summarization.

Free: Host everything yourself

Use Text Generation Web UI for Text Gen, Vosk for Speech To Text and Silero for Text To Speech.

Best quality: Expensive, but worth it

Use Azure Speech Service for Speech To Text, ElevenLabs for Text To Speech, and run a large model using Text Generation Web UI on RunPod.
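The three recommendations above can be summarized as a small lookup table. This is only an illustrative sketch: the `PRESETS` dictionary, its keys, and the `providers_for` helper are hypothetical names chosen for this example and are not part of any Voxta configuration format.

```python
# Hypothetical summary of the three recommendations above.
# The structure and key names are illustrative assumptions,
# not a real Voxta configuration format.
PRESETS = {
    "balanced": {
        "text_gen": "NovelAI",
        "text_to_speech": "NovelAI",
        "speech_to_text": "Deepgram",
    },
    "free": {
        "text_gen": "Text Generation Web UI",
        "text_to_speech": "Silero",
        "speech_to_text": "Vosk",
    },
    "best_quality": {
        "text_gen": "Text Generation Web UI (on RunPod)",
        "text_to_speech": "ElevenLabs",
        "speech_to_text": "Azure Speech Service",
    },
}


def providers_for(preset: str) -> set[str]:
    """Return the distinct providers that a preset relies on."""
    return set(PRESETS[preset].values())
```

For example, `providers_for("balanced")` contains only NovelAI and Deepgram, which is why that option needs just two accounts.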

All Supported Services

| Service | Capabilities | Hosting | Notes |
| --- | --- | --- | --- |
| Anthropic | Text Gen | Online | Third-party large language model provider (Claude-style models). |
| AssemblyAI | Speech To Text | Online | Online speech-to-text service. |
| Azure Speech Service | Text To Speech, Speech To Text | Online | The very best speech transcription; the voice synthesizer is fair. Multilingual support. Free tier available. |
| Azure Wake Word | Speech To Text | Online | Wake-word / keyword spotting using Azure’s speech SDK. |
| Canopy Labs Orpheus | Text To Speech | Local | An excellent library for advanced Text-to-Speech generation. |
| Chatterbox | | Local | Chat integration and routing module (not a standalone AI service). |
| Civitai | | Online | Connects to the Civitai model hub to download and manage models. |
| ComfyUI | | Local | Integration with ComfyUI for image and multimedia workflows. |
| Coqui/XTTS | Text To Speech | Local | An excellent library for advanced Text-to-Speech generation. |
| Deepgram | Speech To Text | Online | A good online speech-to-text service. |
| Discord | | Online | Discord bot / integration for using Voxta through Discord. |
| DuckDuckGo | Search | Online | Privacy-friendly web search provider used as a tool by the assistant. |
| EchoTTS | Text To Speech | Local | Simple local Text-to-Speech engine. |
| ElevenLabs | Text To Speech | Online | The very best voice synthesizer available. Multilingual support. Expensive. |
| ExLlamaV2 | Text Gen | Local | Fast inference library for running large language models (LLMs) locally. |
| ExLlamaV3 | Text Gen | Local | Newer ExLlama backend for running large local language models efficiently. |
| F5-TTS | Text To Speech | Local | An excellent library for advanced Text-to-Speech generation. |
| FlashCap | | Local | Windows audio/video capture backend used for microphone input. |
| Florence 2 | | Online | Vision model integration for image understanding (images to text, captions, etc.). |
| Google AI | Text Gen | Online | Integration with Google’s large language models (for example, Gemini). |
| KittenTTS | Text To Speech | Local | Local Text-to-Speech engine focused on fast, lightweight voices. |
| KoboldAI | Text Gen | Local | One of the most popular ways to run your own local large language models. |
| Kokoro TTS | Text To Speech | Local | High-quality local Text-to-Speech engine. |
| LlamaSharp | Text Gen | Local | Local LLaMA backend using the LlamaSharp library. |
| Local Diffusers | | Local | Stable Diffusion / diffusion-model integration for images. |
| Lovense | | Local | Lovense device integration module. |
| Microsoft Semantic Kernel | | Local | Semantic Kernel integration for advanced orchestration and tools. |
| NAudio | | Local | Local audio capture / playback backend for Windows. |
| NovelAI | Text Gen | Online | Amazing large language model. Paid. Supports English and Japanese. |
| Text Generation Web UI | Text Gen | Local | One of the most popular ways to run your own local large language models. |
| OpenAI | Text Gen | Online | The reference for large language models. Supports most languages. Paid. NSFW content is not allowed. |
| OpenAI Compatible | Text Gen | Online | Connect to any OpenAI-compatible service. |
| OpenRouter | Text Gen | Online | Gateway to multiple private and open-source models. Paid. NSFW content is allowed. |
| OpenTK | | Local | Cross-platform windowing / graphics backend used by some modules. |
| Sesame Conversational Speech Model | Text To Speech | Local | An excellent library for advanced Text-to-Speech generation. |
| Silero | Text To Speech | Local | Local speech synthesis. Fair quality. |
| Tavily | Search | Online | Online web search API used as a tool by the assistant. |
| Text Generation Inference | Text Gen | Local | Hugging Face’s open-source server for hosting large language models locally. |
| Text To Speech HTTP API | Text To Speech | Local | Use any text-to-speech service. You need to configure it yourself. |
| VibeVoice | Text To Speech | Local | Voice generation module focused on expressive Text-to-Speech. |
| Voxta Cloud | Text Gen, Text To Speech, Speech To Text | Online | Voxta’s own AI backend, built for Voxta. |
| Vosk | Speech To Text | Local | Local speech transcription. You can download models for your language. Fair quality. |
| WhisperLive | Speech To Text | Local | Local speech transcription. You can download models for your language. Excellent quality. |
| Windows SDK | | Local | Windows SDK integration required by several Windows-specific modules. |
| Windows Speech | Text To Speech, Speech To Text | Local | Fair-quality speech transcription and synthesis. Supports your installed languages. Censored. |
| XAI | Text Gen | Online | Integration with xAI models for text generation. |

Built-in Modules

| Module | Category | Notes |
| --- | --- | --- |
| AudioRmsFilter | Audio | Filters audio by loudness (RMS) to reduce the resources spent on speech detection. |
| ChainOfThought | Reasoning | Enables chain-of-thought prompting and reasoning helpers. Experimental. |
| Continuations | Conversation | Automatically continues speaking without user interaction. |
| Documents | Knowledge | Provides document-updating capabilities. |
| FolderWatcher | File System | Watches folders for images and automatically analyzes them. |
| ModelContextProtocol (Http) | Integration | Connects to external tools and models over the Model Context Protocol using HTTP. |
| ModelContextProtocol (Stdio) | Integration | Connects to external tools and models over the Model Context Protocol using stdio. |
| ProfanityDetector | Safety | Detects and filters profanity in messages. |
| ReplyPrefixing | Prompting | Adds prefixes to character messages to increase creativity. |
| SimpleMemory | Memory | Lightweight key–value memory system for short-term facts. |
| TextReplacements | Prompting | Applies simple text replacements and filters before sending messages. |
| Vision | Vision | Provides augmentations to run vision models during chats. |
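To make the idea behind loudness-based filtering (as in AudioRmsFilter) more concrete, here is a minimal sketch of an RMS gate. It does not reflect the module's actual implementation: the function names, the sample format (floats in the range -1.0 to 1.0), and the `0.02` threshold are all assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a chunk of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def passes_gate(samples, threshold=0.02):
    """Return True if the chunk is loud enough to forward to speech detection.

    The default threshold is an arbitrary illustrative value.
    """
    return rms(samples) >= threshold


# Two synthetic chunks: near-silence and a louder signal.
silence = [0.001] * 160
speech = [0.3, -0.25, 0.4, -0.35] * 40
```

With these inputs, `passes_gate(silence)` is false and `passes_gate(speech)` is true, so only the louder chunk would reach the more expensive speech detection step.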