Services
Every AI provider and integration that plugs into Voxta — LLMs, TTS, STT, vision, search, and more.
Voxta talks to AI services through a plugin system. Each service is its own module: Voxta Server loads the modules at startup, and they appear in the Add Service catalog. You install what you need, configure it, and Voxta routes each part of a chat to the right service.
What you need for a working chat
At minimum:
| Role | What it does | Pick from |
|---|---|---|
| Text Generation (LLM) | Generates the character's replies. Also handles Action Inference and Summarization unless you assign those roles to separate services. | Large Language Models |
| Text-to-Speech (TTS) | Speaks the reply. | Text-to-Speech |
| Speech-to-Text (STT) | Transcribes your microphone. | Speech-to-Text |
Everything else is optional.
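The role assignments in the table above can be pictured as a small lookup with a fallback. The following is a minimal sketch for illustration only, not Voxta's actual implementation or configuration format: the `ServiceRouter` class, the role identifiers, and the service names such as `my-llm` are all hypothetical. It assumes that Action Inference and Summarization use the Text Generation service until a dedicated service is assigned to them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical role identifiers based on the table above (not Voxta's internal names).
REQUIRED_ROLES = ("text_generation", "text_to_speech", "speech_to_text")

# Roles assumed to fall back to the text generation service when unassigned.
FALLBACK_TO_LLM = ("action_inference", "summarization")


@dataclass
class ServiceRouter:
    # Maps a role name to the name of the service configured for it.
    assignments: Dict[str, str] = field(default_factory=dict)

    def assign(self, role: str, service: str) -> None:
        """Configure a service for a role, replacing any earlier assignment."""
        self.assignments[role] = service

    def missing_required(self) -> List[str]:
        """Return the required roles that have no service yet."""
        return [role for role in REQUIRED_ROLES if role not in self.assignments]

    def resolve(self, role: str) -> Optional[str]:
        """Return the service for a role, applying the LLM fallback."""
        if role in self.assignments:
            return self.assignments[role]
        if role in FALLBACK_TO_LLM:
            return self.assignments.get("text_generation")
        return None  # optional role with nothing configured


router = ServiceRouter()
router.assign("text_generation", "my-llm")   # hypothetical service names
router.assign("text_to_speech", "my-tts")

print(router.missing_required())        # ['speech_to_text']
print(router.resolve("summarization"))  # my-llm (falls back to the LLM)

router.assign("summarization", "small-llm")
print(router.resolve("summarization"))  # small-llm (split out)
```

In this sketch an optional role with no service resolves to `None`, which matches the idea that everything beyond the three required roles can be left unconfigured.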
Service categories
| Category | What it covers |
|---|---|
| Large Language Models | Text generation, action inference, summarization. |
| Text-to-Speech | Voice synthesis. |
| Speech-to-Text | Voice transcription. |
| Wake Word | Hands-free activation by name. |
| Computer Vision | Image and screenshot understanding (mostly an LLM capability). |
| Image Generation | Generate images in chat. |
| Vision Capture | Webcam and screen capture sources for vision. |
| Memory | Long-term memory backends. |
| Web Search | Web lookup tools. |
| Chat Augmentations | Voxta Utilities, plus game and device integrations (Lovense, Elite Dangerous, MCP). |
| Animation | Body motion and gesture generation. |
| Audio I/O | Low-level audio input, output, and conversion. |