Services

Every AI provider and integration that plugs into Voxta — LLMs, TTS, STT, vision, search, and more.

Voxta talks to AI services through a plugin system. Each service is a separate module: Voxta Server loads them at startup, and they appear in the Add Service catalog. You install the services you need, configure them, and Voxta routes each part of a chat to the right service.

What you need for a working chat

At minimum:

Role	What it does	Pick from
Text Generation (LLM)	Generates the character's replies. Also handles Action Inference and Summarization unless you split them out.	Large Language Models
Text-to-Speech (TTS)	Speaks the reply.	Text-to-Speech
Speech-to-Text (STT)	Transcribes your microphone.	Speech-to-Text

Everything else is optional.
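The fallback rule in the table above can be sketched as follows. This is a minimal illustration under stated assumptions: the role names, the `Assignments` shape, the `resolveService` and `missingRequired` helpers, and the service names (`my-llm`, `small-llm`, `my-tts`) are all hypothetical and do not reflect Voxta's actual configuration format or code. The sketch only models what the table says: Action Inference and Summarization use the Text Generation service unless you assign them their own, and a working chat needs Text Generation, Text-to-Speech, and Speech-to-Text.

```typescript
// Hypothetical illustration only: these role names and types are assumptions,
// not Voxta's actual API or configuration format.
type Role =
  | "TextGeneration"
  | "ActionInference"
  | "Summarization"
  | "TextToSpeech"
  | "SpeechToText";

// Roles that fall back to the Text Generation service when left unassigned.
const fallsBackToTextGen: ReadonlySet<Role> = new Set<Role>([
  "ActionInference",
  "Summarization",
]);

// Maps each role to the name of the service assigned to it, if any.
type Assignments = Partial<Record<Role, string>>;

// Returns the service that would handle a role: the directly assigned one,
// else the Text Generation service for roles that fall back to it.
function resolveService(role: Role, assigned: Assignments): string | undefined {
  const direct = assigned[role];
  if (direct !== undefined) return direct;
  if (fallsBackToTextGen.has(role)) return assigned.TextGeneration;
  return undefined;
}

// Lists the minimum roles from the table that no service covers yet.
function missingRequired(assigned: Assignments): Role[] {
  const required: Role[] = ["TextGeneration", "TextToSpeech", "SpeechToText"];
  return required.filter((r) => resolveService(r, assigned) === undefined);
}

// Hypothetical setup: Summarization is split out, Action Inference is not.
const setup: Assignments = {
  TextGeneration: "my-llm",
  Summarization: "small-llm",
  TextToSpeech: "my-tts",
};

console.log(resolveService("ActionInference", setup)); // my-llm
console.log(resolveService("Summarization", setup)); // small-llm
console.log(missingRequired(setup).join(", ")); // SpeechToText
```

In this example, Action Inference falls back to the main LLM, Summarization runs on its own service, and the setup is flagged as still missing a Speech-to-Text service.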

Service categories

Large Language Models

Text generation, action inference, summarization.

Text-to-Speech

Voice synthesis.

Speech-to-Text

Voice transcription.

Wake Word

Hands-free activation by name.

Computer Vision

Image and screenshot understanding (mostly an LLM capability).

Image Generation

Generate images in chat.

Vision Capture

Webcam and screen capture sources for vision.

Memory

Long-term memory backends.

Web Search

Web lookup tools.

Chat Augmentations

Voxta Utilities, plus game, device, and tool integrations (Lovense, Elite Dangerous, MCP).

Animation

Body motion and gesture generation.

Audio I/O

Low-level audio input, output, and conversion.

Scripting

JavaScript event handlers that make scenarios react and adapt — the full chat, character, and event API.
