Large Language Models
Text generation providers — the brain behind every character reply, across cloud and self-hosted options.
The LLM is the AI's brain. It handles three jobs in Voxta:
- Reply — the conversation text the character speaks.
- Action Inference — picking which action to fire.
- Summarization — compressing old chat history into manageable summaries.
You can route all three jobs through one service or split them across different ones (e.g. a high-quality model for Reply and a fast, cheap model for Action Inference and Summarization); see the sketch below.
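As a conceptual sketch of that split, the snippet below maps each job to a backend. The structure and all names are illustrative, not Voxta's actual configuration schema:

```python
# Conceptual sketch only -- NOT Voxta's actual configuration schema.
# It illustrates the routing idea: each of the three LLM jobs can be
# bound to a different backend service. All names are illustrative.
from dataclasses import dataclass

@dataclass
class LlmService:
    provider: str  # e.g. "OpenAI" or "llama.cpp"
    model: str     # model identifier understood by that provider

routing = {
    "reply": LlmService("OpenAI", "gpt-4o"),  # quality matters most here
    "action_inference": LlmService("llama.cpp", "small-instruct.gguf"),
    "summarization": LlmService("llama.cpp", "small-instruct.gguf"),
}

def service_for(job: str) -> LlmService:
    """Return the backend configured for a given LLM job."""
    return routing[job]

print(service_for("reply"))
```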
The Voxta UI groups LLMs into three hosting tiers — these docs mirror that.
Cloud-Based
Hosted by external companies. You need an API key and pay per use. See the connection sketch after this list.
OpenAI
GPT family.
Anthropic
Claude family.
OpenRouter
Gateway to dozens of models behind one key.
Google Gemini
Gemini models via OpenAI compatibility.
xAI (Grok)
Grok models via OpenAI compatibility.
NovelAI
Subscription LLM tuned for storytelling.
Mistral AI
Mistral's hosted models.
Voxta Cloud
Voxta's own hosted LLM.
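Several entries above advertise "OpenAI compatibility", which in practice means the standard OpenAI client can talk to them by swapping only the base URL and API key. A minimal sketch, assuming the providers' published OpenAI-compatible endpoints and an illustrative model name (verify both against their current docs):

```python
# Minimal sketch of the "OpenAI compatibility" pattern: one client
# library, different vendors, only base_url and api_key change.
# The endpoints below are the providers' published OpenAI-compatible
# URLs at the time of writing; verify them against current docs.
from openai import OpenAI

gemini = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key="YOUR_GOOGLE_API_KEY",
)
grok = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key="YOUR_XAI_API_KEY",
)

resp = grok.chat.completions.create(
    model="grok-2-latest",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```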
Self-Hosted: Zero-Setup
Voxta installs the runtime and Python dependencies automatically on first use. You only provide a model file (see the download sketch after this list).
llama.cpp
GGUF models, CPU + GPU.
LlamaSharp
In-process .NET llama.cpp binding.
ExLlamaV2
GPTQ / EXL2 quantized models.
ExLlamaV3
Newer ExLlama backend (experimental).
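Providing a model usually means downloading a quantized weights file, such as a GGUF, from Hugging Face. A minimal sketch using the huggingface_hub library; the repository and filename are examples only, so pick whatever model suits your hardware:

```python
# Download a quantized GGUF model file from Hugging Face.
# The repo_id and filename below are example values, not a
# recommendation; substitute the model you actually want.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",   # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",    # example quant
)
print(f"Point Voxta's llama.cpp provider at: {path}")
```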
Self-Hosted: Requires External Software
You install and run the upstream LLM server separately. Voxta connects to its API.
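Before pointing Voxta at such a server, it helps to confirm its API is reachable. A minimal sketch, assuming the server exposes the widely adopted OpenAI-compatible GET /v1/models endpoint; the URL and port are an assumption, so use whatever your server prints on startup:

```python
# Reachability check for a separately run local LLM server before
# pointing Voxta at it. Assumes the server exposes the common
# OpenAI-compatible GET /v1/models endpoint; the URL and port are an
# assumption -- use whatever your server prints on startup.
import requests

base_url = "http://127.0.0.1:5001/v1"
resp = requests.get(f"{base_url}/models", timeout=5)
resp.raise_for_status()
print("Available models:", [m["id"] for m in resp.json()["data"]])
```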