Glossary
Voxta terms and AI / ML jargon you'll run into across these docs.
Voxta-specific
Voxta — AI orchestration platform for natural voice and text chat with AI characters.
Voxta Server — the local desktop app that runs orchestration, services, and the web UI.
Voxta Cloud — Voxta's hosted AI backend. Provides LLM, TTS, and STT through a single API.
Voxy — desktop avatar companion that connects to Voxta Server and adds a VRM-rendered face.
Studio — the authoring half of Voxta: characters, scenarios, events, actions, scripts.
Character — an AI persona (name, description, personality, voice, profile, assets).
Scenario — the situation a character operates under: roles, events, actions, contexts, scripts.
Action Inference — Voxta's process of picking which scenario action the AI should fire next, based on conversation context.
Context — sentences added to the prompt just before the AI's reply (e.g. "{{ char }} is wearing a blindfold").
Memory Book — supplemental long-form lore associated with a character or scenario, retrieved when relevant keywords match.
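A toy sketch of keyword-triggered lore retrieval, as described above. This is illustrative only, not Voxta's actual implementation; the entry structure (`keywords`, `text`) is invented for the example.

```python
import re

def retrieve_lore(message, memory_book):
    """Return lore entries whose keywords appear in the message (toy sketch)."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    hits = []
    for entry in memory_book:
        if words & {k.lower() for k in entry["keywords"]}:
            hits.append(entry["text"])
    return hits

book = [
    {"keywords": ["lighthouse"], "text": "The lighthouse has been dark for ten years."},
    {"keywords": ["harbor", "docks"], "text": "The harbor closes at dusk."},
]
print(retrieve_lore("Tell me about the lighthouse.", book))
# → ['The lighthouse has been dark for ten years.']
```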
Flag — a boolean state bit, used to gate actions and contexts.
App Trigger — a call from a scenario script to the host app (Voxta Talk, Voxta-VAM, Voxy) to change UI state, play audio, swap avatars, etc.
AI / ML jargon
AI (Artificial Intelligence) — broad term for systems that simulate intelligent behavior.
LLM (Large Language Model) — a model trained on huge text corpora that can read and generate language. Powers the character's replies.
TTS (Text-to-Speech) — turns text into spoken audio. Powers Voxta's voice services.
STT (Speech-to-Text) — turns spoken audio into text. Powers Voxta's transcription services.
Token — the basic unit an LLM reads and writes. Roughly a word or part of a word. "Unbelievable" might be three tokens: un + believ + able.
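The subword split above can be illustrated with a toy greedy longest-match tokenizer. Real tokenizers (BPE, SentencePiece) learn their vocabularies from data; the three-piece vocabulary here is invented to match the "unbelievable" example.

```python
def toy_tokenize(word, vocab):
    """Greedy longest-match subword split (toy illustration, not real BPE)."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining piece first; fall back to a single character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"un", "believ", "able"}
print(toy_tokenize("unbelievable", vocab))  # → ['un', 'believ', 'able']
```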
Context window — the maximum number of tokens an LLM can consider at once. Older messages get dropped or summarized when the chat exceeds it.
Summarization — compressing old chat history into a shorter summary so the context window doesn't fill up.
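A minimal sketch of the trimming described in the two entries above: drop the oldest messages until the history fits the token budget. The whitespace word count stands in for a real token counter, and production systems typically summarize the dropped history rather than discard it.

```python
def trim_to_window(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Drop oldest messages until the total fits the token budget (toy sketch)."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # oldest message is dropped first
    return kept

chat = ["Hi there", "Tell me a long story about dragons", "Shorter please"]
print(trim_to_window(chat, max_tokens=9))
# → ['Tell me a long story about dragons', 'Shorter please']
```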
Quantization — compressing a model's weights to reduce its memory footprint. Trade some accuracy for fitting on consumer GPUs.
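A toy sketch of the accuracy-for-memory trade-off: symmetric 8-bit quantization maps each float weight to a small integer plus one shared scale. Real formats use per-block scales and cleverer rounding; this is only the core idea.

```python
def quantize_int8(weights):
    """Map floats to ints in [-127, 127] with one shared scale (toy sketch)."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33]
q, s = quantize_int8(w)
print(q)                 # → [30, -127, 84] (1 byte each instead of 4)
print(dequantize(q, s))  # approximately the original weights
```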
GGUF / EXL2 / GPTQ / AWQ — quantization formats for local LLMs. See LLM overview.
RAG (Retrieval-Augmented Generation) — feeding an LLM with retrieved-on-demand information instead of stuffing everything into the prompt up front.
RLHF (Reinforcement Learning from Human Feedback) — fine-tuning an LLM using human ratings to align its outputs.
MoE (Mixture of Experts) — an LLM architecture that activates only a subset of its parameters per query.
AGI (Artificial General Intelligence) — hypothetical AI capable of any intellectual task a human can. Aspirational target, not currently shipped.
Adjacent tech
VR (Virtual Reality) — immersive simulated environment via a headset.
VAM (Virt-A-Mate) — 3D character creation and interaction software, designed for VR. Voxta integrates via a plugin.
Mocap (Motion Capture) — recording human movement for animation playback. SMPL is a popular mocap representation.
SMPL / SMPL-H — a 3D body model representing a pose as joint rotations plus a root translation. Used by HyMotion and other text-to-motion models.
VRM (VRoid Model) — a standard 3D humanoid avatar file format. Used by Voxy and many VTubing apps.
Tools and platforms
HF (Hugging Face) — central hub for open-source ML models.
API (Application Programming Interface) — a set of endpoints / functions exposed for other programs to call. An API key authenticates which program is calling.
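A small sketch of the "API key authenticates the caller" idea: build (but don't send) an HTTP request carrying a key. The endpoint URL, header name, and key are illustrative, not any specific service's API.

```python
import urllib.request

def build_request(url, api_key):
    """Build an HTTP request with a bearer-token API key (illustrative only)."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_request("https://example.com/v1/chat", "sk-demo")
print(req.get_header("Authorization"))  # → Bearer sk-demo
```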
SDK (Software Development Kit) — a bundle of libraries and tools for building on a platform.
MCP (Model Context Protocol) — open standard for exposing tools and resources to LLMs. Voxta supports MCP via the HTTP and STDIO augmentations.