Glossary
Voxta terms and AI / ML jargon you'll run into across these docs.
Voxta-specific
Voxta — AI orchestration platform for natural voice and text chat with AI characters.
Voxta Server — the local desktop app that runs orchestration, services, and the web UI.
Voxta Cloud — Voxta's hosted AI backend. Provides LLM, TTS, and STT through a single API.
Voxy — desktop avatar companion that connects to Voxta Server and adds a VRM-rendered face.
Studio — the authoring half of Voxta: characters, scenarios, events, actions, scripts.
Character — an AI persona (name, description, personality, voice, profile, assets).
Scenario — the situation a character operates under: roles, events, actions, contexts, scripts.
Action Inference — Voxta's process of picking which scenario action the AI should fire next, based on conversation context.
Context — sentences added to the prompt just before the AI's reply (e.g. "{{ char }} is wearing a blindfold").
Memory Book — supplemental long-form lore associated with a character or scenario, retrieved when relevant keywords match.
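A toy sketch of keyword-triggered lore retrieval, as described above. This is illustrative only, not Voxta's actual implementation; the entry structure (`keywords`, `text`) is invented for the example.

```python
import re

def retrieve_lore(message, memory_book):
    """Return lore entries whose keywords appear in the message (toy sketch)."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    hits = []
    for entry in memory_book:
        if words & {k.lower() for k in entry["keywords"]}:
            hits.append(entry["text"])
    return hits

book = [
    {"keywords": ["lighthouse"], "text": "The lighthouse has been dark for ten years."},
    {"keywords": ["harbor", "docks"], "text": "The harbor closes at dusk."},
]
print(retrieve_lore("Tell me about the lighthouse.", book))
# → ['The lighthouse has been dark for ten years.']
```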
Flag — a boolean state bit, used to gate actions and contexts.
App Trigger — a call from a scenario script to the host app (Voxta Talk, Voxta-VAM, Voxy) to change UI state, play audio, swap avatars, etc.
AI / ML jargon
AI (Artificial Intelligence) — broad term for systems that simulate intelligent behavior.
LLM (Large Language Model) — a model trained on huge text corpora that can read and generate language. Powers the character's replies.
TTS (Text-to-Speech) — turns text into spoken audio. Powers Voxta's voice services.
STT (Speech-to-Text) — turns spoken audio into text. Powers Voxta's transcription services.
Token — the basic unit an LLM reads and writes. Roughly a word or part of a word. "Unbelievable" might be three tokens: un + believ + able.
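The subword split above can be illustrated with a toy greedy longest-match tokenizer. Real tokenizers (BPE, SentencePiece) learn their vocabularies from data; the three-piece vocabulary here is invented to match the "unbelievable" example.

```python
def toy_tokenize(word, vocab):
    """Greedy longest-match subword split (toy illustration, not real BPE)."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining piece first; fall back to a single character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"un", "believ", "able"}
print(toy_tokenize("unbelievable", vocab))  # → ['un', 'believ', 'able']
```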
Context window — the maximum number of tokens an LLM can consider at once. Older messages get dropped or summarized when the chat exceeds it.
Summarization — compressing old chat history into a shorter summary so the context window doesn't fill up.
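A minimal sketch of the trimming described in the two entries above: drop the oldest messages until the history fits the token budget. The whitespace word count stands in for a real token counter, and production systems typically summarize the dropped history rather than discard it.

```python
def trim_to_window(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Drop oldest messages until the total fits the token budget (toy sketch)."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # oldest message is dropped first
    return kept

chat = ["Hi there", "Tell me a long story about dragons", "Shorter please"]
print(trim_to_window(chat, max_tokens=9))
# → ['Tell me a long story about dragons', 'Shorter please']
```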
Quantization — compressing a model's weights to reduce its memory footprint. Trade some accuracy for fitting on consumer GPUs.
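A toy sketch of the accuracy-for-memory trade-off: symmetric 8-bit quantization maps each float weight to a small integer plus one shared scale. Real formats use per-block scales and cleverer rounding; this is only the core idea.

```python
def quantize_int8(weights):
    """Map floats to ints in [-127, 127] with one shared scale (toy sketch)."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33]
q, s = quantize_int8(w)
print(q)                 # → [30, -127, 84] (1 byte each instead of 4)
print(dequantize(q, s))  # approximately the original weights
```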
GGUF / EXL2 / GPTQ / AWQ — quantization formats for local LLMs. See LLM overview.
RAG (Retrieval-Augmented Generation) — feeding an LLM with retrieved-on-demand information instead of stuffing everything into the prompt up front.
RLHF (Reinforcement Learning from Human Feedback) — fine-tuning an LLM using human ratings to align its outputs.
MoE (Mixture of Experts) — an LLM architecture that activates only a subset of its parameters per query.
AGI (Artificial General Intelligence) — hypothetical AI capable of any intellectual task a human can. Aspirational target, not currently shipped.
Adjacent tech
VR (Virtual Reality) — immersive simulated environment via a headset.
VAM (Virt-A-Mate) — 3D character creation and interaction software, designed for VR. Voxta integrates via a plugin.
Mocap (Motion Capture) — recording human movement for animation playback. SMPL is a popular mocap representation.
SMPL / SMPL-H — a 3D body model representing a pose as joint rotations plus a root translation. Used by HyMotion and other text-to-motion models.
VRM (VRoid Model) — a standard 3D humanoid avatar file format. Used by Voxy and many VTubing apps.
Tools and platforms
HF (Hugging Face) — central hub for open-source ML models.
API (Application Programming Interface) — a set of endpoints / functions exposed for other programs to call. An API key authenticates which program is calling.
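A small sketch of the "API key authenticates the caller" idea: build (but don't send) an HTTP request carrying a key. The endpoint URL, header name, and key are illustrative, not any specific service's API.

```python
import urllib.request

def build_request(url, api_key):
    """Build an HTTP request with a bearer-token API key (illustrative only)."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_request("https://example.com/v1/chat", "sk-demo")
print(req.get_header("Authorization"))  # → Bearer sk-demo
```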
SDK (Software Development Kit) — a bundle of libraries and tools for building on a platform.
MCP (Model Context Protocol) — open standard for exposing tools and resources to LLMs. Voxta supports MCP via the HTTP and STDIO augmentations.