Service types

When you register a module, you declare what kind(s) of service it provides via the Supports dictionary. This page lists every supported value and what to use it for.

The `Supports` dictionary

builder.Register(new ServiceDefinition
{
    // ...
    Supports = new Dictionary<ServiceTypes, ServiceDefinitionCategoryScore>
    {
        { ServiceTypes.ChatAugmentations, ServiceDefinitionCategoryScore.High }
    }
});

Each entry says: "I can act as this kind of service, with this confidence level." A single module can support multiple service types. For example, a vision provider that does both image generation and image understanding declares both ImageGen and ComputerVision.
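
As a minimal sketch of the multi-type case, the snippet below declares two entries for one vision module. The enums and the ServiceDefinition class are local stand-ins so the example compiles on its own, and they are not the SDK's actual definitions. The service name and the chosen scores are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Local stand-ins so this sketch compiles on its own. The real SDK types
// of the same names may carry more members than shown here.
public enum ServiceTypes { ImageGen, ComputerVision }
public enum ServiceDefinitionCategoryScore { Low, Medium, High }

public sealed class ServiceDefinition
{
    public string ServiceName { get; init; } = "";
    public Dictionary<ServiceTypes, ServiceDefinitionCategoryScore> Supports { get; init; } = new();
}

public static class VisionModuleSketch
{
    // One module, two service types: one Supports entry per capability.
    public static ServiceDefinition BuildDefinition() => new ServiceDefinition
    {
        ServiceName = "my-vision-service", // hypothetical name
        Supports = new Dictionary<ServiceTypes, ServiceDefinitionCategoryScore>
        {
            // The scores are illustrative choices, not recommendations.
            { ServiceTypes.ImageGen, ServiceDefinitionCategoryScore.High },
            { ServiceTypes.ComputerVision, ServiceDefinitionCategoryScore.Medium },
        },
    };

    public static void Main()
    {
        var definition = BuildDefinition();
        foreach (var entry in definition.Supports)
            Console.WriteLine($"{definition.ServiceName} supports {entry.Key} at {entry.Value}");
    }
}
```

In a real module the same dictionary would be passed to builder.Register, as in the snippet at the top of this section.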

The score

ServiceDefinitionCategoryScore is a relative weight Voxta uses to rank candidate services for a slot. Values: Low, Medium, High. Use High if your module is purpose-built for this service type, Low if it's a side capability.
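
To make "relative weight" concrete, here is a rough sketch of ranking candidates for one slot by their declared score. It is not Voxta's selection logic, which this page does not describe. The Candidate record, the Rank helper, and the numeric ordering Low < Medium < High are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in enum; the explicit numeric order Low < Medium < High is an assumption.
public enum ServiceDefinitionCategoryScore { Low = 0, Medium = 1, High = 2 }

// Hypothetical record pairing a service name with its declared score for one slot.
public sealed record Candidate(string ServiceName, ServiceDefinitionCategoryScore Score);

public static class ScoreRankingSketch
{
    // Highest score first. OrderByDescending is a stable sort, so candidates
    // with equal scores keep their original relative order.
    public static List<Candidate> Rank(IEnumerable<Candidate> candidates) =>
        candidates.OrderByDescending(c => c.Score).ToList();

    public static void Main()
    {
        var ranked = Rank(new[]
        {
            new Candidate("side-capability", ServiceDefinitionCategoryScore.Low),
            new Candidate("purpose-built", ServiceDefinitionCategoryScore.High),
            new Candidate("general-purpose", ServiceDefinitionCategoryScore.Medium),
        });

        foreach (var candidate in ranked)
            Console.WriteLine($"{candidate.ServiceName}: {candidate.Score}");
    }
}
```

Under these assumptions the purpose-built candidate is listed first and the side capability last, which is the behavior the High and Low guidance above is meant to produce.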

Catalog

`ServiceTypes` value	What it does	Typical use case
`TextGen`	Generates assistant replies from prompts.	OpenAI, Anthropic, llama.cpp, KoboldCpp, Ollama.
`ActionInference`	Decides which semantic action the user is triggering (runs alongside `TextGen`).	Usually delegated to a fast LLM. Configure it separately from the main chat LLM to control cost and speed.
`Summarization`	Condenses long chat histories into memory.	A cheap LLM endpoint kept separate from the main chat LLM.
`TextToSpeech`	Renders text to audio.	ElevenLabs, Azure TTS, Coqui, Piper, SAPI.
`SpeechToText`	Transcribes microphone audio.	Vosk, Whisper, Azure Speech, Google STT.
`AudioInput`	Raw microphone capture.	Custom audio backends, virtual mics, network audio.
`AudioOutput`	Speaker playback.	Custom output routing, network audio.
`AudioPipeline`	Audio processing (noise suppression, gain, format conversion).	DSP plugins, sample-rate converters.
`WakeWord`	Listens for a hotword to start a chat turn.	Picovoice Porcupine, openWakeWord.
`VisionCapture`	Captures images from a source (screen, webcam, game window).	Screen grabbers, OBS bridges.
`ComputerVision`	Understands images — describes scenes, identifies objects, reads text.	GPT-4o vision, Claude vision, local vision models.
`ChatAugmentations`	Injects context into the chat and exposes semantic actions the LLM can trigger.	In-process game and app integrations — anything you can drive from .NET without needing another runtime. Elite Dangerous COVAS is the reference example.
`Memory`	Long-term knowledge store the chat can query.	Vector databases, RAG backends.
`ImageGen`	Generates images from text.	Stable Diffusion (local), DALL·E, Imagen.
`Animations`	Drives avatar / character motion from text.	Forward-looking category; currently used by the experimental HY-Motion motion service.

Choosing the right type

A few rules of thumb:

Wrapping an external AI API? Pick the corresponding category — TextGen for an LLM, TextToSpeech for a voice, etc. The framework gives you a ServiceBase to inherit from and an interface to implement.
Adding context to chats? ChatAugmentations. This is the most flexible category — you get callbacks for events, can inject text into the prompt, and can register semantic actions the LLM calls.
Capturing the screen or webcam? VisionCapture. Pair it with a ComputerVision service to describe what was captured.
Bridging a game or app? ChatAugmentations is usually the right home, since the integration's job is to feed game state into chat and translate chat output into game actions.

Registering the implementation

After declaring Supports, you tell the builder which class implements each type:

builder.AddTextGenService<MyTextGen>("my-service");
builder.AddChatAugmentationsService<MyAugmentations>("my-service");
builder.AddTextToSpeechService<MyTts>("my-service");
// etc.

The string is your ServiceName from the ServiceDefinition. The builder has a registration method for every service type — Add{ServiceType}Service<TImpl>(string serviceName). Your class inherits the appropriate base (ServiceBase) and implements the matching interface (ITextGenService, IChatAugmentationsService, ITextToSpeechService, …).
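
The sketch below illustrates how the Supports entries and the registration calls line up: each declared service type needs a matching Add{ServiceType}Service call under the same service name. SketchBuilder is a hypothetical stand-in that only records registrations, and MissingFor is an illustrative helper, not an SDK method. ServiceBase and the interfaces are empty placeholders here, while the real ones have members you must implement. Whether the real builder checks for missing implementations is not stated on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins: the real ServiceBase and interfaces have members not shown here.
public enum ServiceTypes { TextGen, TextToSpeech }
public abstract class ServiceBase { }
public interface ITextGenService { }
public interface ITextToSpeechService { }

public sealed class MyTextGen : ServiceBase, ITextGenService { }
public sealed class MyTts : ServiceBase, ITextToSpeechService { }

// Hypothetical builder that only records which class was registered for which type.
public sealed class SketchBuilder
{
    private readonly Dictionary<(string Name, ServiceTypes Kind), Type> _registered = new();

    public void AddTextGenService<TImpl>(string serviceName)
        where TImpl : ServiceBase, ITextGenService =>
        _registered[(serviceName, ServiceTypes.TextGen)] = typeof(TImpl);

    public void AddTextToSpeechService<TImpl>(string serviceName)
        where TImpl : ServiceBase, ITextToSpeechService =>
        _registered[(serviceName, ServiceTypes.TextToSpeech)] = typeof(TImpl);

    // Service types declared in Supports that have no implementation registered yet.
    public List<ServiceTypes> MissingFor(string serviceName, IEnumerable<ServiceTypes> supported) =>
        supported.Where(kind => !_registered.ContainsKey((serviceName, kind))).ToList();
}

public static class RegistrationSketch
{
    public static void Main()
    {
        // The module declares two types but registers only one implementation.
        var supports = new[] { ServiceTypes.TextGen, ServiceTypes.TextToSpeech };
        var builder = new SketchBuilder();
        builder.AddTextGenService<MyTextGen>("my-service");

        foreach (var kind in builder.MissingFor("my-service", supports))
            Console.WriteLine($"No implementation registered for {kind}");
    }
}
```

The generic constraints mirror the rule stated above: a class passed to AddTextGenService must both inherit the base and implement the matching interface, so a mismatch is caught at compile time in this sketch.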

Examples in the wild

ChatAugmentations — voxta-module-elite-dangerous (open-source reference example)
ComputerVision — built-in Cloud / OpenAI vision modules
TextGen — built-in LLM providers (OpenAI, Anthropic, llama.cpp, Ollama, ...)

Not modules — these are external integrations that connect to Voxta over the WebSocket API rather than running in-process:

Voxta Minecraft Companion — Electron + Mineflayer (Node.js)
Voxta VAM plugin — in-game C# script for Virt-A-Mate
voxta-home-assistant-integration — Python add-on for Home Assistant

See Modules vs. integrations for when each fits.