Voxta docs

Service types

The full catalog of service types your module can implement.

When you register a module, you declare what kind(s) of service it provides via the Supports dictionary. This page lists every supported value and what to use it for.

The Supports dictionary

builder.Register(new ServiceDefinition
{
    // ...
    Supports = new Dictionary<ServiceTypes, ServiceDefinitionCategoryScore>
    {
        { ServiceTypes.ChatAugmentations, ServiceDefinitionCategoryScore.High }
    }
});

Each entry says: "I can act as this kind of service, with this confidence level." A single module can support multiple service types — for example, a vision provider that does both image generation and image understanding declares both.
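As an illustration of the multi-type case, the vision provider described above might declare both capabilities in one dictionary. This is a fragment in the same style as the snippet above, not a complete module: `builder` and the elided ServiceDefinition fields are assumed to be set up elsewhere, and the scores chosen here are only an example.

```csharp
// Fragment: assumes `builder` and the Voxta module types are in scope,
// plus `using System.Collections.Generic;` for Dictionary.
builder.Register(new ServiceDefinition
{
    // ... other fields elided, as in the snippet above
    Supports = new Dictionary<ServiceTypes, ServiceDefinitionCategoryScore>
    {
        // Image understanding: describing scenes, reading text.
        { ServiceTypes.ComputerVision, ServiceDefinitionCategoryScore.High },
        // Image generation from text, offered by the same module.
        { ServiceTypes.ImageGen, ServiceDefinitionCategoryScore.High },
    }
});
```

If one of the two were only a side capability, its score would drop to Low, following the guidance in the next section.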

The score

ServiceDefinitionCategoryScore is a relative weight Voxta uses to rank candidate services for a slot. Values: Low, Medium, High. Use High if your module is purpose-built for this service type, Low if it's a side capability.

Catalog

ServiceTypes value | What it does | Typical use case
TextGen | Generates assistant replies from prompts. | OpenAI, Anthropic, llama.cpp, KoboldCpp, Ollama.
ActionInference | Decides which semantic action the user is triggering (runs alongside TextGen). | Usually delegated to a fast LLM. Configure it separately from the main chat LLM for cost and speed.
Summarization | Condenses long chat histories into memory. | A cheap LLM endpoint kept separate from the main chat LLM.
TextToSpeech | Renders text to audio. | ElevenLabs, Azure TTS, Coqui, Piper, SAPI.
SpeechToText | Transcribes microphone audio. | Vosk, Whisper, Azure Speech, Google STT.
AudioInput | Raw microphone capture. | Custom audio backends, virtual mics, network audio.
AudioOutput | Speaker playback. | Custom output routing, network audio.
AudioPipeline | Audio processing (noise suppression, gain, format conversion). | DSP plugins, sample-rate converters.
WakeWord | Listens for a hotword to start a chat turn. | Picovoice Porcupine, openWakeWord.
VisionCapture | Captures images from a source (screen, webcam, game window). | Screen grabbers, OBS bridges.
ComputerVision | Understands images: describes scenes, identifies objects, reads text. | GPT-4o vision, Claude vision, local vision models.
ChatAugmentations | Injects context into the chat and exposes semantic actions the LLM can trigger. | In-process game and app integrations: anything you can drive from .NET without needing another runtime. Elite Dangerous COVAS is the reference example.
Memory | Long-term knowledge store the chat can query. | Vector databases, RAG backends.
ImageGen | Generates images from text. | Stable Diffusion (local), DALL·E, Imagen.
Animations | Drives avatar / character motion from text. | Future; currently used for the HY-Motion experimental motion service.

Choosing the right type

A few rules of thumb:

  • Wrapping an external AI API? Pick the corresponding category — TextGen for an LLM, TextToSpeech for a voice, etc. The framework gives you a ServiceBase to inherit from and an interface to implement.
  • Adding context to chats? ChatAugmentations. This is the most flexible category — you get callbacks for events, can inject text into the prompt, and can register semantic actions the LLM calls.
  • Capturing the screen or webcam? VisionCapture. Pair it with a ComputerVision service to describe what was captured.
  • Bridging a game or app? ChatAugmentations is usually the right home, since the integration's job is to feed game state into chat and translate chat output into game actions.

Registering the implementation

After declaring Supports, you tell the builder which class implements each type:

builder.AddTextGenService<MyTextGen>("my-service");
builder.AddChatAugmentationsService<MyAugmentations>("my-service");
builder.AddTextToSpeechService<MyTts>("my-service");
// etc.

The string is your ServiceName from the ServiceDefinition. The builder has a registration method for every service type — Add{ServiceType}Service<TImpl>(string serviceName). Your class inherits the appropriate base (ServiceBase) and implements the matching interface (ITextGenService, IChatAugmentationsService, ITextToSpeechService, …).
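Putting the two steps together, a module that offers both chat replies and summarization might look roughly like the fragment below. It is a sketch under stated assumptions rather than a complete module: `builder` is the same builder used in the snippets above, the remaining ServiceDefinition fields are left out, and MyTextGen and MySummarizer are hypothetical classes. AddSummarizationService is inferred from the Add{ServiceType}Service naming pattern, and the ServiceName property is assumed from the description above.

```csharp
// Fragment: assumes `builder` and the Voxta module types are in scope,
// plus `using System.Collections.Generic;` for Dictionary.
const string serviceName = "my-service";

builder.Register(new ServiceDefinition
{
    ServiceName = serviceName, // assumed property name, per the text above
    // ... other fields elided
    Supports = new Dictionary<ServiceTypes, ServiceDefinitionCategoryScore>
    {
        { ServiceTypes.TextGen, ServiceDefinitionCategoryScore.High },
        { ServiceTypes.Summarization, ServiceDefinitionCategoryScore.Low },
    }
});

// One registration call per declared service type, all sharing the same name.
builder.AddTextGenService<MyTextGen>(serviceName);            // MyTextGen: hypothetical, implements ITextGenService
builder.AddSummarizationService<MySummarizer>(serviceName);   // MySummarizer: hypothetical; method name inferred from the pattern
```

Keeping the service name in a single constant avoids the declaration and the registrations drifting apart.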

Examples in the wild

  • ChatAugmentations: voxta-module-elite-dangerous (open-source reference example)
  • ComputerVision: built-in Cloud / OpenAI vision modules
  • TextGen: built-in LLM providers (OpenAI, Anthropic, llama.cpp, Ollama, ...)

External integrations are not modules: they connect to Voxta over the WebSocket API rather than running in-process. See Modules vs. integrations for when each fits.
