LlamaSharp

In-process .NET binding for llama.cpp — local LLM inference without a separate Python install.

LlamaSharp is a .NET wrapper around llama.cpp. It runs inference in-process with Voxta — no external server, no Python, no Docker.

Supports multimodal vision when the model is paired with an mmproj projector file, the same mechanism llama.cpp uses (see Computer Vision).

Setup

Install the service in Voxta

Manage Services → + Add Services → LlamaSharp → Add.

Point at a GGUF model

Provide the path to a GGUF model file. This is the same model format llama.cpp uses, so a model that runs under llama.cpp should also run here.
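If the service fails to start, a common cause is pointing it at a file that is not actually GGUF (for example, an old GGML file or an incomplete download). As a quick sanity check, a valid GGUF file begins with the four ASCII bytes `GGUF`. A minimal C# sketch of that check (the path is illustrative, not part of Voxta):

```csharp
using System.IO;

static bool LooksLikeGguf(string path)
{
    using var fs = File.OpenRead(path);
    var magic = new byte[4];
    // Every GGUF file starts with the 4-byte magic "GGUF".
    return fs.Read(magic, 0, 4) == 4 &&
           magic[0] == (byte)'G' && magic[1] == (byte)'G' &&
           magic[2] == (byte)'U' && magic[3] == (byte)'F';
}
```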

Save

Saving registers the service; the model itself is loaded into memory the first time the service is used.
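Voxta drives LlamaSharp internally, but a minimal standalone sketch illustrates what "in-process" means: the GGUF weights are loaded directly into the host .NET process and inference runs there, with no separate server. This is a hedged sketch against the public LLamaSharp API; the model path and parameter values are illustrative.

```csharp
using System;
using LLama;
using LLama.Common;

// Illustrative path; point this at your own GGUF file.
var parameters = new ModelParams(@"C:\models\model.gguf")
{
    ContextSize = 4096,  // prompt context window, in tokens
    GpuLayerCount = 0    // layers to offload to the GPU; 0 = CPU only
};

// The weights load here, inside this process — no external server.
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Stream tokens as they are generated.
await foreach (var token in executor.InferAsync(
    "Hello!", new InferenceParams { MaxTokens = 64 }))
{
    Console.Write(token);
}
```

Because everything lives in one process, unloading the service frees the model memory with it, which is the trade-off against running a shared external llama.cpp server.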
