LlamaSharp

In-process .NET binding for llama.cpp — local LLM inference without a separate Python install.

LlamaSharp is a .NET wrapper around llama.cpp. It runs inference in-process with Voxta — no external server, no Python, no Docker.

Supports multimodal vision when the model is paired with an mmproj projector file, the same mechanism llama.cpp uses (see Computer Vision).

Setup

Install the service in Voxta

Manage Services → + Add Services → LlamaSharp → Add.

Point at a GGUF model

Provide the path to a GGUF model file. This is the same model format llama.cpp uses, so a model that runs under llama.cpp should also run here.
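If the service fails to start, a common cause is pointing it at a file that is not actually GGUF (for example, an old GGML file or an incomplete download). As a quick sanity check, a valid GGUF file begins with the four ASCII bytes `GGUF`. A minimal C# sketch of that check (the path is illustrative, not part of Voxta):

```csharp
using System.IO;

static bool LooksLikeGguf(string path)
{
    using var fs = File.OpenRead(path);
    var magic = new byte[4];
    // Every GGUF file starts with the 4-byte magic "GGUF".
    return fs.Read(magic, 0, 4) == 4 &&
           magic[0] == (byte)'G' && magic[1] == (byte)'G' &&
           magic[2] == (byte)'U' && magic[3] == (byte)'F';
}
```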

Save

Saving registers the service; the model itself is loaded into memory the first time the service is used.
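Voxta drives LlamaSharp internally, but a minimal standalone sketch illustrates what "in-process" means: the GGUF weights are loaded directly into the host .NET process and inference runs there, with no separate server. This is a hedged sketch against the public LLamaSharp API; the model path and parameter values are illustrative.

```csharp
using System;
using LLama;
using LLama.Common;

// Illustrative path; point this at your own GGUF file.
var parameters = new ModelParams(@"C:\models\model.gguf")
{
    ContextSize = 4096,  // prompt context window, in tokens
    GpuLayerCount = 0    // layers to offload to the GPU; 0 = CPU only
};

// The weights load here, inside this process — no external server.
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

// Stream tokens as they are generated.
await foreach (var token in executor.InferAsync(
    "Hello!", new InferenceParams { MaxTokens = 64 }))
{
    Console.Write(token);
}
```

Because everything lives in one process, unloading the service frees the model memory with it, which is the trade-off against running a shared external llama.cpp server.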
