LlamaSharp
In-process .NET binding for llama.cpp: local LLM inference with no separate Python install.
LlamaSharp is a .NET wrapper around llama.cpp that runs inference in-process with Voxta, so there is no external server, no Python, and no Docker to manage.
It supports multimodal vision when paired with an mmproj projector file, using the same mechanism as llama.cpp (see Computer Vision).
Setup
Install the service in Voxta
Manage Services → + Add Services → LlamaSharp → Add.
Point at a GGUF model
Provide the path to a GGUF model file. LlamaSharp uses the same model format as llama.cpp.
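Before pointing the service at a file, it can help to confirm the file really is in GGUF format. GGUF files begin with the four ASCII magic bytes "GGUF", so a quick check is possible without any LLM tooling. This is a minimal standalone sketch, not part of Voxta or LlamaSharp:

```python
def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes.

    GGUF files begin with the 4-byte ASCII magic b"GGUF"; anything
    else (e.g. an old GGML file or a safetensors checkpoint) will
    fail to load.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

If the check fails, the file is likely an older GGML model or a different checkpoint format and will need to be converted or re-downloaded as GGUF.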
Save
The model is loaded into memory the first time the service is used, not when the configuration is saved.
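The load-on-first-use behavior described above follows a common lazy-initialization pattern. The sketch below illustrates the idea in Python; the class and method names are purely illustrative and are not the Voxta or LlamaSharp API:

```python
class LazyModelService:
    """Illustrative sketch of load-on-first-use (hypothetical names).

    Saving the configuration only stores the model path; the (expensive)
    model load is deferred until the first inference request arrives.
    """

    def __init__(self, model_path: str):
        self.model_path = model_path  # stored at save time
        self._model = None            # nothing loaded yet

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def infer(self, prompt: str) -> str:
        if self._model is None:
            # First use: load the model now (stand-in for a real GGUF load).
            self._model = f"model({self.model_path})"
        return f"{self._model}: reply to {prompt!r}"
```

The practical consequence for users: saving the service succeeds instantly even with a huge model, and the delay (and any path error) surfaces on the first request instead.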