KoboldAI / KoboldCpp
Local LLM runtime with built-in API server.
KoboldCpp is a single-executable local LLM runtime built on llama.cpp. You install KoboldCpp separately, point it at a GGUF model, and Voxta connects to its API endpoint.
Setup
Download KoboldCpp
Grab the latest release from github.com/LostRuins/koboldcpp. On Windows it's a single .exe with no installer; standalone Linux and macOS binaries are also available.
Download a GGUF model
Find a GGUF-format model on Hugging Face that fits your VRAM.
Launch KoboldCpp pointing at the model
Run koboldcpp.exe, select your GGUF file in the launcher, set the number of GPU layers to offload (higher is faster but uses more VRAM), and click Launch. KoboldCpp listens on http://localhost:5001 by default, with the API served under http://localhost:5001/api.
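The launcher steps above map onto command-line flags, which is useful for scripting or headless use. A minimal sketch, assuming a placeholder model filename (the flags shown are KoboldCpp's standard ones; adjust the layer count to your VRAM):

```shell
# Headless equivalent of the GUI launch steps above.
# model.Q4_K_M.gguf is an illustrative filename; substitute your own model.
# --gpulayers: number of model layers to offload to the GPU
# --port:      API port (5001 is the default)
koboldcpp.exe --model model.Q4_K_M.gguf --gpulayers 35 --port 5001
```

On Linux or macOS, run the platform binary the same way with the same flags.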
Add to Voxta
Manage Services → + Add Services → KoboldAI → Add.
Enter the URL (default http://localhost:5001/api) and click Save.
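Before saving, you can confirm the endpoint Voxta will use is actually reachable. A quick sketch using curl against KoboldCpp's model-info route (assumes KoboldCpp is already running on the default port):

```shell
# Returns JSON naming the loaded model if KoboldCpp is up,
# e.g. a "result" field with the model name.
curl -s http://localhost:5001/api/v1/model
```

If this times out or connection is refused, recheck that KoboldCpp launched successfully and that the port matches the one you entered in Voxta.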