KoboldAI / KoboldCpp
Local LLM runtime with built-in API server.
KoboldCpp is a single-executable local LLM runtime built on llama.cpp. You install KoboldCpp separately, point it at a GGUF model, and Voxta connects to its API endpoint.
Setup
Download KoboldCpp
Grab the latest release from github.com/LostRuins/koboldcpp. On Windows it's a single .exe with no installer; standalone Linux and macOS binaries are also available.
Download a GGUF model
Find a GGUF-format model on Hugging Face that fits your VRAM.
Launch KoboldCpp pointing at the model
Run koboldcpp.exe, select your GGUF file in the launcher, set the number of GPU layers to offload (higher is faster but uses more VRAM), and click Launch. KoboldCpp listens on http://localhost:5001 by default, with the API served under http://localhost:5001/api.
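The launcher steps above map onto command-line flags, which is useful for scripting or headless use. A minimal sketch, assuming a placeholder model filename (the flags shown are KoboldCpp's standard ones; adjust the layer count to your VRAM):

```shell
# Headless equivalent of the GUI launch steps above.
# model.Q4_K_M.gguf is an illustrative filename; substitute your own model.
# --gpulayers: number of model layers to offload to the GPU
# --port:      API port (5001 is the default)
koboldcpp.exe --model model.Q4_K_M.gguf --gpulayers 35 --port 5001
```

On Linux or macOS, run the platform binary the same way with the same flags.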
Add to Voxta
Manage Services → + Add Services → KoboldAI → Add.
Enter the URL (default http://localhost:5001/api) and click Save.
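Before saving, you can confirm the endpoint Voxta will use is actually reachable. A quick sketch using curl against KoboldCpp's model-info route (assumes KoboldCpp is already running on the default port):

```shell
# Returns JSON naming the loaded model if KoboldCpp is up,
# e.g. a "result" field with the model name.
curl -s http://localhost:5001/api/v1/model
```

If this times out or connection is refused, recheck that KoboldCpp launched successfully and that the port matches the one you entered in Voxta.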