Renting GPUs with RunPod
Host large LLMs on RunPod's cloud GPUs when local hardware isn't enough.
RunPod is a pay-as-you-go cloud GPU platform. If you want to run a model that doesn't fit on your local GPU — or you don't have a GPU at all — RunPod lets you rent one by the second, install your favorite LLM backend on it, and have Voxta connect to it as a remote service.
How it works
- Rent a GPU instance on RunPod (pick a template that matches your LLM backend — Text Generation Web UI, vLLM, etc.).
- Spin up the LLM backend on that instance with the model you want.
- Expose the backend's API endpoint through RunPod's networking (its HTTP proxy or a direct TCP port mapping).
- Configure the matching service in Voxta (Text Generation Web UI, OpenAI-compatible, etc.) to point at the RunPod endpoint. A quick way to verify the endpoint before wiring it up is sketched after this list.
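As a minimal sketch, assuming the pod runs an OpenAI-compatible backend (such as vLLM) listening on port 8000 and exposed through RunPod's HTTP proxy, you can check that the endpoint responds before pointing Voxta at it. The pod ID and model name below are placeholders; substitute your own.

```python
# Sanity-check a RunPod-hosted OpenAI-compatible endpoint before
# configuring it in Voxta. POD_ID, PORT, and the model name are
# placeholders. RunPod's HTTP proxy exposes port P of pod X at
# https://X-P.proxy.runpod.net.
import json
import urllib.request

POD_ID = "abc123xyz"   # your pod's ID (placeholder)
PORT = 8000            # port your backend listens on
BASE_URL = f"https://{POD_ID}-{PORT}.proxy.runpod.net/v1"

payload = {
    "model": "your-model-name",  # placeholder; must match the loaded model
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

If this prints a completion, the same base URL is what you enter in Voxta's service configuration.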
You pay for GPU time only while the pod runs: smaller GPUs cost well under a dollar an hour, while high-VRAM ones run several dollars an hour. Stop the pod when you're not using it so you aren't billed for idle time.
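For a feel of what that means per session, here is a back-of-the-envelope estimate (the hourly rate below is illustrative, not a quote from RunPod's pricing page):

```python
# Back-of-the-envelope session cost. The rate is illustrative;
# check RunPod's pricing page for current numbers.
hourly_rate_usd = 0.50    # assumed rate for a mid-range GPU (placeholder)
session_minutes = 90      # length of your chat session

cost = hourly_rate_usd * session_minutes / 60
print(f"~${cost:.2f} for a {session_minutes}-minute session")  # ~$0.75
```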
When to use RunPod
- The model you want needs more VRAM than you have locally.
- You want a dedicated GPU per chat session without buying hardware.
- You're testing whether a bigger model would meaningfully change quality before committing to a hardware upgrade.
Alternative — Voxta Cloud
If you don't want to manage a GPU instance at all, Voxta Cloud gives you LLM, TTS, and STT through Voxta's hosted backend with monthly Patreon credits. No infrastructure work.