Renting GPUs with RunPod

Host large LLMs on RunPod's cloud GPUs when local hardware isn't enough.

RunPod is a pay-as-you-go cloud GPU platform. If you want to run a model that doesn't fit on your local GPU — or you don't have a GPU at all — RunPod lets you rent one by the second, install your favorite LLM backend on it, and have Voxta connect to it as a remote service.

How it works

  1. Rent a GPU instance on RunPod (pick a template that matches your LLM backend — Text Generation Web UI, vLLM, etc.).
  2. Spin up the LLM backend on that instance with the model you want.
  3. Expose the LLM's API port through RunPod's networking (e.g. its HTTP proxy) so the endpoint is reachable from your machine.
  4. Configure the matching service in Voxta (Text Generation Web UI, OpenAI-compatible, etc.) to point at the RunPod endpoint; the sketch after this list shows a quick way to verify the endpoint before doing so.
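
Before wiring up Voxta, it is worth confirming that the endpoint answers from outside the pod. Below is a minimal Python sketch, assuming an OpenAI-compatible backend (such as vLLM) exposed through RunPod's HTTP proxy; the pod ID, port, and model name are placeholders for your own values:

  import requests

  # Placeholder URL: RunPod's HTTP proxy publishes an exposed port
  # (8000 here) at https://<pod-id>-<port>.proxy.runpod.net.
  BASE_URL = "https://your-pod-id-8000.proxy.runpod.net/v1"

  # List the models the backend is serving to confirm it is live.
  models = requests.get(f"{BASE_URL}/models", timeout=10)
  models.raise_for_status()
  print([m["id"] for m in models.json()["data"]])

  # Send one small chat completion, the kind of request an
  # OpenAI-compatible client like Voxta's service will send.
  resp = requests.post(
      f"{BASE_URL}/chat/completions",
      json={
          "model": "your-model-name",  # placeholder for the loaded model
          "messages": [{"role": "user", "content": "Say hello."}],
          "max_tokens": 16,
      },
      timeout=30,
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["message"]["content"])

If both calls succeed, give Voxta's OpenAI-compatible service the same base URL.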

You pay only while the pod is running: typically under a dollar an hour for smaller GPUs, more for high-VRAM ones. Stop the pod when you're not using it.
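
As a rough worked example of what a session costs (the hourly rate is a placeholder; check the current listing for your chosen GPU):

  # Hypothetical rate: substitute the listed price for your GPU type.
  hourly_rate_usd = 0.50      # e.g. a mid-range consumer card
  session_minutes = 90        # one evening of chatting

  cost = hourly_rate_usd * session_minutes / 60
  print(f"${cost:.2f} for {session_minutes} min")  # $0.75 for 90 min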

When to use RunPod

  • The model you want needs more VRAM than you have locally.
  • You want a dedicated GPU per chat session without buying hardware.
  • You're testing whether a bigger model would meaningfully change quality before committing to a hardware upgrade.

Alternative — Voxta Cloud

If you don't want to manage a GPU instance at all, Voxta Cloud gives you LLM, TTS, and STT through Voxta's hosted backend with monthly Patreon credits. No infrastructure work.
