Getting Started

Sign up, generate an API key on the portal, and connect Voxta Cloud to your Voxta Server.

Voxta Cloud connects to your Voxta Server through the same Services system as everything else. Drop in an API key, save, and it's available to your characters.

Prerequisites

  • A Patreon subscription to one of the Voxta tiers that includes Voxta Cloud credits. (Most tiers do.)
  • A Discord account linked to your Patreon.
  • Voxta Server installed and running.

Connect Voxta Server to Voxta Cloud

Sign in to the portal

Go to portal.voxta.ai and sign in. The portal links your Patreon and Discord accounts to your Voxta Cloud entitlements.

Generate an API key

In the portal, generate your API key. Copy it immediately and store it somewhere safe. You can't view it again after generation; you can only rotate it.

Only one API key is active at a time. Generating a new one revokes the previous one. This is intentional — if your key leaks, rotate it and the old key stops working everywhere.
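
The single-active-key rule can be pictured with a small model. The sketch below is purely illustrative: the `KeyStore` class and its methods are hypothetical and are not part of Voxta or the portal. It only shows why generating a new key makes the old one stop working.

```python
import secrets
from typing import Optional


class KeyStore:
    """Hypothetical model of a portal that allows one active API key."""

    def __init__(self) -> None:
        self._active: Optional[str] = None

    def generate(self) -> str:
        # Generating a new key replaces (and so revokes) the previous one.
        self._active = secrets.token_urlsafe(32)
        return self._active

    def is_valid(self, key: str) -> bool:
        # Only the most recently generated key is accepted.
        if self._active is None:
            return False
        return secrets.compare_digest(key, self._active)


store = KeyStore()
old_key = store.generate()
new_key = store.generate()  # rotation: old_key is no longer accepted
print(store.is_valid(old_key), store.is_valid(new_key))
```

In this model, anything still configured with `old_key` fails validation after rotation, which is why a rotated key needs to be pasted into the Voxta Cloud service config again.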

In Voxta, install the Voxta Cloud service

Two paths, same result:

  • Wizard — if you're on first-time setup, the wizard offers Voxta Cloud as an option.
  • Manage Services — go to Services → Add Services, find Voxta Cloud, click to install.

Paste your API key

In the Voxta Cloud service config, paste your API key into the top field, then click Save & Install Service at the bottom.

That's it — Voxta Cloud is now available as a backend for LLM, TTS, and STT.

(Optional) Customize models

The default models work for most users. To change them (for example, to use a different LLM or voice provider), clone the installed Voxta Cloud service in Manage Services and edit the clone's settings. Click Show Advanced Settings to expose the model picker.

What you get

  • LLM — Voxta tunes the default model continuously. Currently routed via OpenRouter to a high-quality general-purpose model. You can override per-clone.

  • TTS — three options out of the box:

    • UnrealSpeech — cheapest credits per minute, good quality.
    • Cartesia — low-latency neural TTS, great middle ground.
    • ElevenLabs — pricier, best quality and emotion.

    Pick a voice for each character on its Character Card.

  • STT — Deepgram-backed, low latency.

Want a free TTS option instead? The Coqui XTTS local service runs on ~2 GB of VRAM and is built into Voxta Server.
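
As a rough aid for choosing a TTS option, the trade-offs described above can be written down as a lookup. This is a hypothetical helper, not a Voxta API: the priority labels (`cost`, `latency`, `quality`, `free`) are names made up for the illustration, and the mapping only restates the descriptions on this page.

```python
# Hypothetical helper: maps a priority to the TTS option this page describes.
TTS_BY_PRIORITY = {
    "cost": "UnrealSpeech",    # cheapest credits per minute
    "latency": "Cartesia",     # low-latency neural TTS
    "quality": "ElevenLabs",   # best quality and emotion, pricier
    "free": "Coqui XTTS",      # local service, about 2 GB of VRAM
}


def suggest_tts(priority: str) -> str:
    """Return the TTS option matching a priority, or raise on unknown input."""
    try:
        return TTS_BY_PRIORITY[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


print(suggest_tts("latency"))
```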
