Voxta docs

Sesame Conversational Speech Model

Sesame CSM (Conversational Speech Model) is a speech generation model from Sesame that generates residual vector quantization (RVQ) audio codes from text and audio inputs.
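
For readers unfamiliar with the term, the sketch below is a toy illustration of what "RVQ audio codes" means in general: each stage picks the nearest entry in a codebook, and the next stage quantizes whatever error is left over. It is not Sesame's codec or any Voxta API. The function names and the tiny hand-written codebooks are assumptions made for this example; real audio codecs learn much larger codebooks from data.

```python
from typing import List, Sequence

# Hypothetical toy codebooks (hand-written for illustration, not learned).
# Stage 0 is coarse, stage 1 refines the leftover error.
CODEBOOKS: List[List[List[float]]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]


def nearest(codebook: Sequence[Sequence[float]], vec: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Quantize vec stage by stage, returning one code (index) per codebook."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next stage quantizes what remains.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Reconstruct a vector by summing the selected entry from each codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codes = rvq_encode([1.25, 1.0], CODEBOOKS)
print(codes)                         # [1, 1]
print(rvq_decode(codes, CODEBOOKS))  # [1.25, 1.0]
```

In this toy case the two codes reconstruct the input exactly; in practice each extra stage only reduces the remaining error.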

Sesame CSM is currently marked experimental: the runtime and configuration options may change.

Setup

Add the service

Go to Manage Services → + Add Services → Sesame Conversational Speech Model, then click Add. Voxta installs the Python runtime and model weights automatically on first use.

Pick a voice

In the Sesame CSM config, browse the available voices and select one.
