Sesame Conversational Speech Model
Generates RVQ audio codes from text and audio inputs — Sesame's speech generation model.
Sesame CSM (Conversational Speech Model) is a speech generation model from Sesame. Given text and audio inputs, it generates RVQ (residual vector quantization) audio codes, which are then decoded into speech.
Sesame CSM is currently marked experimental: the runtime and configuration options may change.
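For readers new to the term, the RVQ audio codes mentioned above come from residual vector quantization: a signal is approximated by a sequence of codebook indices, where each stage quantizes whatever error the previous stage left behind. The sketch below is a toy illustration of that idea on a single number. It is not Sesame CSM's codec or Voxta's implementation; the codebooks, the function names, and the use of scalars instead of audio feature vectors are all assumptions made for brevity.

```python
# Toy residual vector quantization (RVQ) on a single scalar.
# Illustrative only: the codebooks and names here are hypothetical and
# unrelated to the codec that Sesame CSM actually uses.

def nearest_index(codebook, value):
    """Return the index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))

def rvq_encode(value, codebooks):
    """Quantize stage by stage; each stage encodes the previous residual."""
    codes = []
    residual = value
    for codebook in codebooks:
        idx = nearest_index(codebook, residual)
        codes.append(idx)
        residual -= codebook[idx]  # what is left for the next stage
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct the value by summing the chosen entry of each stage."""
    return sum(codebook[idx] for idx, codebook in zip(codes, codebooks))

# Hypothetical codebooks: coarse first, finer at each later stage.
CODEBOOKS = [
    [-1.0, 0.0, 1.0],
    [-0.5, -0.25, 0.0, 0.25, 0.5],
    [-0.125, -0.0625, 0.0, 0.0625, 0.125],
]

codes = rvq_encode(0.8, CODEBOOKS)
approx = rvq_decode(codes, CODEBOOKS)
print(codes, approx)
```

Each additional stage narrows the reconstruction error, which is why a short list of small integers can stand in for a much richer signal. A real audio codec applies the same principle to learned codebooks of vectors, one set of codes per audio frame.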
Setup
Add the service
Manage Services → + Add Services → Sesame Conversational Speech Model → Add. Voxta installs the Python runtime and model weights automatically on first use.
Pick a voice
In the Sesame CSM config, browse the available voices and select one.