Text To Speech HTTP API

About Text-To-Speech HTTP API

This lets you call any text-to-speech service that exposes an HTTP API, given the right configuration.

  • Content Type: The MIME type of the generated audio, such as audio/wav or audio/mpeg.
  • Url Template: The URL to call, with {text} as a placeholder for the text to speak.
  • Request Body: The application/json body sent to generate speech, with {{ text }} as a placeholder for the text to speak, and {{ culture }} or {{ language }} if needed. Other fields from the voice are also available. The template is rendered with Scriban, so you can use conditions if you need them.
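
As a rough illustration of how these settings fit together, the Python sketch below fills a URL template and a JSON request body with the text to speak. It is not Voxta's implementation: the function names and the example URL are hypothetical, plain string replacement stands in for Scriban, and percent-encoding the text in the URL is an assumption.

```python
import json
from urllib.parse import quote


def fill_url_template(url_template: str, text: str) -> str:
    """Replace the {text} placeholder with the text to speak."""
    # Assumption: the text is percent-encoded before it goes into the URL.
    return url_template.replace("{text}", quote(text, safe=""))


def fill_request_body(body_template: str, values: dict) -> str:
    """Replace {{ name }} placeholders with JSON-escaped values.

    Stand-in for Scriban: only plain {{ name }} substitutions are handled,
    not conditions or other template logic.
    """
    body = body_template
    for name, value in values.items():
        # json.dumps escapes quotes and backslashes; drop the outer quotes.
        escaped = json.dumps(str(value))[1:-1]
        body = body.replace("{{ " + name + " }}", escaped)
    return body


# Hypothetical service URL and body template, for illustration only.
url = fill_url_template("http://localhost:5000/speak?q={text}", "Hello world")
body = fill_request_body(
    '{"text": "{{ text }}", "lang": "{{ culture }}"}',
    {"text": 'She said "hi"', "culture": "en-US"},
)

print(url)
print(json.loads(body))
```

Escaping the values before substitution keeps the body valid JSON even when the text contains quotes.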

Voices

There are two ways to list voices: dynamically, if the service has an API that returns them, or manually using Voices.

  • Voices Url: The URL that should return a JSON array of voices.
  • Voices Format: How to convert a voice from the API to Voxta’s VoiceInfo format. You can use Scriban if you need conditions.
  • Default voices: Specify part of a voice's label or properties to select it from the list.
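
To make the Default voices setting more concrete, here is a minimal Python sketch of selecting a voice by a partial match on its label or properties. The function name is hypothetical, and the matching rules (case-insensitive substring, first match wins) are assumptions rather than documented Voxta behavior.

```python
from typing import Dict, List, Optional


def find_default_voice(voices: List[Dict], query: str) -> Optional[Dict]:
    """Return the first voice whose label or parameter values contain query."""
    needle = query.lower()
    for voice in voices:
        # Search the label and every parameter value as plain strings.
        candidates = [str(voice.get("label", ""))]
        candidates += [str(v) for v in voice.get("parameters", {}).values()]
        if any(needle in candidate.lower() for candidate in candidates):
            return voice
    return None


# Hypothetical voice list in the label/parameters shape used on this page.
voices = [
    {"label": "Female Calm", "parameters": {"speaker_wav": "female_calm.wav"}},
    {"label": "Male Deep", "parameters": {"speaker_wav": "male_deep.wav"}},
]

print(find_default_voice(voices, "deep"))
```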

xtts-api-server

xtts-api-server lets you run xtts-v2, one of the best open-source text-to-speech systems at the time of writing.

See the xtts-api-server project page for more information on how to install and run it.

Content Type

audio/wav

Url Template

http://localhost:8020/tts_to_audio/

Request Body

{
  "text": "{{ text }}",
  "speaker_wav": "{{ speaker_wav }}",
  "language": "{{ if !language.empty? }}{{ language }}{{ else }}en{{ end }}"
}
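
To check the server outside of Voxta, the same request can be sketched in Python with only the standard library. This assumes xtts-api-server is running locally on port 8020 as configured above; the function names and the output file name are hypothetical.

```python
import json
import urllib.request

TTS_URL = "http://localhost:8020/tts_to_audio/"


def build_tts_request(text: str, speaker_wav: str, language: str = "") -> urllib.request.Request:
    """Build the POST request described by the Request Body template."""
    payload = {
        "text": text,
        "speaker_wav": speaker_wav,
        # Mirrors the template's fallback: use "en" when no language is set.
        "language": language or "en",
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, speaker_wav: str, language: str = "", out_path: str = "output.wav") -> None:
    """Send the request and save the returned audio/wav bytes to a file."""
    request = build_tts_request(text, speaker_wav, language)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
```

Calling synthesize("Hello there", "female.wav") should write a WAV file if the server is up and a speaker sample with that name exists; the speaker file name here is only a placeholder.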

Voices Url

http://localhost:8020/speakers/

Voices Format

{
  "label": "{{ name }}",
  "parameters": {
    "speaker_wav": "{{ voice_id }}.wav"
  }
}
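
The effect of this Voices Format template can be sketched in Python as a mapping from each entry returned by the Voices Url to a label/parameters object. The sample speaker list is hypothetical; it only assumes that each entry has the name and voice_id fields the template refers to.

```python
def to_voice_entry(speaker: dict) -> dict:
    """Map one speaker entry to the label/parameters shape of the template."""
    return {
        "label": speaker["name"],
        "parameters": {"speaker_wav": f"{speaker['voice_id']}.wav"},
    }


# Hypothetical sample of what the Voices Url might return.
speakers = [
    {"name": "female", "voice_id": "female"},
    {"name": "narrator", "voice_id": "narrator_01"},
]

voice_entries = [to_voice_entry(speaker) for speaker in speakers]
print(voice_entries)
```

The resulting speaker_wav value is what the Request Body template receives as {{ speaker_wav }}.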