How to Generate Speech

Create realistic voice audio from text using text-to-speech (TTS) models.

Overview

Text-to-speech models convert written text into natural-sounding audio. Some models also support voice cloning.

Quick Start

Web Interface

  1. Go to wavespeed.ai/models
  2. Select a TTS model (e.g., Minimax Speech, ElevenLabs)
  3. Enter your text
  4. Select a voice
  5. Click Run

API

curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "Hello, welcome to WaveSpeedAI. This is a demonstration of text to speech.",
  "voice_id": "Friendly_Person",
  "emotion": "happy",
  "speed": 1,
  "pitch": 0,
  "volume": 1
}'
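
The same request can be built from Python. The following is a minimal sketch using only the standard library; the function names `build_speech_request` and `send_request` are illustrative, not part of any WaveSpeedAI SDK. The endpoint, headers, and body fields are copied from the curl example above. The response format is not described on this page, so the sketch returns the raw response body without parsing it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd"


def build_speech_request(text, voice_id, api_key, emotion="happy",
                         speed=1, pitch=0, volume=1):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "emotion": emotion,
        "speed": speed,
        "pitch": pitch,
        "volume": volume,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_request(request):
    """Send the request and return the raw response body as text.

    The response schema is an unknown here, so nothing is parsed.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

To try it, read the key from the `WAVESPEED_API_KEY` environment variable with `os.environ`, pass it to `build_speech_request`, and hand the result to `send_request`.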

Model                    Best For                           Voice Cloning
Minimax Speech 2.6 HD    Natural voices, emotional range    Yes
ElevenLabs               High quality, multiple languages   Yes
Dia TTS                  Fast, good quality                 No

Common Parameters

Parameter   Description       Example
text        Text to speak     "Hello world"
voice_id    Voice selection   "Friendly_Person"
emotion     Voice emotion     "happy", "sad", "angry"
speed       Speaking rate     0.5 to 2.0
pitch       Voice pitch       -12 to 12
volume      Output volume     0.1 to 10
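
Checking numeric parameters before sending a request can catch mistakes early. The sketch below uses the ranges listed above for speed, pitch, and volume. Treating those ranges as hard limits is an assumption, since the page presents them as examples, and `find_out_of_range` is a hypothetical helper name.

```python
# Ranges copied from the parameter list above.
# Assumption: they are inclusive limits rather than typical values.
PARAMETER_RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0.1, 10),
}


def find_out_of_range(params):
    """Return one message per numeric parameter outside its listed range."""
    problems = []
    for name, (low, high) in PARAMETER_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low} to {high}")
    return problems
```

An empty list means every supplied value falls inside its range. Parameters that are not supplied are skipped.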

Voice Cloning

Clone a voice from an audio sample:

  1. Upload your voice sample to get a URL
  2. Pass that URL as the audio field in your request:

curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/voice-clone' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "This is my cloned voice speaking.",
  "audio": "https://your-uploaded-audio-url",
  "custom_voice_id": "my-voice-001",
  "model": "speech-02-hd",
  "accuracy": 0.7
}'

Voice Sample Requirements

  • Clear audio, minimal background noise
  • 10-30 seconds of natural speech
  • Single speaker only
  • Supported formats: MP3, WAV, M4A
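
Two of these requirements, the file format and the 10-30 second length, can be checked locally before uploading. The sketch below is one possible approach using Python's standard library. It only measures duration for WAV files, because the standard library cannot read MP3 or M4A, and `check_sample` is a hypothetical helper name. It does not assess noise or the number of speakers.

```python
import os
import wave

# Values taken from the requirements list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 10, 30


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_sample(path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension}")
    elif extension == ".wav":
        # Duration is only checked for WAV; other formats need a third-party decoder.
        duration = wav_duration_seconds(path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(
                f"duration {duration:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
            )
    return problems
```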

Available Voices

Voice options vary by model. Check the model documentation for:

  • Available voice IDs
  • Language support
  • Gender options
  • Accent variations

Tips for Better Results

  1. Use punctuation — Helps with natural pacing
  2. Break long text — Split into paragraphs for better results
  3. Test voices — Different voices suit different content
  4. Adjust speed — Slower for clarity, faster for excitement
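
The tip about breaking up long text can be sketched as a small helper that groups paragraphs into chunks, each of which could then be sent as its own request. The name `split_into_chunks` and the 500-character default are illustrative assumptions, not a documented limit.

```python
def split_into_chunks(text, max_chars=500):
    """Group paragraphs (separated by blank lines) into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence. max_chars is an arbitrary placeholder, not an API limit.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk a natural unit of speech, which fits the punctuation and pacing tips above.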
© 2025 WaveSpeedAI. All rights reserved.