How to Generate Speech
Create realistic voice audio from text using text-to-speech (TTS) models.
Overview
Text-to-speech models convert written text into natural-sounding audio. Some models also support voice cloning.
Quick Start
Web Interface
- Go to wavespeed.ai/models
- Select a TTS model (e.g., Minimax Speech, ElevenLabs)
- Enter your text
- Select a voice
- Click Run
API
curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "Hello, welcome to WaveSpeedAI. This is a demonstration of text to speech.",
  "voice_id": "Friendly_Person",
  "emotion": "happy",
  "speed": 1,
  "pitch": 0,
  "volume": 1
}'
Recommended Models
| Model | Best For | Voice Cloning |
|---|---|---|
| Minimax Speech 2.6 HD | Natural voices, emotional range | Yes |
| ElevenLabs | High quality, multiple languages | Yes |
| Dia TTS | Fast, good quality | No |
Common Parameters
| Parameter | Description | Example or Range |
|---|---|---|
| text | Text to speak | "Hello world" |
| voice_id | Voice selection | "Friendly_Person" |
| emotion | Voice emotion | "happy", "sad", "angry" |
| speed | Speaking rate | 0.5 to 2.0 |
| pitch | Voice pitch | -12 to 12 |
| volume | Output volume | 0.1 to 10 |
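The parameters in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official client: it reuses the endpoint and headers from the curl example, takes the numeric ranges from the table (individual models may accept different ranges), and the function names `build_tts_payload`, `build_tts_request`, and `send` are hypothetical helpers introduced here for illustration. The response format is not described in this guide, so `send` returns the raw body.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
TTS_URL = "https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd"

# Ranges copied from the Common Parameters table; other models may differ.
RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0.1, 10),
}


def build_tts_payload(text, voice_id, emotion=None, speed=1, pitch=0, volume=1):
    """Return a request body dict, raising ValueError for out-of-range values."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
        "pitch": pitch,
        "volume": volume,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Emotion is only included when given; valid values depend on the model.
    if emotion is not None:
        payload["emotion"] = emotion
    return payload


def build_tts_request(payload, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it keeps the validation testable without network access. To make the actual call, pass the result of `build_tts_request` to `send`, reading the key from the `WAVESPEED_API_KEY` environment variable as in the curl example.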
Voice Cloning
Clone a voice from an audio sample:
- Upload your voice sample to get a URL
- Use the URL in your request:
curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/voice-clone' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "This is my cloned voice speaking.",
  "audio": "https://your-uploaded-audio-url",
  "custom_voice_id": "my-voice-001",
  "model": "speech-02-hd",
  "accuracy": 0.7
}'
Voice Sample Requirements
- Clear audio, minimal background noise
- 10-30 seconds of natural speech
- Single speaker only
- Supported formats: MP3, WAV, M4A
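Two of these requirements, file format and duration, can be checked locally before uploading. The sketch below is one possible pre-flight check in Python, assuming the 10 to 30 second window and the three formats listed above. The helper names are hypothetical. Only WAV duration is read here, because the standard library has no MP3 or M4A decoder, and background noise or the number of speakers cannot be verified this way.

```python
import os
import wave

# Formats and duration window taken from the requirements list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 10, 30


def wav_duration_seconds(path):
    """Return the duration of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(filename, duration_seconds):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(
            f"duration {duration_seconds:.1f}s is outside "
            f"{MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return problems
```

For a WAV sample, the duration can come from `wav_duration_seconds(path)`; for MP3 or M4A it would have to be supplied by another tool.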
Available Voices
Voice options vary by model. Check the model documentation for:
- Available voice IDs
- Language support
- Gender options
- Accent variations
Tips for Better Results
- Use punctuation: it helps with natural pacing
- Break long text: split it into paragraphs for better results
- Test voices: different voices suit different content
- Adjust speed: slower for clarity, faster for excitement
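The advice to break long text into paragraphs can be automated. The following Python sketch splits on blank lines first and falls back to sentence boundaries for long paragraphs. The `split_text` helper and its `max_chars` default of 500 are illustrative assumptions, not a documented limit of any model.

```python
import re


def split_text(text, max_chars=500):
    """Split text into chunks, preferring paragraph and then sentence
    boundaries. Chunks stay within max_chars except when a single
    sentence is longer, in which case it is kept whole."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Collapse internal line breaks and repeated spaces.
        paragraph = " ".join(paragraph.split())
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # Paragraph is too long: pack whole sentences into chunks.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Splitting at sentence ends keeps the punctuation that guides pacing intact.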