How to Generate Audio
Create realistic voice, music, and sound effects using AI audio models.
Not sure which model to use? Try our Audio Generator — we’ve curated the best audio models so you can start creating right away.
Overview
AI audio models can generate speech, music, sound effects, and more. Text-to-speech models convert written text into natural-sounding voice audio, while other models create music and sound effects from text descriptions. Some models also support voice cloning.
Quick Start
Web Interface
- Go to wavespeed.ai/models
- Select an audio model (e.g., Minimax Speech, ElevenLabs, Minimax Music)
- Enter your text
- Select a voice
- Click Run
API
```bash
curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd' \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  --data-raw '{
    "text": "Hello, welcome to WaveSpeedAI. This is a demonstration of text to speech.",
    "voice_id": "Friendly_Person",
    "emotion": "happy",
    "speed": 1,
    "pitch": 0,
    "volume": 1
  }'
```
Recommended Models
| Model | Best For | Voice Cloning |
|---|---|---|
| Minimax Speech 2.6 HD | Natural voices, emotional range | Yes |
| ElevenLabs | High quality, multiple languages | Yes |
| Dia TTS | Fast, good quality | No |
Common Parameters
| Parameter | Description | Example or range |
|---|---|---|
| text | Text to speak | "Hello world" |
| voice_id | Voice selection | "Friendly_Person" |
| emotion | Voice emotion | "happy", "sad", "angry" |
| speed | Speaking rate | 0.5 to 2.0 |
| pitch | Voice pitch | -12 to 12 |
| volume | Output volume | 0.1 to 10 |
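The Quick Start request can also be assembled in Python using only the standard library. This is a minimal sketch that checks `speed`, `pitch`, and `volume` against the ranges in the table above before building the request. The helper name `build_speech_request` is hypothetical. The ranges come only from this table, so confirm them against each model's documentation. The request is built but not sent, because this page does not describe the response format.

```python
import json
import os
import urllib.request

# Endpoint and headers copied from the Quick Start curl example.
SPEECH_URL = "https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd"

# Ranges from the Common Parameters table; confirm them per model.
RANGES = {"speed": (0.5, 2.0), "pitch": (-12, 12), "volume": (0.1, 10)}


def build_speech_request(text, voice_id, emotion=None, speed=1, pitch=0,
                         volume=1, api_key=None):
    """Hypothetical helper: validate parameters and return an unsent POST request.

    Defaults for speed, pitch and volume mirror the Quick Start example values.
    """
    if not text:
        raise ValueError("text must not be empty")
    numeric = {"speed": speed, "pitch": pitch, "volume": volume}
    for name, value in numeric.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} to {high}")
    payload = {"text": text, "voice_id": voice_id, **numeric}
    if emotion is not None:
        payload["emotion"] = emotion  # e.g. "happy", "sad", "angry"; not validated here
    if api_key is None:
        api_key = os.environ.get("WAVESPEED_API_KEY", "")
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_speech_request("Hello, welcome to WaveSpeedAI.", "Friendly_Person",
                           emotion="happy", api_key="test-key")
# Sending is left out on purpose: urllib.request.urlopen(req) would perform the call.
```

Validating locally catches out-of-range values before a request is spent on them. The server remains the authority on what each model accepts.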
Voice Cloning
Clone a voice from an audio sample:
- Upload your voice sample to get a URL
- Use the URL in your request:
```bash
curl --location --request POST 'https://api.wavespeed.ai/api/v3/minimax/voice-clone' \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  --data-raw '{
    "text": "This is my cloned voice speaking.",
    "audio": "https://your-uploaded-audio-url",
    "custom_voice_id": "my-voice-001",
    "model": "speech-02-hd",
    "accuracy": 0.7
  }'
```
Voice Sample Requirements
- Clear audio, minimal background noise
- 10-30 seconds of natural speech
- Single speaker only
- Supported formats: MP3, WAV, M4A
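Some of the sample requirements above can be checked before uploading. This sketch uses only the Python standard library. It checks the file extension against the listed formats. For WAV files only, it also measures the duration against the 10 to 30 second guideline. The `wave` module cannot decode MP3 or M4A, so those formats get only an extension check. Background noise and speaker count are not checked at all. `check_voice_sample` is a hypothetical helper, not part of any WaveSpeedAI SDK.

```python
import os
import wave

# Requirements taken from the Voice Sample Requirements list.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 10, 30


def check_voice_sample(path):
    """Hypothetical pre-upload check; returns a list of problems (empty if none found).

    Duration is measured for WAV files only, because the standard library
    cannot decode MP3 or M4A. Noise and speaker count are not checked.
    """
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext or '(none)'}")
        return problems
    if ext == ".wav":
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(
                f"duration {seconds:.1f}s is outside {MIN_SECONDS} to {MAX_SECONDS}s"
            )
    return problems
```

An empty list means only that these basic checks passed. It does not guarantee a good clone.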
Available Voices
Voice options vary by model. Check the model documentation for:
- Available voice IDs
- Language support
- Gender options
- Accent variations
Tips for Better Results
- Use punctuation: it helps the model pace speech naturally
- Break up long text: split it into paragraphs instead of submitting one long block
- Test several voices: different voices suit different content
- Adjust speed: slower for clarity, faster for a more energetic delivery
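The tip about splitting long text into paragraphs can be automated. This is a sketch of a hypothetical `split_text` helper. It splits on blank lines, then splits any paragraph longer than `max_chars` after sentence-ending punctuation, so each chunk still ends at a natural pause. The 500-character default is an arbitrary illustration value, not a documented limit of any model.

```python
import re


def split_text(text, max_chars=500):
    """Hypothetical helper: split text into paragraph-sized chunks.

    Paragraphs are separated by blank lines. A paragraph longer than
    max_chars is split again after ., ! or ?; a single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = sentence
        if current:
            chunks.append(current)
    return chunks
```

Each chunk could then be sent as the `text` field of its own request. Joining the resulting clips is outside the scope of this page.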