How to Generate Audio

Create realistic voice, music, and sound effects using AI audio models.

Not sure which model to use? Try our Audio Generator — we’ve curated the best audio models so you can start creating right away.

Overview

AI audio models can generate speech, music, sound effects, and more. Text-to-speech models convert written text into natural-sounding voice audio, while other models create music and sound effects from text descriptions. Some models also support voice cloning.

Quick Start

Web Interface

1. Go to wavespeed.ai/models
2. Select an audio model (e.g., Minimax Speech, ElevenLabs, Minimax Music)
3. Enter your text
4. Select a voice
5. Click Run

API

curl --fail-with-body --connect-timeout 10 --max-time 60 --request POST 'https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "Hello, welcome to WaveSpeedAI. This is a demonstration of text to speech.",
  "voice_id": "Friendly_Person",
  "emotion": "happy",
  "speed": 1,
  "pitch": 0,
  "volume": 1
}'
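
The same request can be sketched in Python using only the standard library. This is a minimal illustration that mirrors the curl call above; the helper names `build_speech_request` and `send` are hypothetical, and the response format is not described in this guide, so the body is returned as decoded JSON without further interpretation.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.wavespeed.ai/api/v3/minimax/speech-2.6-hd"


def build_speech_request(text, voice_id="Friendly_Person", api_key=None, **options):
    """Build (but do not send) a POST request mirroring the curl example.

    Extra keyword arguments such as emotion, speed, pitch, or volume are
    copied into the JSON body unchanged.
    """
    api_key = api_key or os.environ.get("WAVESPEED_API_KEY", "")
    payload = {"text": text, "voice_id": voice_id, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the JSON-decoded response body.

    The response schema is an unknown here, so nothing is extracted from it.
    urllib raises HTTPError on 4xx/5xx, similar in spirit to curl --fail.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `send(build_speech_request("Hello, welcome to WaveSpeedAI.", emotion="happy", speed=1))` with `WAVESPEED_API_KEY` set in the environment. Keeping request construction separate from sending makes the payload easy to inspect before any network traffic happens.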

Recommended Models

Model	Best For	Voice Cloning
Minimax Speech 2.6 HD	Natural voices, emotional range	Yes
ElevenLabs	High quality, multiple languages	Yes
Dia TTS	Fast, good quality	No

Common Parameters

Parameter	Description	Example or range
`text`	Text to speak	"Hello world"
`voice_id`	Voice selection	"Friendly_Person"
`emotion`	Voice emotion	"happy", "sad", "angry"
`speed`	Speaking rate	0.5 to 2.0
`pitch`	Voice pitch	-12 to 12
`volume`	Output volume	0.1 to 10
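
Checking values locally before sending a request can catch out-of-range settings early. The sketch below uses the ranges from the table above and assumes they are inclusive; `check_speech_params` is a hypothetical helper, and individual models may accept different ranges, so the model documentation takes precedence.

```python
# Ranges copied from the parameter table; treated as inclusive (an assumption).
NUMERIC_RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-12, 12),
    "volume": (0.1, 10),
}


def check_speech_params(params):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue  # optional parameters are only checked when present
        value = params[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    return problems
```

For example, `check_speech_params({"text": "Hello world", "speed": 3})` reports that `speed` is outside the 0.5 to 2.0 range.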

Voice Cloning

Clone a voice from an audio sample:

1. Upload your voice sample to get a URL
2. Use the URL in your request:

curl --fail-with-body --connect-timeout 10 --max-time 60 --request POST 'https://api.wavespeed.ai/api/v3/minimax/voice-clone' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "text": "This is my cloned voice speaking.",
  "audio": "https://your-uploaded-audio-url",
  "custom_voice_id": "my-voice-001",
  "model": "speech-02-hd",
  "accuracy": 0.7
}'

Voice Sample Requirements

Clear audio, minimal background noise
10-30 seconds of natural speech
Single speaker only
Supported formats: MP3, WAV, M4A
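
Some of these requirements can be checked on a local file before uploading. The following is a rough sketch, assuming the sample is on disk: `check_voice_sample` is a hypothetical helper that checks the file extension and, for WAV files only, the duration. Background noise and speaker count cannot be verified this way, and measuring MP3 or M4A duration would need a third-party decoder.

```python
import wave
from pathlib import Path

# Values taken from the requirements list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 10, 30


def check_voice_sample(path):
    """Return a list of problems found with a local voice sample file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return [f"unsupported format: {path.suffix or '(none)'}"]
    problems = []
    if suffix == ".wav":
        # Duration = frame count / sample rate; only WAV is measured here.
        with wave.open(str(path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(
                f"duration {seconds:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
            )
    return problems
```

An empty list means no problem was detected by these limited checks, not that the sample is guaranteed to clone well.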

Available Voices

Voice options vary by model. Check the model documentation for:

Available voice IDs
Language support
Gender options
Accent variations

Tips for Better Results

Use punctuation: it helps with natural pacing
Break up long text: split it into paragraphs for better results
Test voices: different voices suit different content
Adjust speed: slower for clarity, faster for excitement
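
One way to break long text is to split on paragraph breaks first and then on sentence ends, so each chunk can be sent as its own request. The sketch below is illustrative only: `split_text` is a hypothetical helper, and the `max_chars` default of 500 is an arbitrary value, not a documented limit.

```python
import re


def split_text(text, max_chars=500):
    """Split text into chunks, preferring paragraph breaks, then sentence ends.

    Chunks aim to stay within max_chars; a single sentence longer than
    max_chars is kept whole rather than cut mid-sentence.
    """
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        current = ""
        # Split after '.', '!' or '?' when followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Splitting at sentence boundaries keeps the punctuation that guides pacing intact within each chunk.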
