Speech 02 Turbo | Realistic Voice & TTS API

MiniMax Speech-02-Turbo

Convert text to natural, expressive speech with MiniMax Speech-02-Turbo. This advanced text-to-speech model offers 17+ preset voices, custom voice cloning support, and emotional expression control — perfect for voiceovers, content creation, and audio production.

Why It Sounds Great

Natural speech: Human-like intonation, rhythm, and expression.
17+ preset voices: Wide variety of characters from casual to professional.
Custom voice cloning: Use your own trained voice IDs for personalized output.
Emotion control: Add emotional expression like happy, sad, or neutral.
Voice tuning: Adjust speed, volume, and pitch for perfect delivery.
Audio quality options: Configure sample rate, bitrate, and format.

Parameters

Parameter	Required	Description
text	Yes	The text you want to convert to speech.
voice_id	Yes	Voice to use — preset ID or custom trained voice.
speed	No	Speech speed multiplier. Default: 1.
volume	No	Volume level. Default: 1.
pitch	No	Pitch adjustment. Default: 0.
emotion	No	Emotional tone: happy, sad, angry, neutral, etc.
english_normalization	No	Improves number-reading in English text.
sample_rate	No	Audio sample rate (e.g., 22050, 44100).
bitrate	No	Audio bitrate quality.
channel	No	Audio channels (mono/stereo).
format	No	Output format (mp3, wav, etc.).
language_boost	No	Boost specific language pronunciation.
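
The parameter table above can be turned into a small request-body helper. This is a minimal sketch, not part of any official SDK: `build_payload` and `OPTIONAL` are hypothetical names, the option list is copied from the table (plus `enable_sync_mode`, which only the FAQ mentions), and no value ranges are checked because the page does not state them.

```python
# Optional parameter names copied from the table above.
# enable_sync_mode is listed in the FAQ only, so its presence here is an assumption.
OPTIONAL = (
    "speed", "volume", "pitch", "emotion", "english_normalization",
    "sample_rate", "bitrate", "channel", "format", "language_boost",
    "enable_sync_mode",
)


def build_payload(text, voice_id, **options):
    """Return a request body, rejecting option names the page does not list."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = sorted(set(options) - set(OPTIONAL))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    # Options that are not passed are left out so the API applies its own defaults.
    return {"text": text, "voice_id": voice_id, **options}


payload = build_payload("Hello there", "Calm_Woman", speed=0.9, emotion="neutral")
print(payload)
```

Catching a misspelled option name locally avoids spending a request on a body the API may reject or silently ignore.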

Available Preset Voices

Voice ID	Character
Wise_Woman	Mature, thoughtful female
Friendly_Person	Warm, approachable
Inspirational_girl	Motivating young female
Deep_Voice_Man	Rich, deep male voice
Calm_Woman	Soothing, relaxed female
Casual_Guy	Laid-back male
Lively_Girl	Energetic young female
Patient_Man	Steady, reassuring male
Young_Knight	Youthful, heroic male
Determined_Man	Strong, resolute male
Lovely_Girl	Sweet, pleasant female
Decent_Boy	Polite young male
Imposing_Manner	Authoritative presence
Elegant_Man	Refined, sophisticated male
Abbess	Wise, spiritual female
Sweet_Girl_2	Gentle, charming female
Exuberant_Girl	Excited, enthusiastic female
Energetic_Girl	Vibrant, dynamic female

How to Use

Enter your text — type or paste the content to convert.
Select voice — choose a preset voice or enter your custom voice ID.
Adjust settings (optional) — tune speed, volume, pitch, and emotion.
Configure audio (optional) — set sample rate, bitrate, and format.
Run — click the button to generate.
Download — preview and save your audio file.

Pricing

Per-character billing based on text length.

Text Length	Cost
1,000 characters	$0.03
5,000 characters	$0.15
10,000 characters	$0.30
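
The table works out to a flat rate of $0.03 per 1,000 characters, which can be estimated before submitting. The sketch below assumes that characters are counted the way Python's `len` counts them and that there is no rounding or minimum charge; the page does not confirm either, so treat the result as an estimate. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.03")  # rate taken from the pricing table above


def estimate_cost(text):
    """Estimate the charge in USD for one request from its character count."""
    # Assumption: billing counts characters the same way len() does.
    return PRICE_PER_1000_CHARS * len(text) / 1000


print(estimate_cost("x" * 5000))  # a 5,000-character input
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.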

Best Use Cases

Voiceovers — Create professional narration for videos and presentations.
Audiobooks — Generate natural-sounding book narration.
Content Creation — Add voice to social media videos and podcasts.
E-learning — Produce educational audio content at scale.
Accessibility — Convert written content to audio format.
Character Voices — Create distinct voices for games and animations.

Custom Voice Cloning

Train your own voice for personalized output with Voice Clone Training, then pass the resulting custom voice ID as voice_id.

Pro Tips for Best Results

Match voice character to content tone — use Calm_Woman for meditation, Energetic_Girl for ads.
Use the emotion parameter to add expressiveness: "happy" for upbeat, "neutral" for professional.
Adjust speed slightly (0.9-1.1) for more natural pacing.
Enable english_normalization when text contains numbers or abbreviations.
Test different voices with the same text to find the perfect match.
For long content, break into paragraphs for more natural pacing.
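
The last tip, breaking long content into paragraphs, could be automated along these lines. This is a sketch under assumptions: paragraphs are taken to be separated by blank lines, and the 1,000-character default for `max_chars` is an arbitrary illustrative value, not a documented API limit. `split_into_chunks` is a hypothetical helper name.

```python
def split_into_chunks(text, max_chars=1000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in split_into_chunks(sample, max_chars=40):
    print(repr(chunk))
```

Each chunk can then be submitted as its own request and the resulting audio files joined in order.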

Notes

Pricing is based on character count, not audio duration.
Custom voice IDs require prior voice clone training.
Processing time scales with text length.
Multiple output formats available for different use cases.

Speech 02 Turbo API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo with your input as JSON. The endpoint returns a prediction id. Start polling the result endpoint around every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status. On completed, read output values from data.outputs. Examples for Speech 02 Turbo below.

HTTP example

set -euo pipefail

: "${WAVESPEED_API_KEY:?Set WAVESPEED_API_KEY}"

REQUEST_BODY=$(cat <<'JSON'
{
    "text": "A clear example input",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese"
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d "$REQUEST_BODY")

TASK=$(printf '%s' "$SUBMIT_RESPONSE" | jq 'if has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "$TASK" | jq -r '.id')
if [ -z "$PREDICTION_ID" ] || [ "$PREDICTION_ID" = "null" ]; then
  printf 'Submission response did not contain a prediction id\n' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "$TASK" | jq -r '.urls.get // empty')
if [ -z "$RESULT_URL" ]; then
  RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/$PREDICTION_ID/result"
fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body "$RESULT_URL" \
    -H "Authorization: Bearer $WAVESPEED_API_KEY")
  RESULT=$(printf '%s' "$RESPONSE" | jq 'if has("data") then .data else . end')
  STATUS=$(printf '%s' "$RESULT" | jq -r '.status')
  case "$STATUS" in
    completed) printf '%s\n' "$RESULT" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "$RESULT" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s\n' "$STATUS" >&2; exit 1 ;;
  esac
done

Node.js example

const submitUrl = "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo";
const apiKey = process.env.WAVESPEED_API_KEY;
if (!apiKey) throw new Error('Set WAVESPEED_API_KEY');

async function requestJson(url, options = {}) {
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(await response.text());
  return response.json();
}

// 1. Submit the prediction.
const body = await requestJson(submitUrl, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "text": "A clear example input",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese"
  }),
});
const task = body.data ?? body;
if (!task.id) throw new Error("Submission response did not contain a prediction id");
const resultUrl = task.urls?.get ||
  `https://api.wavespeed.ai/api/v3/predictions/${task.id}/result`;

// 2. Poll until the prediction finishes.
while (true) {
  const resultBody = await requestJson(resultUrl, {
    headers: { "Authorization": `Bearer ${apiKey}` },
  });
  const result = resultBody.data ?? resultBody;
  if (result.status === "completed") {
    console.log(result.outputs);
    break;
  }
  if (["failed", "cancelled", "timeout"].includes(result.status)) throw new Error(JSON.stringify(result));
  if (!["created", "processing"].includes(result.status)) throw new Error("Unexpected status: " + result.status);
  await new Promise(resolve => setTimeout(resolve, 2000));
}

Python example

import json
import os
import time
from urllib.request import Request, urlopen

api_key = os.environ["WAVESPEED_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
payload = {
    "text": "A clear example input",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": False,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese"
}

def request_json(url, data=None):
    request = Request(url, data=data, headers=headers, method="POST" if data else "GET")
    with urlopen(request) as response:
        return json.load(response)

# 1. Submit the prediction.
body = request_json("https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo", json.dumps(payload).encode())
task = body.get("data", body)
if not task.get("id"):
    raise RuntimeError("Submission response did not contain a prediction id")
result_url = task.get("urls", {}).get("get") or f"https://api.wavespeed.ai/api/v3/predictions/{task['id']}/result"

# 2. Poll until the prediction finishes.
while True:
    result_body = request_json(result_url)
    result = result_body.get("data", result_body)
    status = result.get("status")
    if status == "completed":
        print(result.get("outputs", []))
        break
    if status in {"failed", "cancelled", "timeout"}:
        raise RuntimeError(result)
    if status not in {"created", "processing"}:
        raise RuntimeError(f"Unexpected status: {status}")
    time.sleep(2)

Speech 02 Turbo API — Frequently asked questions

What is the Speech 02 Turbo API?

Speech 02 Turbo is a MiniMax text-to-speech model, exposed as a REST API on WaveSpeedAI. It delivers natural, high-definition voice output at $0.03 per 1,000 characters, with no cold starts. You can call it programmatically or try it from the playground above.

How do I call the Speech 02 Turbo API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID. Poll the result endpoint starting around every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status. The playground generates production-oriented Python, JavaScript, and cURL examples with timeouts, transient-error handling, and safe GET retries. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-02-turbo.
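
The polling cadence described above (start around 2 seconds, lengthen the interval, stop on a terminal status) can be sketched as follows. The status names come from the examples on this page; the growth factor, the 30-second cap, and the 300-second budget are illustrative assumptions, and `poll_intervals` and `wait_for_result` are hypothetical helpers. The status fetcher and the sleep function are passed in, so the logic can be exercised without network access.

```python
import itertools

TERMINAL = {"completed", "failed", "cancelled", "timeout"}
IN_PROGRESS = {"created", "processing"}


def poll_intervals(start=2.0, factor=1.5, cap=30.0):
    """Yield sleep durations: start, then growing by factor, never above cap."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_for_result(fetch_status, sleep, max_wait=300.0):
    """Call fetch_status() until it returns a terminal status.

    Raises TimeoutError once more than max_wait seconds have been slept,
    and RuntimeError on a status that is neither terminal nor in progress.
    """
    waited = 0.0
    for delay in poll_intervals():
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in IN_PROGRESS:
            raise RuntimeError(f"Unexpected status: {status}")
        if waited >= max_wait:
            raise TimeoutError(f"no terminal status after {waited:.0f}s")
        sleep(delay)
        waited += delay


# Demonstration with canned statuses instead of real HTTP calls.
canned = iter(["created", "processing", "completed"])
pauses = []
print(wait_for_result(lambda: next(canned), pauses.append), pauses)
```

In real use, `fetch_status` would wrap a GET on the result URL and `sleep` would be `time.sleep`.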

How much does Speech 02 Turbo cost per run?

Speech 02 Turbo starts at $0.03 per run. The final charge scales with the length of your input text, at $0.03 per 1,000 characters, and does not depend on audio duration. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Speech 02 Turbo accept?

Required inputs: `text` and `voice_id`. Optional inputs include `speed`, `volume`, `pitch`, `emotion`, `english_normalization`, `sample_rate`, `bitrate`, `channel`, `format`, `language_boost`, and `enable_sync_mode`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-02-turbo.

How long does Speech 02 Turbo take to generate?

Median end-to-end generation time on WaveSpeedAI is around 5 seconds per request, based on recent successful runs. Queue time varies with global demand; live status is visible in the prediction record.

Can I use Speech 02 Turbo outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (MiniMax). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
