
Speech 02 Turbo


MiniMax Speech-02 Turbo is a high-definition text-to-speech model that delivers natural-sounding voice output. Cost: $0.03 per 1,000 characters. It is available through a ready-to-use REST inference API with no cold starts.

text-to-audio
Input
voice_id: Desired voice ID. Use a voice ID you have trained (https://wavespeed.ai/models/minimax/voice-clone), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
english_normalization: Enables English text normalization, which improves performance in number-reading scenarios.
enable_sync_mode: If set to true, the request waits for the result to be generated and uploaded before returning, so the result is available directly in the response. This property is only available through the API.


$0.03 per run · ~33 runs per $1


README

MiniMax Speech-02-Turbo

Convert text to natural, expressive speech with MiniMax Speech-02-Turbo. This advanced text-to-speech model offers 17+ preset voices, custom voice cloning support, and emotional expression control — perfect for voiceovers, content creation, and audio production.

Why It Sounds Great

  • Natural speech: Human-like intonation, rhythm, and expression.
  • 17+ preset voices: Wide variety of characters from casual to professional.
  • Custom voice cloning: Use your own trained voice IDs for personalized output.
  • Emotion control: Add emotional expression like happy, sad, or neutral.
  • Voice tuning: Adjust speed, volume, and pitch for perfect delivery.
  • Audio quality options: Configure sample rate, bitrate, and format.

Parameters

Parameter              Required  Description
text                   Yes       The text you want to convert to speech.
voice_id               Yes       Voice to use: a preset ID or a custom trained voice ID.
speed                  No        Speech speed multiplier. Default: 1.
volume                 No        Volume level. Default: 1.
pitch                  No        Pitch adjustment. Default: 0.
emotion                No        Emotional tone: happy, sad, angry, neutral, etc.
english_normalization  No        Improves number-reading in English text.
sample_rate            No        Audio sample rate (e.g., 22050, 44100).
bitrate                No        Audio bitrate quality.
channel                No        Audio channels (mono/stereo).
format                 No        Output format (mp3, wav, etc.).
language_boost         No        Boost specific language pronunciation.
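
As a rough illustration of how the table translates into a request body, the sketch below assembles a payload and rejects parameter names that are not listed. The helper `build_payload` is hypothetical and not part of any WaveSpeedAI SDK; it only checks names, not value ranges, because the table does not state them.

```python
# Optional parameter names taken from the table above.
OPTIONAL_PARAMETERS = {
    "speed", "volume", "pitch", "emotion", "english_normalization",
    "sample_rate", "bitrate", "channel", "format", "language_boost",
}


def build_payload(text, voice_id, **options):
    """Assemble a request body (hypothetical helper, not an SDK function)."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(options) - OPTIONAL_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Only the options the caller set are included, so the service
    # applies its own defaults for everything else.
    return {"text": text, "voice_id": voice_id, **options}


payload = build_payload(
    "Your order number is 4521.",
    "Calm_Woman",
    speed=1,
    english_normalization=True,
)
print(payload)
```

Leaving unset options out of the payload avoids hard-coding defaults that may change on the service side.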

Available Preset Voices

Voice ID            Character
Wise_Woman          Mature, thoughtful female
Friendly_Person     Warm, approachable
Inspirational_girl  Motivating young female
Deep_Voice_Man      Rich, deep male voice
Calm_Woman          Soothing, relaxed female
Casual_Guy          Laid-back male
Lively_Girl         Energetic young female
Patient_Man         Steady, reassuring male
Young_Knight        Youthful, heroic male
Determined_Man      Strong, resolute male
Lovely_Girl         Sweet, pleasant female
Decent_Boy          Polite young male
Imposing_Manner     Authoritative presence
Elegant_Man         Refined, sophisticated male
Abbess              Wise, spiritual female
Sweet_Girl_2        Gentle, charming female
Exuberant_Girl      Excited, enthusiastic female
Energetic_Girl      Vibrant, dynamic female
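
A client may want to tell preset IDs apart from custom trained IDs before sending a request. The sketch below copies the IDs from the table above into a set; the helper `is_preset_voice` and the ID `my_cloned_voice` are hypothetical, and the check assumes IDs are case-sensitive exactly as written.

```python
# Preset voice IDs copied from the table above.
PRESET_VOICES = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl", "Energetic_Girl",
})


def is_preset_voice(voice_id):
    """Return True if voice_id is one of the preset IDs listed above."""
    return voice_id in PRESET_VOICES


print(is_preset_voice("Calm_Woman"))
print(is_preset_voice("my_cloned_voice"))  # hypothetical custom voice ID
```

An ID that is not in the set is not necessarily invalid; it may be a custom voice created through voice clone training.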

How to Use

  1. Enter your text — type or paste the content to convert.
  2. Select voice — choose a preset voice or enter your custom voice ID.
  3. Adjust settings (optional) — tune speed, volume, pitch, and emotion.
  4. Configure audio (optional) — set sample rate, bitrate, and format.
  5. Run — click the button to generate.
  6. Download — preview and save your audio file.

Pricing

Per-character billing based on text length.

Text Length         Cost
1,000 characters    $0.03
5,000 characters    $0.15
10,000 characters   $0.30
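
The table rows follow from the $0.03 per 1,000 characters rate. The sketch below reproduces that arithmetic under two assumptions that the page does not confirm: the charge is prorated linearly for lengths between the listed rows, and characters are counted the way Python's `len()` counts them. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rate stated on this page: $0.03 per 1,000 characters.
PRICE_PER_1000_CHARS = Decimal("0.03")


def estimate_cost(text):
    """Estimate the charge in USD, assuming linear proration by length."""
    return PRICE_PER_1000_CHARS * len(text) / 1000


for length in (1000, 5000, 10000):
    print(length, estimate_cost("a" * length))
```

`Decimal` is used instead of `float` so that the small currency amounts are not affected by binary rounding.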

Best Use Cases

  • Voiceovers — Create professional narration for videos and presentations.
  • Audiobooks — Generate natural-sounding book narration.
  • Content Creation — Add voice to social media videos and podcasts.
  • E-learning — Produce educational audio content at scale.
  • Accessibility — Convert written content to audio format.
  • Character Voices — Create distinct voices for games and animations.

Custom Voice Cloning

Train your own voice for personalized output with Voice Clone Training: https://wavespeed.ai/models/minimax/voice-clone

Pro Tips for Best Results

  • Match voice character to content tone — use Calm_Woman for meditation, Energetic_Girl for ads.
  • Use emotion parameter to add expressiveness: "happy" for upbeat, "neutral" for professional.
  • Adjust speed slightly (0.9-1.1) for more natural pacing.
  • Enable english_normalization when text contains numbers or abbreviations.
  • Test different voices with the same text to find the perfect match.
  • For long content, break into paragraphs for more natural pacing.
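
The last tip can be sketched as a small helper that groups paragraphs into chunks, each of which would then be sent as a separate request. The function `split_paragraphs` is hypothetical, and the 1,000-character default is an arbitrary choice for illustration, not a documented limit of the service.

```python
def split_paragraphs(text, max_chars=1000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    chunks = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # The next paragraph would overflow the chunk, so start a new one.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "A" * 600 + "\n\n" + "B" * 600 + "\n\n" + "C" * 100
for chunk in split_paragraphs(sample):
    print(len(chunk))
```

Splitting only at paragraph boundaries keeps each request aligned with a natural pause in the text.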

Notes

  • Pricing is based on character count, not audio duration.
  • Custom voice IDs require prior voice clone training.
  • Processing time scales with text length.
  • Multiple output formats available for different use cases.

Speech 02 Turbo API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Speech 02 Turbo below.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Hello world! This is a test of the text-to-speech system.",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese",
    "enable_sync_mode": false
}'

# The response includes a prediction id. Store it in REQUEST_ID, then poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/${REQUEST_ID}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so the call runs inside an async function
async function main() {
  const result = await client.run("minimax/speech-02-turbo", {
    "text": "Hello world! This is a test of the text-to-speech system.",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false,
    "sample_rate": 8000,
    "bitrate": 32000,
    "channel": "1",
    "format": "mp3",
    "language_boost": "Chinese",
    "enable_sync_mode": false
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

# Python booleans are capitalized (False), unlike JSON (false)
output = wavespeed.run(
    "minimax/speech-02-turbo",
    {
        "text": "Hello world! This is a test of the text-to-speech system.",
        "voice_id": "Wise_Woman",
        "speed": 1,
        "volume": 1,
        "pitch": 0,
        "emotion": "happy",
        "english_normalization": False,
        "sample_rate": 8000,
        "bitrate": 32000,
        "channel": "1",
        "format": "mp3",
        "language_boost": "Chinese",
        "enable_sync_mode": False,
    },
)

print(output["outputs"][0])  # → URL of the generated output
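
For callers who use the REST endpoint directly rather than an SDK, the polling step described in the quick start might look like the sketch below. It uses only the Python standard library. Several details are assumptions not confirmed on this page: that the status is exposed at `data.status`, that intermediate and failure states are named `processing` and `failed`, and the helper names `fetch_result` and `wait_for_output`, which are hypothetical.

```python
import json
import os
import time
import urllib.request


def fetch_result(request_id):
    """GET the prediction result (URL pattern taken from the HTTP example)."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"
    headers = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_output(fetch, request_id, interval=1.0, timeout=60.0):
    """Poll until status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id)["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed name of the failure state
            raise RuntimeError(f"prediction {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} did not complete")
        time.sleep(interval)


# Offline demonstration with a stand-in fetcher, so no network call is made.
_responses = iter([
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/audio.mp3"]}},
])
print(wait_for_output(lambda _id: next(_responses), "demo-id", interval=0))
```

In real use, pass `fetch_result` as the first argument together with the prediction id returned by the submit call.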

Speech 02 Turbo API — Frequently asked questions

What is the Speech 02 Turbo API?

Speech 02 Turbo is a MiniMax high-definition text-to-speech model, exposed as a REST API on WaveSpeedAI. It costs $0.03 per 1,000 characters. You can call it programmatically or try it from the playground above.

How do I call the Speech 02 Turbo API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-02-turbo.

How much does Speech 02 Turbo cost per run?

Speech 02 Turbo starts at $0.03 per run. That figure is the base price; the final charge scales with the length of the input text at $0.03 per 1,000 characters, not with audio duration. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Speech 02 Turbo accept?

Required inputs: `text` and `voice_id`. Optional inputs: `speed`, `volume`, `pitch`, `emotion`, `english_normalization`, `sample_rate`, `bitrate`, `channel`, `format`, `language_boost`, and `enable_sync_mode`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/minimax/minimax-speech-02-turbo.

How long does Speech 02 Turbo take to generate?

Average end-to-end generation time on WaveSpeedAI is around 7 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Speech 02 Turbo outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (MiniMax). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.