WaveSpeed.ai
text-to-audio

Minimax Speech-02 Turbo

minimax/speech-02-turbo

Minimax Speech-02 Turbo is a high-definition text-to-speech model that delivers natural voice output. Cost: $0.03 per 1,000 characters. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Input
voice_id: Desired voice ID. Use a voice ID you have trained (https://wavespeed.ai/models/minimax/voice-clone), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
english_normalization: Enables English text normalization, which improves performance in number-reading scenarios.
Synchronous mode (API only): If set to true, the request waits for the result to be generated and uploaded before returning, so the result is available directly in the response. This property is only available through the API.

Your request will cost $0.03 per run.

For $1 you can run this model approximately 33 times.

README

MiniMax Speech-02-Turbo

Convert text to natural, expressive speech with MiniMax Speech-02-Turbo. This advanced text-to-speech model offers 17+ preset voices, custom voice cloning support, and emotional expression control — perfect for voiceovers, content creation, and audio production.

Why It Sounds Great

  • Natural speech: Human-like intonation, rhythm, and expression.
  • 17+ preset voices: Wide variety of characters from casual to professional.
  • Custom voice cloning: Use your own trained voice IDs for personalized output.
  • Emotion control: Add emotional expression like happy, sad, or neutral.
  • Voice tuning: Adjust speed, volume, and pitch for perfect delivery.
  • Audio quality options: Configure sample rate, bitrate, and format.

Parameters

Parameter             | Required | Description
text                  | Yes      | The text you want to convert to speech.
voice_id              | Yes      | Voice to use: a preset ID or a custom trained voice ID.
speed                 | No       | Speech speed multiplier. Default: 1.
volume                | No       | Volume level. Default: 1.
pitch                 | No       | Pitch adjustment. Default: 0.
emotion               | No       | Emotional tone: happy, sad, angry, neutral, etc.
english_normalization | No       | Improves number-reading in English text.
sample_rate           | No       | Audio sample rate (e.g., 22050, 44100).
bitrate               | No       | Audio bitrate quality.
channel               | No       | Audio channels (mono/stereo).
format                | No       | Output format (mp3, wav, etc.).
language_boost        | No       | Boost specific language pronunciation.
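
As a minimal sketch of how the parameters above fit together, the following helper assembles a request body and rejects names that are not in the table. The function name `build_payload`, the idea of sending defaults explicitly, and the JSON-style dictionary shape are assumptions made for illustration; the actual request schema should be checked against the API documentation.

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {"speed": 1, "volume": 1, "pitch": 0}

# Optional parameter names listed in the table.
OPTIONAL_KEYS = {
    "speed", "volume", "pitch", "emotion", "english_normalization",
    "sample_rate", "bitrate", "channel", "format", "language_boost",
}


def build_payload(text: str, voice_id: str, **options: Any) -> dict[str, Any]:
    """Assemble a request body (hypothetical helper, not part of any SDK)."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Start from the documented defaults, then apply caller overrides.
    return {"text": text, "voice_id": voice_id, **DEFAULTS, **options}


payload = build_payload("Hello there.", "Calm_Woman", speed=1.1, emotion="happy")
```

Rejecting unknown keys early catches typos such as `sample_rat` before a request is ever sent.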

Available Preset Voices

Voice ID           | Character
Wise_Woman         | Mature, thoughtful female
Friendly_Person    | Warm, approachable
Inspirational_girl | Motivating young female
Deep_Voice_Man     | Rich, deep male voice
Calm_Woman         | Soothing, relaxed female
Casual_Guy         | Laid-back male
Lively_Girl        | Energetic young female
Patient_Man        | Steady, reassuring male
Young_Knight       | Youthful, heroic male
Determined_Man     | Strong, resolute male
Lovely_Girl        | Sweet, pleasant female
Decent_Boy         | Polite young male
Imposing_Manner    | Authoritative presence
Elegant_Man        | Refined, sophisticated male
Abbess             | Wise, spiritual female
Sweet_Girl_2       | Gentle, charming female
Exuberant_Girl     | Excited, enthusiastic female
Energetic_Girl     | Vibrant, dynamic female
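
A small sketch of how a client might tell preset voices from custom ones before sending a request. The set below is copied from the table above; the helper name `is_preset_voice` is hypothetical, and treating IDs as case-sensitive is an assumption (note that `Inspirational_girl` is listed with a lowercase "g").

```python
# Preset voice IDs copied from the table above.
PRESET_VOICES = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl", "Energetic_Girl",
})


def is_preset_voice(voice_id: str) -> bool:
    """Return True if voice_id exactly matches a listed preset (case-sensitive)."""
    return voice_id in PRESET_VOICES


# Anything that is not a preset is assumed to be a custom trained voice ID.
kind = "preset" if is_preset_voice("Calm_Woman") else "custom"
```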

How to Use

  1. Enter your text — type or paste the content to convert.
  2. Select voice — choose a preset voice or enter your custom voice ID.
  3. Adjust settings (optional) — tune speed, volume, pitch, and emotion.
  4. Configure audio (optional) — set sample rate, bitrate, and format.
  5. Run — click the button to generate.
  6. Download — preview and save your audio file.
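
The same steps can also be driven through the REST API rather than the web form. The sketch below only constructs the HTTP request and does not send it. The endpoint URL is a deliberate placeholder, and the bearer-token header and JSON body are assumptions about the API's conventions rather than documented facts; substitute the values from the API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the real endpoint from the API docs.
API_URL = "https://api.example.invalid/minimax/speech-02-turbo"


def make_request(api_key: str, text: str, voice_id: str, **settings) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the chosen settings."""
    body = {"text": text, "voice_id": voice_id, **settings}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Steps 1-4: text, voice, and optional settings; step 5 would be urlopen(req).
req = make_request("YOUR_API_KEY", "Welcome to the show.", "Casual_Guy", speed=1.0)
```

Passing `req` to `urllib.request.urlopen` would perform the call. How the audio is returned (a direct file, a URL, or a job ID to poll) is not specified here and has to be taken from the API documentation.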

Pricing

Per-character billing based on text length.

Text Length       | Cost
1,000 characters  | $0.03
5,000 characters  | $0.15
10,000 characters | $0.30
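
The rows above follow directly from the $0.03 per 1,000 characters rate. A minimal estimator, assuming cost scales linearly with length, that characters are counted the way Python's `len` counts them, and that there is no minimum charge or rounding (none of which is confirmed here):

```python
from decimal import Decimal

# Rate stated on this page: $0.03 per 1,000 characters.
PRICE_PER_1000_CHARS = Decimal("0.03")


def estimate_cost(text: str) -> Decimal:
    """Estimate the cost in USD, assuming strictly linear per-character billing."""
    return PRICE_PER_1000_CHARS * len(text) / 1000


cost = estimate_cost("x" * 5000)  # 5,000 characters
```

`Decimal` is used instead of floats so that small per-character amounts add up without binary rounding drift.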

Best Use Cases

  • Voiceovers — Create professional narration for videos and presentations.
  • Audiobooks — Generate natural-sounding book narration.
  • Content Creation — Add voice to social media videos and podcasts.
  • E-learning — Produce educational audio content at scale.
  • Accessibility — Convert written content to audio format.
  • Character Voices — Create distinct voices for games and animations.

Custom Voice Cloning

Train your own voice for personalized output:

Voice Clone Training: https://wavespeed.ai/models/minimax/voice-clone

Pro Tips for Best Results

  • Match voice character to content tone — use Calm_Woman for meditation, Energetic_Girl for ads.
  • Use the emotion parameter to add expressiveness: "happy" for upbeat content, "neutral" for professional narration.
  • Adjust speed slightly (0.9-1.1) for more natural pacing.
  • Enable english_normalization when text contains numbers or abbreviations.
  • Test different voices with the same text to find the perfect match.
  • For long content, break the text into paragraphs for more natural pacing.
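
One way to apply the last tip is to split long text on blank lines and group paragraphs into chunks that can be submitted one at a time. This is only a sketch: the function name `split_into_chunks` and the `max_chars` default of 1000 are hypothetical, since no per-request length limit is stated here.

```python
def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: flush and restart.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


parts = split_into_chunks("Intro paragraph.\n\nSecond paragraph.", max_chars=20)
```

Splitting only at paragraph boundaries keeps each request aligned with a natural pause in the narration.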

Notes

  • Pricing is based on character count, not audio duration.
  • Custom voice IDs require prior voice clone training.
  • Processing time scales with text length.
  • Multiple output formats available for different use cases.