WaveSpeed.ai
MiniMax Speech 2.8 Turbo

minimax/speech-2.8-turbo

MiniMax Speech 2.8 Turbo is a high-definition text-to-speech model with natural, expressive voice synthesis. It is served through a ready-to-use REST inference API with no cold starts.

Input

  • text: Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause).
  • pronunciation_dict: Use this to map tone words in your text to custom pronunciations (e.g. Omg/Oh my god).
  • voice_id: Desired voice ID. Use a voice ID you have trained (https://wavespeed.ai/models/minimax/voice-clone), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl.
  • english_normalization: Enables English text normalization, which improves performance in number-reading scenarios.
  • Base64 output (API only): If enabled, the output is encoded as a BASE64 string instead of a URL.
  • Sync mode (API only): If set to true, the request waits until the result has been generated and uploaded, then returns it directly in the response.
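
When the BASE64 output option is enabled, the audio arrives as an encoded string rather than a URL, so the client has to decode it before saving. The following is a minimal sketch of that step. The function name `save_base64_audio` is hypothetical, and the optional `data:` prefix handling is an assumption, since the page does not document the exact response shape.

```python
import base64
from pathlib import Path


def save_base64_audio(encoded: str, path: str) -> int:
    """Decode a BASE64 audio string, write it to `path`, return bytes written."""
    # Assumption: the string may carry a data-URI header such as
    # "data:audio/mpeg;base64,". Strip it if present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    audio_bytes = base64.b64decode(encoded)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

The response field that carries the encoded string is not stated on this page, so look it up in the API reference before wiring this in.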

Your request will cost $0.06 per run.

With $1 you can run this model approximately 16 times.

README

MiniMax Speech 2.8 Turbo is a high-quality text-to-speech model that transforms written text into natural, expressive audio. With support for multiple voice presets, emotional tones, and fine-grained audio controls, it delivers broadcast-ready speech synthesis for any application.

Why Choose This?

  • Rich voice library: Choose from 17+ preset voices spanning different genders, ages, and speaking styles, or use your own custom-trained voice.

  • Expressive interjections: Add natural human sounds like (laughs), (sighs), (coughs), (gasps), and more directly in your text for lifelike delivery.

  • Emotion control: Set the emotional tone of the speech (happy, calm, or other moods) to match your content.

  • Pronunciation customization: Define custom pronunciations for brand names, acronyms, or specialized terms using the pronunciation dictionary.

  • Full audio control: Fine-tune speed, volume, pitch, sample rate, bitrate, channel, and output format for production-ready results.

Parameters

| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text to convert to speech. Supports interjections like (laughs), (sighs), (coughs). |
| voice_id | Yes | Voice preset or custom voice ID (see Available Voices below). |
| speed | No | Speech speed multiplier (default: 1). |
| volume | No | Volume level (default: 1). |
| pitch | No | Pitch adjustment (default: 0). |
| emotion | No | Emotional tone: happy, calm, etc. |
| pronunciation_dict | No | Custom pronunciation mappings (e.g., Omg/Oh my god). |
| english_normalization | No | Improves number-reading performance in English text. |
| sample_rate | No | Audio sample rate. |
| bitrate | No | Audio bitrate. |
| channel | No | Audio channel (mono/stereo). |
| format | No | Output format. |
| language_boost | No | Boost specific language recognition. |
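
As a rough illustration of how the table maps onto a request body, the sketch below assembles a payload from the two required parameters plus any optional ones. The helper name `build_payload` is hypothetical, and the page does not say whether the service rejects unknown keys, so the check here is a client-side convenience only.

```python
from typing import Any

# Optional parameter names, taken from the table above.
OPTIONAL_PARAMS = {
    "speed", "volume", "pitch", "emotion", "pronunciation_dict",
    "english_normalization", "sample_rate", "bitrate", "channel",
    "format", "language_boost",
}


def build_payload(text: str, voice_id: str, **options: Any) -> dict[str, Any]:
    """Assemble a request body; raise ValueError on missing or unknown fields."""
    if not text.strip():
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload: dict[str, Any] = {"text": text, "voice_id": voice_id}
    # Leave out options set to None so the service applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_payload("Hello (laughs)", "Calm_Woman", speed=1.2)` yields a dictionary with just `text`, `voice_id`, and `speed`.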

Available Voices

Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl

You can also use a custom voice ID trained via MiniMax Voice Clone.

Supported Interjections

(laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause)
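
Because interjections are only recognized when written exactly as listed, it can help to lint the text before submitting it. The sketch below is one possible check, not part of the service. The name `unsupported_interjections` is hypothetical. It only inspects single-word parenthesized tokens, so ordinary asides such as "(see note 3)" are ignored.

```python
import re

# The 22 interjections listed above.
SUPPORTED_INTERJECTIONS = frozenset({
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
})

# Matches a single word (letters and hyphens only) wrapped in parentheses.
_TAG = re.compile(r"\(([A-Za-z-]+)\)")


def unsupported_interjections(text: str) -> list[str]:
    """Return single-word parenthesized tokens that are not in the list."""
    return [tag for tag in _TAG.findall(text) if tag not in SUPPORTED_INTERJECTIONS]
```

The comparison is case-sensitive on purpose, so a capitalized tag such as "(Sighs)" is reported as unsupported.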

How to Use

  1. Enter your text — write or paste the content you want to convert to speech.
  2. Select voice_id — choose a preset voice or enter your custom voice ID.
  3. Adjust speech settings (optional) — modify speed, volume, and pitch as needed.
  4. Set emotion (optional) — select the emotional tone for the delivery.
  5. Configure audio output (optional) — choose sample rate, bitrate, channel, and format.
  6. Run — submit and download your audio file.
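
The same steps can be scripted against the REST API. The sketch below only builds the HTTP request and does not send it. The endpoint URL shown is a placeholder and bearer-token authentication is an assumption, since neither is documented on this page. Take the real values from the API reference.

```python
import json
import urllib.request


def make_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a speech job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; confirm in the API reference.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder endpoint and key, for illustration only.
req = make_request(
    "https://api.example.invalid/minimax/speech-2.8-turbo",
    "YOUR_API_KEY",
    {
        "text": "Welcome back! (laughs)",   # step 1
        "voice_id": "Friendly_Person",      # step 2
        "speed": 1,                         # step 3 (optional)
        "emotion": "happy",                 # step 4 (optional)
    },
)
# Step 6 would be urllib.request.urlopen(req), then reading the JSON response.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any cost is incurred.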

Pricing

| Metric | Cost |
|---|---|
| Per 1,000 characters | $0.06 |
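
For budgeting, the per-character rate can be turned into a quick estimate. This sketch assumes cost scales linearly with character count. The page does not say whether billing rounds up to the next 1,000 characters or how interjection tags are counted, so treat the result as approximate. The name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.06")  # from the pricing table


def estimate_cost(text: str) -> Decimal:
    """Estimate the cost in USD, assuming linear per-character pricing."""
    # Assumption: no minimum charge and no rounding to whole blocks.
    return Decimal(len(text)) / Decimal(1000) * PRICE_PER_1000_CHARS
```

Under that assumption, a 2,500-character script comes to about $0.15.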

Best Use Cases

  • Audiobook Production — Convert manuscripts into natural-sounding narration with expressive voices.
  • Video Voiceovers — Generate professional voiceovers for YouTube, ads, or explainer videos.
  • Podcasts & Broadcasting — Create consistent voice content without recording equipment.
  • E-learning & Training — Produce clear, engaging audio for educational materials.
  • Accessibility — Convert written content to audio for visually impaired users.
  • Game & App Development — Add character voices and UI narration to interactive experiences.

Pro Tips

  • Use interjections sparingly for natural effect — too many can sound unnatural.
  • Match voice_id to your content: use "Deep_Voice_Man" or "Imposing_Manner" for authoritative content, "Lively_Girl" or "Casual_Guy" for friendly content.
  • Enable english_normalization when your text contains numbers, dates, or currencies.
  • Use pronunciation_dict for consistent handling of brand names or technical terms.
  • Start with default speed/pitch settings, then adjust based on your specific use case.
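
Two of these tips lend themselves to a small pre-processing helper. The sketch below suggests `english_normalization` when the text contains digits and builds `pronunciation_dict` entries in the "term/pronunciation" form shown in the parameter table. The helper name `suggest_options` is hypothetical, and passing the entries as a list of strings is an assumption. Check the API reference for the exact shape.

```python
import re


def suggest_options(text: str, pronunciations: dict[str, str]) -> dict:
    """Suggest optional parameters for `text`, following the tips above."""
    options: dict = {}
    # Tip: enable english_normalization for numbers, dates, or currencies.
    # A digit check is a simple heuristic, not an exhaustive detector.
    if re.search(r"\d", text):
        options["english_normalization"] = True
    # Tip: use pronunciation_dict for brand names or technical terms.
    # Only include terms that actually occur in the text.
    entries = [
        f"{term}/{spoken}"
        for term, spoken in pronunciations.items()
        if term in text
    ]
    if entries:
        # Assumption: the API accepts a list of "term/pronunciation" strings.
        options["pronunciation_dict"] = entries
    return options
```

Speed and pitch are left alone here, in line with the advice to start from the defaults and adjust afterwards.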

Notes

  • Text length affects processing time and cost — longer texts take more time.
  • For custom voices, train your voice model first via Voice Clone.
  • Interjections must be written in parentheses exactly as listed to be recognized.