
Your request will cost $0.10 per run.
For $1 you can run this model approximately 10 times.
MiniMax Speech 2.8 HD is a premium text-to-speech model delivering studio-quality audio with enhanced clarity and naturalness. With support for multiple voice presets, emotional tones, and fine-grained audio controls, it produces broadcast-ready speech synthesis for professional applications.
For a faster, more cost-effective option, try MiniMax Speech 2.8 Turbo.
- Studio-grade audio quality: HD processing delivers richer, cleaner audio with improved naturalness compared to the Turbo version.
- Rich voice library: Choose from 17+ preset voices spanning different genders, ages, and speaking styles, or use your own custom-trained voice.
- Expressive interjections: Add natural human sounds like (laughs), (sighs), (coughs), (gasps), and more directly in your text for lifelike delivery.
- Emotion control: Set the emotional tone of the speech (happy, calm, or other moods) to match your content.
- Pronunciation customization: Define custom pronunciations for brand names, acronyms, or specialized terms using the pronunciation dictionary.
- Full audio control: Fine-tune speed, volume, pitch, sample rate, bitrate, channel, and output format for production-ready results.
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text to convert to speech. Supports interjections like (laughs), (sighs), (coughs) |
| voice_id | Yes | Voice preset or custom voice ID (see Available Voices below) |
| speed | No | Speech speed multiplier (default: 1) |
| volume | No | Volume level (default: 1) |
| pitch | No | Pitch adjustment (default: 0) |
| emotion | No | Emotional tone: happy, calm, etc. |
| pronunciation_dict | No | Custom pronunciation mappings (e.g., Omg/Oh my god) |
| english_normalization | No | Improves number-reading performance in English text |
| sample_rate | No | Audio sample rate |
| bitrate | No | Audio bitrate |
| channel | No | Audio channel (mono/stereo) |
| format | No | Output format |
| language_boost | No | Boost specific language recognition |
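
As a minimal sketch of how these parameters fit together, the Python below assembles a request body as a plain dictionary. The helper name `build_tts_payload` is hypothetical, and sending the parameters as a flat JSON object is an assumption this page does not confirm. Only the parameter names and the example values come from the table above. No network call is made, since the endpoint and authentication scheme are not described here.

```python
import json

# Optional parameter names, taken from the table above
OPTIONAL = (
    "speed", "volume", "pitch", "emotion", "pronunciation_dict",
    "english_normalization", "sample_rate", "bitrate", "channel",
    "format", "language_boost",
)

def build_tts_payload(text, voice_id, **options):
    """Hypothetical helper: assemble a request body from the documented parameters."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    unknown = set(options) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"text": text, "voice_id": voice_id}
    # Include only the options the caller set; the rest fall back to service defaults
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

payload = build_tts_payload(
    "Welcome back (laughs), let's get started.",
    "Calm_Woman",
    speed=1,
    pitch=0,
    emotion="happy",
)
print(json.dumps(payload, indent=2))
```

Rejecting unknown keys early catches misspelled parameter names before a request is sent.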
Available Voices: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
You can also use a custom voice ID trained via MiniMax Voice Clone.
Supported interjections: (laughs), (chuckle), (coughs), (clear-throat), (groans), (breath), (pant), (inhale), (exhale), (gasps), (sniffs), (sighs), (snorts), (burps), (lip-smacking), (humming), (hissing), (emm), (whistles), (sneezes), (crying), (applause)
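
Because a misspelled tag would presumably be read aloud or ignored rather than rendered as a sound, it can help to check text before submitting it. The sketch below is one possible pre-flight check. The function `unsupported_tags` and its matching rule (a single lowercase word, hyphens allowed, in parentheses) are assumptions for illustration, not part of the model's API. Only the tag names come from the list above.

```python
import re

# Tag names copied from the interjection list above
SUPPORTED_INTERJECTIONS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
}

# Matches one lowercase word (hyphens allowed) wrapped in parentheses
TAG_PATTERN = re.compile(r"\(([a-z]+(?:-[a-z]+)*)\)")

def unsupported_tags(text):
    """Return parenthesized single-word tags that are not in the supported list."""
    return [
        tag for tag in TAG_PATTERN.findall(text)
        if tag not in SUPPORTED_INTERJECTIONS
    ]

print(unsupported_tags("Well (sighs), that was close (giggles)."))
```

Note that this rule also flags ordinary one-word parentheticals such as "(optional)", so the result is best treated as a warning list rather than a hard error.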
| Metric | Cost |
|---|---|
| Per 1,000 characters | $0.10 |
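
To make the rate concrete, here is a small estimate in Python. It assumes billing scales linearly with character count and that every character, including spaces and interjection tags, is counted. Neither detail is stated on this page, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.10")  # USD, from the pricing table above

def estimate_cost(text):
    """Estimate cost in USD, assuming linear per-character billing (an assumption)."""
    return PRICE_PER_1000_CHARS * len(text) / 1000

script = "a" * 2500  # stand-in for a 2,500-character script
print(estimate_cost(script))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many requests.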