MiniMax Speech 2.6 Turbo
High-definition Text-to-Speech (TTS) with natural pronunciation and crisp articulation. Supports built-in and custom cloned voices; adjustable speed, volume, and pitch; and 40+ languages for professional audio creation.
Features
- Multilingual leap: Substantially improved similarity, accuracy, and rhythm for English and other languages vs. Speech 02; seamless switching across 40+ languages for meetings, podcasts, and everyday dialogue.
- Lifelike tone replication: Control over language, accent, style, and emotion with industry-leading nuance, including cross-language accent retention, regional accent preservation, and replication of age-specific voices.
- Global language set (40+): Expanded library with newly added languages including Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, and more. Well suited to cross-border commerce, customer support, and localized marketing.
How to Use
1) Choose a Voice (voice_id)
Use either a custom voice you trained (voice cloning) or a built-in system voice (case-sensitive):
Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman,
Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl,
Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
- See the Voice_ID list for the full set of voices and samples.
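Because built-in voice names are case-sensitive, it can help to check a voice_id before sending a request. The following is a minimal sketch, not part of any official SDK: `check_voice_id` is a hypothetical helper, the set contains only the voices named above (the full Voice_ID list is longer), and any unrecognized ID is assumed to be a custom cloned voice.

```python
# Built-in system voices named in this section (a partial list).
SYSTEM_VOICES = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
})


def check_voice_id(voice_id: str) -> str:
    """Hypothetical helper: classify a voice_id as 'system' or 'custom'."""
    if voice_id in SYSTEM_VOICES:
        return "system"
    # Catch a likely casing mistake such as "wise_woman".
    by_lower = {name.lower(): name for name in SYSTEM_VOICES}
    match = by_lower.get(voice_id.lower())
    if match is not None:
        raise ValueError(
            f"voice_id is case-sensitive; did you mean {match!r}?"
        )
    # Assumption: anything else refers to a custom cloned voice.
    return "custom"
```

For example, `check_voice_id("Inspirational_girl")` passes, while `check_voice_id("Inspirational_Girl")` raises an error pointing at the correct spelling.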
2) Set Audio Parameters (mapped to the UI dropdowns)
- english_normalization (boolean)
Improves English text normalization, especially for number reading (e.g., “$1,299” → “one thousand two hundred ninety-nine dollars”).
- sample_rate (Hz)
Common: 22050, 24000, 44100, 48000.
Tips: 44.1 kHz for music/podcasts; 48 kHz for video post-production.
- bitrate (bps for MP3/OGG)
64k / 96k / 128k / 192k / 256k / 320k (that is, 64000 to 320000 bps).
Tips: ≥192k for distribution; 96–128k for previews.
- channel: mono or stereo
Mono produces smaller files and is usually sufficient for speech; use stereo when spatialization is desired.
- format: mp3, wav, ogg, flac
wav is lossless (bigger files); mp3 is compact and web-friendly.
- language_boost (IETF code like en, zh, ja …)
Prioritize the main language in mixed-language inputs.
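To make the size trade-offs above concrete, here is a rough back-of-the-envelope sketch. It assumes constant-bitrate MP3 and uncompressed 16-bit PCM WAV (the bit depth is an assumption, not something stated here), and it ignores headers and metadata. Both function names are illustrative.

```python
def mp3_size_bytes(bitrate_bps: int, seconds: float) -> int:
    """Approximate size of a constant-bitrate MP3: bits per second / 8."""
    return int(bitrate_bps * seconds / 8)


def wav_size_bytes(sample_rate: int, channels: int, seconds: float,
                   bits_per_sample: int = 16) -> int:
    """Approximate size of uncompressed PCM audio (16-bit is assumed)."""
    bytes_per_sample = bits_per_sample // 8
    return int(sample_rate * channels * bytes_per_sample * seconds)


# One minute of audio under three settings.
print(mp3_size_bytes(128000, 60))     # 128 kbps MP3
print(wav_size_bytes(48000, 1, 60))   # 48 kHz mono WAV
print(wav_size_bytes(48000, 2, 60))   # 48 kHz stereo WAV
```

Under these assumptions, a minute of 48 kHz mono WAV is about six times the size of a 128 kbps MP3, and stereo doubles the WAV size again.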
Prosody controls
- speed: speaking rate (e.g., 0.8–1.2).
- volume: gain (unit depends on API; typically linear or dB).
- pitch: pitch shift (semitones/cents or normalized value).
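The parameters above can be collected into a single request body. The sketch below is illustrative only: `build_tts_input` is a hypothetical helper, the `text` field name and the string values for `channel` are assumptions, and no endpoint or client library is implied. Checks cover only the values listed in this section; the API may accept others.

```python
import json

# Values listed in this section; the actual API may accept more.
FORMATS = {"mp3", "wav", "ogg", "flac"}
CHANNELS = {"mono", "stereo"}
BITRATES = {64000, 96000, 128000, 192000, 256000, 320000}


def build_tts_input(text, voice_id, *, audio_format="mp3", sample_rate=44100,
                    bitrate=128000, channel="mono",
                    english_normalization=False, language_boost=None,
                    speed=None, volume=None, pitch=None):
    """Hypothetical helper: assemble and sanity-check request parameters."""
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if channel not in CHANNELS:
        raise ValueError(f"channel must be 'mono' or 'stereo', got {channel!r}")

    params = {
        "text": text,  # assumed field name
        "voice_id": voice_id,
        "format": audio_format,
        "sample_rate": sample_rate,
        "channel": channel,
        "english_normalization": english_normalization,
    }
    # Bitrate is described as applying to MP3/OGG, so send it only for those.
    if audio_format in {"mp3", "ogg"}:
        if bitrate not in BITRATES:
            raise ValueError(f"unlisted bitrate: {bitrate}")
        params["bitrate"] = bitrate
    if language_boost is not None:
        params["language_boost"] = language_boost
    # Prosody values pass through unchecked: their units and ranges
    # are not specified in this section.
    for name, value in (("speed", speed), ("volume", volume), ("pitch", pitch)):
        if value is not None:
            params[name] = value
    return params


print(json.dumps(build_tts_input("Hello, world.", "Calm_Woman", speed=1.1),
                 indent=2))
```

Leaving volume and pitch unset keeps the service defaults, which is the safer choice until their units are confirmed against the API reference.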
Price
$0.06 per 1,000 characters
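A quick way to budget a job is to multiply the character count by the listed rate. The sketch below assumes pro-rata billing on the raw character count with no minimum charge or rounding; how characters are actually counted (spaces, punctuation, multi-byte text) is not stated here, so treat the result as an estimate. `estimate_cost` is an illustrative name.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.06")  # USD, from the listed price


def estimate_cost(text: str) -> Decimal:
    """Estimate the cost in USD, assuming pro-rata per-character billing."""
    return Decimal(len(text)) * PRICE_PER_1000_CHARS / 1000


script = "a" * 2500  # stand-in for a 2,500-character script
print(estimate_cost(script))
```

At this rate, a 2,500-character script comes to about $0.15 and a 100,000-character audiobook chapter to about $6.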
Typical Use Cases
- Short-video and ad voiceovers, e-learning and courseware, AI assistants and IVR, podcasts/audiobooks, cross-border e-commerce localization.
Best-Practice Presets (optional)
- Video voiceover: format=wav, sample_rate=48000, channel=mono, english_normalization=true.
- Web preview: format=mp3, sample_rate=44100, bitrate=128000, channel=mono.
- Podcast: format=mp3, sample_rate=44100, bitrate=192000–320000, channel=stereo if mixing music.
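These presets can be kept as plain dictionaries and adjusted per job. This is a minimal sketch: the preset keys and `apply_preset` are hypothetical names, and the podcast entry picks the low end of the suggested bitrate range with mono as the default, both of which are choices made here for illustration.

```python
PRESETS = {
    "video_voiceover": {
        "format": "wav", "sample_rate": 48000, "channel": "mono",
        "english_normalization": True,
    },
    "web_preview": {
        "format": "mp3", "sample_rate": 44100, "bitrate": 128000,
        "channel": "mono",
    },
    # Assumption: 192000 bps and mono as defaults; override when mixing music.
    "podcast": {
        "format": "mp3", "sample_rate": 44100, "bitrate": 192000,
        "channel": "mono",
    },
}


def apply_preset(name: str, **overrides) -> dict:
    """Return a copy of the named preset with any overrides applied."""
    settings = dict(PRESETS[name])  # copy so the preset itself is unchanged
    settings.update(overrides)
    return settings


# Podcast episode with background music: stereo at the top bitrate.
print(apply_preset("podcast", channel="stereo", bitrate=320000))
```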