
text-to-audio
Your request will cost $0.50 per run.
For $10 you can run this model approximately 20 times.
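The pricing arithmetic above can be checked with a short sketch. The function names here are hypothetical helpers, not part of any platform API; the only figure taken from this page is the $0.50 per-run price.

```python
from decimal import Decimal

# Per-run price quoted on this page; everything else is illustrative.
PRICE_PER_RUN = Decimal("0.50")


def runs_for_budget(budget: Decimal, price: Decimal = PRICE_PER_RUN) -> int:
    """Number of whole runs a budget covers (hypothetical helper)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return int(budget // price)


def cost_for_runs(runs: int, price: Decimal = PRICE_PER_RUN) -> Decimal:
    """Total cost of a given number of runs (hypothetical helper)."""
    return price * runs


if __name__ == "__main__":
    print(runs_for_budget(Decimal("10")))
    print(cost_for_runs(20))
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.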
MiniMax Voice Design is a voice synthesis model developed by MiniMax. Instead of cloning a voice from reference audio, it generates high-quality voices from your textual voice description, allowing you to create speech with the desired tone, accent, and personality.
High-Fidelity Voice Generation
Produces speech that matches your description with natural prosody and pronunciation.
Flexible Voice Design
Create a wide range of voices simply by describing the desired characteristics. No reference audio is required.
Emotion and Tone Control
Fine-tune speaking style and emotion for storytelling, games, and character dialogue.
Multilingual Output
Supports voice design across different languages and smooth code-switching.
Low-Latency Inference
Optimized for real-time use cases, including live interactions and dialogue generation.
MiniMax Voice Design uses a neural TTS pipeline with robust speaker and prosody modeling. By leveraging your textual description, it offers clarity, control, and speed, delivering production-ready results in diverse environments.
To be saved permanently, your custom voice ID must be used at least once with one of the voice models on our platform.
Otherwise, we can only store it for 7 days. After that, it will be deleted and the voice ID will no longer be callable.
For easier reuse later, make sure to use your voice ID once with one of the platform's voice models after creating it.
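The retention rule can be modeled roughly as follows. This is a sketch under assumptions: it supposes the 7-day window starts at the moment the voice ID is created and that a single use inside that window is enough to make it permanent. The page does not state exactly when the clock starts, and `voice_id_expiry` is a hypothetical helper, not a platform API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 7-day retention period stated on this page.
RETENTION = timedelta(days=7)


def voice_id_expiry(
    created_at: datetime, first_used_at: Optional[datetime] = None
) -> Optional[datetime]:
    """Return when an unused voice ID would be deleted, or None if it is permanent.

    Assumption: the window is measured from creation time.
    """
    deadline = created_at + RETENTION
    if first_used_at is not None and created_at <= first_used_at <= deadline:
        return None  # used in time, so it is kept permanently
    return deadline


if __name__ == "__main__":
    created = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(voice_id_expiry(created))
    print(voice_id_expiry(created, created + timedelta(days=2)))
```

In practice, calling one of the platform's voice models with the new voice ID right after creating it avoids having to track the deadline at all.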