
Your request will cost $0.50 per run.
For $10, you can run this model approximately 20 times.
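The run estimate above is the budget divided by the per-run price, rounded down to whole runs. A minimal sketch, assuming a flat $0.50 price per run; the function name `estimate_runs` is hypothetical:

```python
from decimal import Decimal, ROUND_FLOOR


def estimate_runs(budget_usd: str, price_per_run_usd: str = "0.50") -> int:
    """Return how many complete runs a budget covers at a flat per-run price.

    Decimal is used so currency amounts avoid binary floating-point rounding.
    """
    budget = Decimal(budget_usd)
    price = Decimal(price_per_run_usd)
    if price <= 0:
        raise ValueError("price per run must be positive")
    # Only complete runs count, so round the quotient down.
    return int((budget / price).to_integral_value(rounding=ROUND_FLOOR))


print(estimate_runs("10"))  # 20
```

Amounts are passed as strings so that values such as "0.50" are represented exactly.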
MiniMax Voice Clone is a state-of-the-art voice synthesis and cloning pipeline from MiniMax. It turns a short reference clip into a reusable voice ID, then uses MiniMax Speech models to generate speech that closely matches the speaker’s timbre, accent, and style. The system is built on the MiniMax Speech-02 and Speech-2.6 families, which deliver high-fidelity, multilingual, low-latency TTS for production use.
We now also support MiniMax’s latest-generation models: Speech 2.6 HD and Speech 2.6 Turbo.
High-Fidelity Voice Cloning: Generates speech that is perceptually very close to the reference speaker, with natural prosody, clear pronunciation, and stable timbre across long passages.
Few-Second Voice Adaptation: Uses a learnable speaker encoder to extract timbre from just a few seconds of audio, enabling fast zero-shot or one-shot voice cloning without a transcript.
Emotion and Style Control: Exposes parameters for speaking rate, pitch, loudness, and emotion, making it suitable for storytelling, dialogue, game characters, and branded voices.
Multilingual & Cross-Lingual Output: Supports dozens of languages (30+ in Speech-02 and 40+ in Speech-2.6 on WaveSpeedAI), with robust accent control and smooth code-switching between languages.
Low-Latency Inference: Speech-02-Turbo and Speech-2.6-Turbo are optimized for real-time use, with sub-second end-to-end latency; latency under 250 ms is reported for Speech-2.6 in typical interactive settings.
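To check whether these latency figures hold in your own environment, time each request end to end on the client. A minimal, generic sketch: `synthesize` stands in for whatever call sends your TTS request and is hypothetical here, and the 250 ms budget is taken from the figure above. The check compares the median of several runs, so a single slow request does not decide the result:

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(synthesize: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `synthesize` `runs` times and return each end-to-end latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize()  # blocks until the (hypothetical) TTS response arrives
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def within_budget(latencies_ms: List[float], budget_ms: float = 250.0) -> bool:
    """Return True if the median latency is below the budget."""
    return statistics.median(latencies_ms) < budget_ms


if __name__ == "__main__":
    # Placeholder for a real request: sleeps about 50 ms instead of calling an API.
    samples = measure_latency_ms(lambda: time.sleep(0.05))
    print(f"median {statistics.median(samples):.0f} ms, within budget: {within_budget(samples)}")
```

Client-side timing includes network round trips, so results can differ from figures reported for the model itself.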
MiniMax Voice Clone is built around a neural TTS pipeline with:
This design combines the clarity of studio-grade TTS with flexible voice cloning, making it suitable for both offline content production and real-time agents.
1. Upload or paste your reference audio.
2. Set custom_voice_id:
   - Choose a new, descriptive ID (for example, Alice-001).
   - The ID must be unique across your account. If you reuse an existing ID when creating a new clone, the request fails with a “voice clone voice id duplicate” error.
3. Select the speech model, such as speech-02-hd.
4. Enter the output text, for example: “Hello! Welcome to WaveSpeedAI. This is a preview of your cloned voice.”
5. Run the job.
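The steps above amount to assembling a handful of request fields. A minimal sketch, assuming a JSON-style payload: `custom_voice_id` and `speech-02-hd` come from this page, while the `audio`, `model`, and `text` field names, the `build_clone_request` helper, and `DuplicateVoiceIdError` are hypothetical, so check the model's actual API schema before relying on them. The sketch also catches a reused ID locally, before the service would reject it with the “voice clone voice id duplicate” error:

```python
from typing import Dict, Set


class DuplicateVoiceIdError(ValueError):
    """Hypothetical local error raised when a custom_voice_id is already in use."""


def build_clone_request(
    audio_url: str,
    custom_voice_id: str,
    text: str,
    existing_ids: Set[str],
    model: str = "speech-02-hd",
) -> Dict[str, object]:
    """Assemble a voice-clone payload; field names other than custom_voice_id are assumed."""
    if not custom_voice_id.strip():
        raise ValueError("custom_voice_id must not be empty")
    # IDs must be unique per account; reusing one makes the service reject the request.
    if custom_voice_id in existing_ids:
        raise DuplicateVoiceIdError(f"voice clone voice id duplicate: {custom_voice_id}")
    return {
        "audio": audio_url,  # hypothetical field: reference clip location
        "custom_voice_id": custom_voice_id,
        "model": model,  # hypothetical field: speech model
        "text": text,  # hypothetical field: preview text
    }


if __name__ == "__main__":
    known_ids = {"Alice-001"}
    payload = build_clone_request(
        audio_url="https://example.com/reference.wav",
        custom_voice_id="Alice-002",
        text="Hello! Welcome to WaveSpeedAI. This is a preview of your cloned voice.",
        existing_ids=known_ids,
    )
    print(payload["custom_voice_id"])  # Alice-002
```

Keeping your own record of created IDs (here, `existing_ids`) is what makes the local duplicate check possible.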
Optional settings (configure these before running the job):
- Turn on need_noise_reduction if your reference audio contains background noise.
- Turn on need_volume_normalization to even out volume differences.
- Adjust the accuracy slider, if available: higher values keep the clone closer to the reference, while lower values are more forgiving of noisy audio.
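To make “even out volume differences” concrete, here is one common technique, RMS normalization, applied to a list of float samples in [-1, 1]. This is a generic illustration only, not a description of how the service implements need_volume_normalization; the function name `rms_normalize` and the 0.1 target level are hypothetical choices:

```python
import math
from typing import List


def rms_normalize(samples: List[float], target_rms: float = 0.1) -> List[float]:
    """Scale samples so their RMS level equals target_rms, clipping to [-1, 1]."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / rms
    # Clip so that boosting quiet audio cannot push peaks out of range.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


quiet = [0.01, -0.01, 0.01, -0.01]
print(rms_normalize(quiet))  # each sample scaled to roughly ±0.1
```

Because quiet and loud recordings are brought to the same average level, the cloning step sees a more consistent signal.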
The custom_voice_id you used is now available for reuse in the supported MiniMax speech models.
Your cloned voice IDs can be used directly with the following MiniMax speech models on WaveSpeedAI:
To keep your cloned voice reusable in the long term: