text-to-audio

MiniMax Voice Clone

minimax/voice-clone

MiniMax Voice Clone creates high-quality voice clones from short reference clips, closely matching the speaker's tone, accent, and speaking style. It is available through a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.

Input parameters:

  • audio: Reference voice clip. Drag and drop a file, click to upload, or provide a direct URL.
  • custom_voice_id: Custom user-defined ID. Must be at least 8 characters long, start with a letter, and include both letters and numbers (e.g., WaveSpeed-20250717-1050). Duplicate voice IDs will result in an error. This ID can be used with the following models: minimax/speech-02-hd, minimax/speech-02-turbo.
  • need_noise_reduction: Enable noise reduction. Default is false (no noise reduction).
  • need_volume_normalization: Enable volume normalization. If not provided, the default value is false.
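
The format rules for custom_voice_id can be checked client-side before submitting a request. The snippet below is a minimal illustrative helper, not part of any official WaveSpeedAI SDK; it assumes hyphens are permitted, since the documented example WaveSpeed-20250717-1050 contains them.

```python
import re

# Illustrative client-side check mirroring the documented custom_voice_id rules:
# at least 8 characters, starts with a letter, and contains both letters and numbers.
# Allowing hyphens is an assumption based on the documented example ID.
VOICE_ID_PATTERN = re.compile(r"^(?=[A-Za-z])(?=.*\d)[A-Za-z0-9-]{8,}$")

def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id satisfies the documented format rules."""
    return VOICE_ID_PATTERN.match(voice_id) is not None

print(is_valid_voice_id("WaveSpeed-20250717-1050"))  # True
print(is_valid_voice_id("short1"))                   # False: fewer than 8 characters
```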


Your request will cost $0.5 per run.

For $10 you can run this model approximately 20 times.


README

MiniMax Voice Clone

MiniMax Voice Clone is a state-of-the-art voice synthesis and cloning pipeline from MiniMax. It turns a short reference clip into a reusable voice ID, then uses MiniMax Speech models to generate speech that closely matches the speaker’s timbre, accent, and style. The system is built on the MiniMax Speech-02 and Speech-2.6 families, which deliver high-fidelity, multilingual, low-latency TTS for production use.

We now also support MiniMax's latest-generation models: Speech 2.6 HD and Speech 2.6 Turbo.

Key Features

  • High-Fidelity Voice Cloning: Generates speech that is perceptually very close to the reference speaker, with natural prosody, clear pronunciation, and stable timbre across long passages.

  • Few-Second Voice Adaptation: Uses a learnable speaker encoder to extract timbre from just a few seconds of audio, enabling fast, zero-/one-shot voice cloning without transcription.

  • Emotion and Style Control: Exposes parameters for speaking rate, pitch, loudness, and emotion, making it suitable for storytelling, dialogue, gaming characters, and branded voices.

  • Multilingual & Cross-Lingual Output: Supports dozens of languages (30+ in Speech-02 and 40+ in Speech-2.6 on WaveSpeedAI), with robust accent control and smooth code-switching between languages.

  • Low-Latency Inference: Speech-02-Turbo and Speech-2.6-Turbo are optimized for real-time scenarios, with end-to-end latency in the sub-second range and < 250 ms reported for Speech 2.6 in typical interactive settings.

Use Cases

  • AI voiceovers for YouTube, TikTok, and other content platforms
  • Personalized digital assistants and customer-service bots
  • Audiobook and podcast narration in a specific, consistent voice
  • In-game characters, VTubers, and interactive story experiences
  • Assistive speech applications for users who have lost or cannot safely use their natural voice

Model Overview

MiniMax Voice Clone is built around a neural TTS pipeline with:

  • A speaker encoder that extracts a compact voice embedding from a short reference clip
  • A text-to-audio generator (Speech-02 / Speech-2.6 HD or Turbo) that conditions on both text and the voice embedding
  • Optional controls for language, pace, pitch, and emotion

This design combines the clarity of studio-grade TTS with flexible voice cloning, making it suitable for both offline content production and real-time agents.
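
As a purely conceptual illustration of this two-stage design, the sketch below models the pipeline as a pair of interfaces. All class and method names are hypothetical; they do not correspond to a published MiniMax or WaveSpeedAI SDK.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence

@dataclass
class VoiceEmbedding:
    """Compact speaker representation extracted from a short reference clip."""
    vector: Sequence[float]

class SpeakerEncoder(Protocol):
    """Stage 1: turn a few seconds of reference audio into a voice embedding."""
    def encode(self, reference_audio: bytes) -> VoiceEmbedding: ...

class SpeechGenerator(Protocol):
    """Stage 2: generate audio conditioned on both the text and the voice embedding."""
    def synthesize(
        self,
        text: str,
        voice: VoiceEmbedding,
        *,
        language: Optional[str] = None,   # optional language/accent control
        speed: float = 1.0,               # speaking-rate control
        pitch: float = 0.0,               # pitch control
        emotion: Optional[str] = None,    # emotion/style control
    ) -> bytes: ...
```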

How to Use

  • Upload or paste your reference audio

    • In the audio field, upload a short, clean voice clip (or paste a direct URL). Around 5–20 seconds of speech without background music works best.
  • Set custom_voice_id

    • Choose a new, descriptive ID (for example: Alice-001).

    • This ID must be unique across your account.

    • If you reuse an existing ID when creating a new clone, the request will fail with a “voice clone voice id duplicate” error.

  • Select the speech model, such as speech-02-hd.

  • Enter the output text

    • In the text field, type what you want the cloned voice to say.

    Example: “Hello! Welcome to WaveSpeedAI. This is a preview of your cloned voice.”

  • Run the job

    • After it finishes, you can replay and download the audio.

Optional: Enable enhancements

  • Turn on need_noise_reduction if your reference audio has background noise.

  • Turn on need_volume_normalization to even out volume differences.

  • Adjust the accuracy slider if available: higher values keep the clone closer to the reference, while lower values are more forgiving of noisy audio.

The custom_voice_id you used is now available for reuse in the supported MiniMax speech models.
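
In API terms, the workflow above amounts to a single request against the model. The sketch below uses Python's requests library; the endpoint URL, auth header, and passing the reference audio as a direct URL are assumptions for illustration (consult the WaveSpeedAI API reference for the exact schema), while the field names audio, custom_voice_id, text, need_noise_reduction, and need_volume_normalization are the parameters described above. The speech-model selection field is omitted here because its parameter name is not given in this README.

```python
import os
import requests

# Hypothetical endpoint and response handling, shown for illustration only;
# check the WaveSpeedAI API reference for the exact URL and schema.
API_URL = "https://api.wavespeed.ai/api/v3/minimax/voice-clone"  # assumed URL
API_KEY = os.environ["WAVESPEED_API_KEY"]

payload = {
    # Parameters documented above.
    "audio": "https://example.com/reference-clip.wav",  # direct URL to a clean 5-20 s clip
    "custom_voice_id": "Alice-001",                      # new, unique ID (letters + digits, >= 8 chars)
    "text": "Hello! Welcome to WaveSpeedAI. This is a preview of your cloned voice.",
    "need_noise_reduction": False,
    "need_volume_normalization": False,
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # inspect the returned job status / audio location
```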

Price

  • Just $0.5 per run!

Supported Speech Models on WaveSpeedAI

Your cloned voice IDs can be used directly with the following MiniMax speech models on WaveSpeedAI:

  • minimax/speech-02-hd
  • minimax/speech-02-turbo
  • Speech 2.6 HD
  • Speech 2.6 Turbo
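
To reuse the clone, pass the ID you created to one of these models. The sketch below is illustrative only: the endpoint URL and the voice_id field name are assumptions (check the documentation of the specific speech model for the exact parameter), while the model name minimax/speech-02-hd and the idea of reusing the cloned ID come from this page.

```python
import os
import requests

# Hypothetical request reusing the cloned voice with minimax/speech-02-hd.
# The endpoint URL and the "voice_id" field name are assumed for illustration.
API_URL = "https://api.wavespeed.ai/api/v3/minimax/speech-02-hd"  # assumed URL
API_KEY = os.environ["WAVESPEED_API_KEY"]

payload = {
    "text": "This narration uses the voice we cloned earlier.",
    "voice_id": "Alice-001",  # the custom_voice_id created in the cloning step
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```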

Voice ID Persistence (Important)

To keep your cloned voice reusable in the long term:

  • Any new voice ID must be used at least once with one of the MiniMax speech models above (02 HD/Turbo or 2.6 HD/Turbo).
  • If a voice ID is created but never used in a speech generation request, WaveSpeedAI can only retain it for 7 days. After 7 days of inactivity, the ID and its associated embedding are deleted and can no longer be called from our API.