MiniMax Voice Clone

MiniMax Voice Clone is a state-of-the-art voice synthesis model developed by MiniMax. It enables high-quality voice cloning from a short reference clip, producing speech that closely mimics the tone, accent, and personality of the original speaker.

Features

Key Features

  • High-Fidelity Voice Cloning
    Generates speech that is perceptually close to the source speaker with natural prosody and pronunciation.

  • Few-Second Voice Adaptation
    Requires only a few seconds of reference audio to accurately replicate a voice.

  • Emotion and Tone Control
    Allows fine-tuned control over speaking style and emotion, useful for storytelling, games, and character dialogue.

  • Multilingual Output
    Supports voice cloning across different languages and smooth code-switching.

  • Low-Latency Inference
    Optimized for real-time use cases, including live interactions and dialogue generation.

Use Cases

  • AI voiceovers for content creators and influencers
  • Personalized digital assistants and chatbots
  • Audiobook narration in a specific voice
  • Interactive gaming and character voices
  • Assistive speech for individuals with voice loss

Model Overview

MiniMax Voice Clone uses a neural TTS pipeline with robust speaker embedding and prosody modeling. It combines clarity, control, and speed, offering production-ready results in diverse environments.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


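# Submit the task
# "audio" and "custom_voice_id" are required (see Request Parameters).
# The "audio" value here is a placeholder; replace it with your reference audio.
curl --location --request POST "https://api.wavespeed.ai/api/v3/minimax/voice-clone" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "<your reference audio>",
    "custom_voice_id": "WaveSpeed001",
    "model": "speech-02-hd",
    "need_noise_reduction": false,
    "need_volume_normalization": false,
    "accuracy": 0.7,
    "text": "Hello! Welcome to Wavespeed! This is a preview of your cloned voice. I hope you enjoy it!"
}'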

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
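The two calls above imply a submit-then-poll flow: the task is created, then its result is queried until the status reaches completed or failed. A minimal sketch of that loop follows. The function names, the retry count, and the polling interval are hypothetical, and the sed-based status extraction is a crude stand-in that assumes the "status" key appears only once in the response; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Crude extraction of the "status" string from a JSON response on stdin.
# Assumption: the key appears exactly once in the response body.
extract_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll the result endpoint until the task completes, fails, or we give up.
# $1 = task ID, $2 = max attempts (hypothetical default 30),
# $3 = seconds between attempts (hypothetical default 2).
poll_result() {
  request_id=$1
  max_attempts=${2:-30}
  interval=${3:-2}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    response=$(curl --silent --location --request GET \
      "https://api.wavespeed.ai/api/v3/predictions/${request_id}/result" \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}")
    status=$(printf '%s' "$response" | extract_status)
    case $status in
      completed) printf '%s\n' "$response"; return 0 ;;
      failed)    printf '%s\n' "$response" >&2; return 1 ;;
      *)         sleep "$interval" ;;  # created, processing, or unreadable: retry
    esac
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for task ${request_id}" >&2
  return 2
}
```

With these definitions loaded, `poll_result "${requestId}"` prints the final response body on success and returns a non-zero exit code on failure or timeout.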

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| audio | string | Yes | - | - | The reference audio file to clone. Supported formats include MP3, M4A, and WAV. |
| custom_voice_id | string | Yes | - | - | Custom user-defined ID. Minimum 8 characters; must include letters and numbers and start with a letter (e.g., WaveSpeed001). A duplicate ID returns an error. |
| model | string | Yes | speech-02-hd | - | The TTS model used to generate the preview. |
| need_noise_reduction | boolean | No | false | - | Whether to enable noise reduction. |
| need_volume_normalization | boolean | No | false | - | Whether to enable volume normalization. |
| accuracy | number | No | 0.7 | 0.00 ~ 1.00 | Text validation accuracy threshold. |
| text | string | No | Hello! Welcome to Wavespeed! This is a preview of your cloned voice. I hope you enjoy it! | - | Text to synthesize with the cloned voice. The generated audio is returned as a link so you can preview the cloning result. Limited to 2000 characters. |
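The constraints on custom_voice_id and text can be checked locally before submitting a task. The sketch below encodes only the rules stated in the table; the function names are hypothetical, and the API may enforce further restrictions (for example on allowed characters) that are not documented here.

```shell
#!/bin/sh
# Returns 0 if the ID satisfies the documented rules:
# at least 8 characters, starts with a letter, contains at least one digit.
# Assumption: no other character restrictions are checked.
is_valid_voice_id() {
  id=$1
  [ "${#id}" -ge 8 ] || return 1
  case $id in
    [A-Za-z]*) ;;
    *) return 1 ;;
  esac
  case $id in
    *[0-9]*) ;;
    *) return 1 ;;
  esac
  return 0
}

# Returns 0 if the preview text is within the documented 2000-character limit.
# Note: ${#text} may count bytes rather than characters in some shells.
is_valid_preview_text() {
  text=$1
  [ "${#text}" -le 2000 ]
}
```

For example, `is_valid_voice_id "WaveSpeed001"` succeeds, while an ID that is too short or starts with a digit is rejected.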

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Query Parameters

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
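
Once data.status is completed, data.outputs holds the URLs of the generated audio. As a rough illustration, the hypothetical helper below prints one URL per line from a response body on stdin. It relies on text matching rather than a JSON parser, so it assumes the "outputs" key appears once and that the URLs contain no commas or escaped quotes.

```shell
#!/bin/sh
# Print each URL in the "outputs" array on its own line.
# Assumptions: "outputs" appears once; URLs contain no commas or escaped quotes.
extract_outputs() {
  tr -d '\n' \
    | sed -n 's/.*"outputs"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*/\1/p' \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*"//' -e 's/"[[:space:]]*$//' -e '/^$/d'
}
```

An empty outputs array (for example, while the task is still processing) produces no output lines.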
© 2025 WaveSpeedAI. All rights reserved.