Minimax Voice Clone

Playground

MiniMax Voice Clone is a state-of-the-art voice synthesis model developed by MiniMax. It enables high-quality voice cloning from a short reference clip, producing speech that closely mimics the tone, accent, and personality of the original speaker.

Features

MiniMax Voice Clone

MiniMax Voice Clone is a state-of-the-art voice synthesis model developed by MiniMax. It enables high-quality voice cloning from a short reference clip, producing speech that closely mimics the tone, accent, and personality of the original speaker.

Key Features

High-Fidelity Voice Cloning
Generates speech that is perceptually close to the source speaker with natural prosody and pronunciation.
Few-Second Voice Adaptation
Requires only a few seconds of reference audio to accurately replicate a voice.
Emotion and Tone Control
Allows fine-tuned control over speaking style and emotion, useful for storytelling, games, and character dialogue.
Multilingual Output
Supports voice cloning across different languages and smooth code-switching.
Low-Latency Inference
Optimized for real-time use cases, including live interactions and dialogue generation.

Use Cases

AI voiceovers for content creators and influencers
Personalized digital assistants and chatbots
Audiobook narration in a specific voice
Interactive gaming and character voices
Assistive speech for individuals with voice loss

Model Overview

MiniMax Voice Clone uses a neural TTS pipeline with robust speaker embedding and prosody modeling. It combines clarity, control, and speed, offering production-ready results in diverse environments.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/minimax/voice-clone" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "model": "speech-02-hd",
    "need_noise_reduction": false,
    "need_volume_normalization": false,
    "accuracy": 0.7,
    "text": "Hello! Welcome to Wavespeed! This is a preview of your cloned voice. I hope you enjoy it!"
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
audio	string	Yes	-	-	The uploaded file is cloned and supports formats such as MP3, M4A, and WAV.
custom_voice_id	string	Yes	-	-	Custom user-defined ID. Minimum 8 characters; must include letters and numbers and start with a letter (e.g., WaveSpeed001). Duplicate voice-ids will throw an error.
model	string	Yes	speech-02-hd	-	Specify the TTS model to be used for the preview.
need_noise_reduction	boolean	No	false	-	Enable noise reduction. Default is false (no noise reduction).
need_volume_normalization	boolean	No	false	-	Specify whether to enable volume normalization. If not provided, the default value is false.
accuracy	number	No	0.7	0.00 ~ 1.00	Uploading this parameter will set the text validation accuracy threshold, with a value range of [0,1]. If not provided, the default value for this parameter is 0.7.
text	string	No	Hello! Welcome to Wavespeed! This is a preview of your cloned voice. I hope you enjoy it!	-	The model will generate audio for the given text using the cloned voice and provide the result as a link for previewing the voice cloning effect. Limited to 2000 characters.

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction, Task Id
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Query Parameters

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Minimax Speech 02 Turbo Real ESRGAN