Minimax Speech 02 Turbo

MiniMax's high-definition text-to-speech model. Each request costs $0.03 per 1,000 characters.
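As a rough illustration of the pricing above, the sketch below estimates the cost of a request from its text length. It assumes billing is pro-rata per character and that characters are counted the way Python's `len` counts them; neither rounding rules nor minimum charges are stated here, so treat the result as an estimate. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Price quoted in this document: $0.03 per 1000 characters.
PRICE_PER_1000_CHARS = Decimal("0.03")


def estimate_cost(text: str) -> Decimal:
    """Estimate the request cost in USD for the given text."""
    # Assumption: cost scales linearly with the character count.
    # Actual rounding or minimum-charge rules are not documented here.
    return PRICE_PER_1000_CHARS * len(text) / 1000


if __name__ == "__main__":
    sample = "Hello world! This is a test of the text-to-speech system."
    print(f"{len(sample)} characters, estimated cost ${estimate_cost(sample)}")
```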

Features

Speech 02 Turbo produces speech with natural pronunciation and clear articulation. It offers multiple voice options and adjustable speed, volume, and pitch controls for professional-grade audio generation.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Hello world! This is a test of the text-to-speech system.",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "volume": 1,
    "pitch": 0,
    "emotion": "happy",
    "english_normalization": false
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
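The same two calls can be sketched in Python using only the standard library. The URLs and headers are taken from the curl commands above; the helper names (`build_submit_request`, `build_result_request`, `send`) and the 30-second timeout are illustrative assumptions, not part of the WaveSpeed API.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a speech-02-turbo task."""
    return urllib.request.Request(
        f"{API_BASE}/minimax/speech-02-turbo",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that fetches the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    # Assumption: a 30-second timeout is a reasonable default.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With an API key read from the `WAVESPEED_API_KEY` environment variable, `send(build_submit_request(key, payload))` would submit a task, and `send(build_result_request(key, task_id))` would fetch its result.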

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
text	string	Yes	Hello world! This is a test of the text-to-speech system.	-	Text to convert to speech, up to 5000 characters; each character counts as 1 token. Insert <#x#> between words to add a pause of x seconds (0.01-99.99).
voice_id	string	Yes	-	-	Desired voice ID. Use a voice ID you have trained (https://wavespeed.ai/models/minimax/voice-clone), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl.
speed	number	No	1	0.50 ~ 2.00	Speech speed; 1.0 is normal speed.
volume	number	No	1	0.10 ~ 10.00	Speech volume; 1.0 is normal volume.
pitch	number	No	-	-12 ~ 12	Speech pitch; 0 is normal pitch.
emotion	string	No	happy	-	The emotion of the generated speech.
english_normalization	boolean	No	false	-	Enables English text normalization, which improves how numbers are read aloud.
sample_rate	integer	No	-	-	Sample rate of generated sound.
bitrate	integer	No	-	-	Bitrate of generated sound.
channel	string	No	-	-	Number of channels in the generated audio: 1 (mono) or 2 (stereo).
language_boost	string	No	-	-	Improves recognition of the specified language or dialect.
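The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch: the ranges and limits are copied from the table, while the function names `pause` and `validate_request` are hypothetical. It assumes pause markers count toward the 5000-character limit and that a two-decimal duration such as `<#0.50#>` is accepted, neither of which is confirmed here.

```python
import re

# Limits copied from the request parameter table.
MAX_TEXT_CHARS = 5000
NUMERIC_RANGES = {"speed": (0.5, 2.0), "volume": (0.1, 10.0), "pitch": (-12, 12)}
PAUSE_MARKER = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def pause(seconds: float) -> str:
    """Format a <#x#> pause marker for the given duration in seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def validate_request(params: dict) -> list:
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    text = params.get("text") or ""
    for required in ("text", "voice_id"):
        if not params.get(required):
            problems.append(f"missing required parameter: {required}")
    # Assumption: pause markers count toward the character limit.
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    for match in PAUSE_MARKER.finditer(text):
        if not 0.01 <= float(match.group(1)) <= 99.99:
            problems.append(f"pause out of range: {match.group(0)}")
    return problems
```

For example, `"Hello" + pause(0.5) + "world"` builds a text with a half-second pause, and `validate_request` returns an empty list when the payload satisfies every check.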

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
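To show how these fields fit together, the sketch below parses a submit response and pulls out the values needed for the follow-up query. The sample body is hypothetical: it is shaped after the table above, but the ID and other values are made up for illustration. The function name `parse_submission` is also an assumption.

```python
import json

# Hypothetical response body shaped after the response parameter table.
SAMPLE_RESPONSE = """
{
  "code": 200,
  "message": "success",
  "data": {
    "id": "task-123",
    "model": "minimax/speech-02-turbo",
    "outputs": [],
    "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/task-123/result"},
    "has_nsfw_contents": [],
    "status": "created",
    "created_at": "2023-04-01T12:34:56.789Z",
    "error": "",
    "timings": {"inference": 0}
  }
}
"""


def parse_submission(body: str) -> tuple:
    """Return (task_id, result_url, status) from a submit response body."""
    response = json.loads(body)
    if response["code"] != 200:
        raise RuntimeError(f"submission failed: {response['message']}")
    data = response["data"]
    return data["id"], data["urls"]["get"], data["status"]


if __name__ == "__main__":
    task_id, result_url, status = parse_submission(SAMPLE_RESPONSE)
    print(task_id, status, result_url)
```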

Result Query Parameters

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
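Because `data.outputs` stays empty until `data.status` is `completed`, a client typically polls the result endpoint. A minimal polling sketch follows; the function name `wait_for_outputs`, the one-second interval, and the attempt limit are illustrative assumptions. `fetch_result` stands for any callable that returns the decoded result response as a dictionary.

```python
import time


def wait_for_outputs(fetch_result, interval=1.0, max_attempts=60, sleep=time.sleep):
    """Poll fetch_result() until the task finishes; return its output URLs."""
    for _ in range(max_attempts):
        data = fetch_result()["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            # data.error carries the failure message when one is available.
            raise RuntimeError(data["error"] or "task failed")
        # Status is still `created` or `processing`; wait and try again.
        sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, for example with canned responses in place of network calls.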
