
Qwen3 TTS Voice Clone

Qwen3 TTS Voice Clone: Clone any voice from a reference audio and generate speech in that voice. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

Features

Qwen3-TTS Voice Clone

Qwen3-TTS Voice Clone is an advanced text-to-speech model that clones voices from reference audio. Upload a short audio sample of any voice, and the model generates new speech in that exact voice — preserving tone, accent, and speaking style.


Why Choose This?

  • High-fidelity voice cloning: Capture the unique characteristics of any voice from just a short audio sample.

  • Reference transcript support: Provide the transcript of your reference audio to improve cloning accuracy.

  • Multilingual support: Generate cloned voice speech in 10 languages: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian.

  • Auto language detection: Set language to “auto” and the model detects the language from your text.


Parameters

Parameter | Required | Description
audio | Yes | Reference audio file to clone (upload or URL)
text | Yes | The text to convert to speech in the cloned voice
reference_text | No | Transcript of the reference audio (improves accuracy)
language | No | auto, Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian (default: auto)

How to Use

  1. Upload reference audio — provide a clear audio sample of the voice you want to clone (3-15 seconds recommended).
  2. Add reference transcript (optional) — enter the exact text spoken in your reference audio to improve cloning accuracy.
  3. Enter your text — write or paste the content you want to convert to speech.
  4. Select language — choose the target language or use “auto” for automatic detection.
  5. Run — submit and download your audio file.
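
The steps above map onto a single JSON request body. The following is a minimal sketch, assuming the field names from the Parameters table; `build_voice_clone_payload` is a hypothetical helper (not part of any WaveSpeedAI SDK) and the audio URL is a placeholder.

```python
SUPPORTED_LANGUAGES = {
    "auto", "Chinese", "English", "German", "Italian", "Portuguese",
    "Spanish", "Japanese", "Korean", "French", "Russian",
}


def build_voice_clone_payload(audio_url, text, reference_text=None, language="auto"):
    """Assemble the request body for a voice-clone task (hypothetical helper)."""
    if not audio_url:
        raise ValueError("audio is required")
    if not text:
        raise ValueError("text is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    payload = {"audio": audio_url, "text": text, "language": language}
    # reference_text is optional; include it only when a transcript is available.
    if reference_text:
        payload["reference_text"] = reference_text
    return payload


payload = build_voice_clone_payload(
    "https://example.com/reference.wav",  # placeholder URL
    "Hello, this is my cloned voice.",
    reference_text="This is the sentence spoken in the reference clip.",
)
```

Validating the language locally is a design choice of this sketch; it catches typos before a request is sent.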

Pricing

Text Length | Cost
Under 1,000 chars | $0.02
1,000+ chars | $0.02 per 1,000 characters

Billing Rules

  • Minimum charge: $0.02 (for texts under 1,000 characters)
  • For longer texts: $0.02 × (character count / 1,000)
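
The billing rules amount to taking the larger of the minimum charge and the proportional charge. The sketch below assumes longer texts are billed proportionally with no rounding up to whole thousands (the rules above do not state a rounding policy) and that characters are counted as Python counts them with `len`; `estimate_cost_usd` is a hypothetical helper.

```python
MIN_CHARGE_USD = 0.02
RATE_USD_PER_1000_CHARS = 0.02


def estimate_cost_usd(text):
    """Estimate the charge for one request under the stated billing rules."""
    chars = len(text)  # assumption: the service counts characters the same way
    proportional = RATE_USD_PER_1000_CHARS * chars / 1000
    # Texts under 1,000 characters fall back to the minimum charge.
    return max(MIN_CHARGE_USD, proportional)
```

Under these assumptions a 500-character text costs the $0.02 minimum, and a 2,500-character text costs $0.02 × 2.5 = $0.05.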

Best Use Cases

  • Personalized Voiceovers — Clone your own voice to generate content without recording.
  • Character Consistency — Maintain the same voice across multiple audio productions.
  • Localization — Clone a voice to speak in different languages while preserving identity.
  • Audiobook Production — Generate hours of narration from a single voice sample.
  • Accessibility — Create personalized text-to-speech voices for individuals.

Pro Tips

  • Use clean, noise-free reference audio for best cloning results.
  • Reference audio of 3-15 seconds with clear speech works best.
  • Always provide reference_text when possible — it significantly improves voice matching accuracy.
  • Ensure the reference audio contains natural speech without music or background noise.
  • The cloned voice works best when the target text matches the reference audio’s language.
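
The 3-15 second recommendation can be checked locally before uploading. This is a rough sketch using Python's standard `wave` module, so it only handles uncompressed WAV files; `wav_duration_seconds` and `is_recommended_length` are hypothetical helpers, and other audio formats would need a different library.

```python
import wave

RECOMMENDED_MIN_SECONDS = 3.0
RECOMMENDED_MAX_SECONDS = 15.0


def wav_duration_seconds(path_or_file):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds):
    """Check a duration against the recommended 3-15 second range."""
    return RECOMMENDED_MIN_SECONDS <= seconds <= RECOMMENDED_MAX_SECONDS
```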

Related Models

  • Qwen3-TTS Voice Design: Design custom voices using natural language descriptions instead of audio samples.

Notes

  • Reference audio quality directly affects cloning quality — use high-quality recordings.
  • The model preserves accent, tone, and speaking style from the reference.
  • For best results, match the language parameter to your text content.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


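# Submit the task (audio and text are required; the values below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/qwen3-tts/voice-clone" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/reference.wav",
    "text": "Hello, this is my cloned voice.",
    "language": "auto"
}'

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"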
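
The same two calls can be made from Python. The sketch below uses only the standard library and mirrors the endpoints and headers of the curl commands; the helper names (`build_submit_request`, `build_result_request`, `send`) and the 30-second timeout are assumptions for illustration, not part of an official client.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build the POST request that submits a voice-clone task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/qwen3-tts/voice-clone",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build the GET request that fetches the result of a task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical flow would call `send(build_submit_request(api_key, payload))`, read `data.id` from the response, and then pass that ID to `build_result_request`.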

Parameters

Task Submission Parameters

Request Parameters

Parameter | Type | Required | Default | Range | Description
audio | string | Yes | - | - | URL of the reference audio to clone the voice from
reference_text | string | No | - | - | Transcript of the reference audio (optional, improves accuracy)
text | string | Yes | - | - | The text content to convert into speech using the cloned voice
language | string | No | auto | auto, Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian | Language of the speech output (use 'auto' for automatic detection)

Response Parameters

Parameter | Type | Description
code | integer | HTTP status code (e.g., 200 for success)
message | string | Status message (e.g., “success”)
data.id | string | Unique identifier for the prediction (the task ID)
data.model | string | Model ID used for the prediction
data.outputs | array | Array of URLs to the generated content (empty when status is not completed)
data.urls | object | Object containing related API endpoints
data.urls.get | string | URL to retrieve the prediction result
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output
data.status | string | Status of the task: created, processing, completed, or failed
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error | string | Error message (empty if no error occurred)
data.timings | object | Object containing timing details
data.timings.inference | integer | Inference time in milliseconds
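
After submission, the fields that matter most are `data.id` and `data.urls.get`. The sketch below shows one way to pull them out; `extract_task` is a hypothetical helper, and `sample_response` is an illustrative dictionary assembled from the field table above (including a guessed model ID), not captured API output.

```python
def extract_task(response):
    """Return (task_id, result_url) from a submission response, or raise."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response["data"]
    return data["id"], data["urls"]["get"]


# Illustrative values only; every field value here is an assumption.
sample_response = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "wavespeed-ai/qwen3-tts/voice-clone",
        "outputs": [],
        "urls": {
            "get": "https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"
        },
        "status": "created",
        "error": "",
    },
}

task_id, result_url = extract_task(sample_response)
```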

Result Request Parameters

Parameter | Type | Required | Default | Description
id | string | Yes | - | Task ID

Result Response Parameters

Parameter | Type | Description
code | integer | HTTP status code (e.g., 200 for success)
message | string | Status message (e.g., “success”)
data | object | The prediction data object containing all details
data.id | string | Unique identifier for the prediction (the task ID that was requested)
data.model | string | Model ID used for the prediction
data.outputs | array | Array of URLs to the generated content (empty when status is not completed)
data.urls | object | Object containing related API endpoints
data.urls.get | string | URL to retrieve the prediction result
data.status | string | Status of the task: created, processing, completed, or failed
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error | string | Error message (empty if no error occurred)
data.timings | object | Object containing timing details
data.timings.inference | integer | Inference time in milliseconds
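
Because `data.outputs` stays empty until `data.status` is completed, a client usually polls the result endpoint. A minimal sketch follows, assuming a one-second interval and a cap of 120 polls (neither value comes from the documentation); `wait_for_outputs` is a hypothetical helper that takes any zero-argument callable returning the decoded result JSON, so the HTTP layer stays swappable.

```python
import time


def wait_for_outputs(fetch_result, poll_interval=1.0, max_polls=120, sleep=time.sleep):
    """Poll until the task completes, then return data.outputs."""
    for _ in range(max_polls):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Status is created or processing: wait before asking again.
        sleep(poll_interval)
    raise TimeoutError("task did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.
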
© 2025 WaveSpeedAI. All rights reserved.