Qwen3-TTS Voice Clone
Qwen3-TTS Voice Clone is an advanced text-to-speech model that clones voices from reference audio. Upload a short audio sample of any voice, and the model generates new speech in that exact voice — preserving tone, accent, and speaking style.
Why Choose This?
- High-fidelity voice cloning: Capture the unique characteristics of any voice from just a short audio sample.
- Reference transcript support: Provide the transcript of your reference audio to improve cloning accuracy.
- Multilingual support: Generate speech in the cloned voice in any of 10 languages (Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian).
- Auto language detection: Set language to "auto" and the model detects the language from your text.
Parameters
| Parameter | Required | Description |
|---|---|---|
| audio | Yes | Reference audio file to clone (upload or URL) |
| text | Yes | The text to convert to speech in the cloned voice |
| reference_text | No | Transcript of the reference audio (improves accuracy) |
| language | No | auto, Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian (default: auto) |
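As a rough illustration of how the four parameters in the table fit together, the sketch below assembles and validates a request payload in Python. The helper name `build_request` and the idea of sending the parameters as a flat dictionary are assumptions made for this example; they are not taken from an official SDK, and the actual endpoint and transport are not shown.

```python
# Allowed values for the language parameter, as listed in the table above.
SUPPORTED_LANGUAGES = {
    "auto", "Chinese", "English", "German", "Italian", "Portuguese",
    "Spanish", "Japanese", "Korean", "French", "Russian",
}


def build_request(audio, text, reference_text=None, language="auto"):
    """Assemble and validate the documented parameters.

    Hypothetical helper for illustration only; not part of any official client.
    """
    if not audio:
        raise ValueError("audio is required (uploaded file reference or URL)")
    if not text or not text.strip():
        raise ValueError("text is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    payload = {"audio": audio, "text": text, "language": language}
    # reference_text is optional, so include it only when it was provided.
    if reference_text:
        payload["reference_text"] = reference_text
    return payload
```

Calling `build_request("https://example.com/ref.wav", "Hello there")` would return a payload with language set to "auto" and no reference_text key; the URL here is a placeholder.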
How to Use
- Upload reference audio — provide a clear audio sample of the voice you want to clone (3-15 seconds recommended).
- Add reference transcript (optional) — enter the exact text spoken in your reference audio to improve cloning accuracy.
- Enter your text — write or paste the content you want to convert to speech.
- Select language — choose the target language or use "auto" for automatic detection.
- Run — submit and download your audio file.
Pricing
| Text Length | Cost |
|---|---|
| Under 1,000 chars | $0.02 |
| 1,000+ chars | $0.02 per 1,000 characters |
Billing Rules
- Minimum charge: $0.02 (for texts under 1,000 characters)
- For longer texts: $0.02 × (character count / 1,000)
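The billing rules above can be sketched as a small cost estimator. This is only an approximation under stated assumptions: it applies the formula proportionally with no rounding to whole 1,000-character blocks, and it counts characters with Python's `len()`, which may not match how the service counts them. The function name `estimate_cost` is hypothetical.

```python
RATE_PER_1000_CHARS = 0.02  # USD, from the pricing table
MINIMUM_CHARGE = 0.02       # USD, applies to texts under 1,000 characters


def estimate_cost(text):
    """Estimate the charge in USD for synthesizing `text`.

    Assumes proportional billing above the minimum, with no block rounding.
    """
    chars = len(text)
    proportional = RATE_PER_1000_CHARS * chars / 1000
    # Short texts are billed at the minimum charge.
    return max(MINIMUM_CHARGE, proportional)
```

Under these assumptions, a 500-character text is billed the $0.02 minimum and a 2,500-character text comes to roughly $0.05.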
Best Use Cases
- Personalized Voiceovers — Clone your own voice to generate content without recording.
- Character Consistency — Maintain the same voice across multiple audio productions.
- Localization — Clone a voice to speak in different languages while preserving identity.
- Audiobook Production — Generate hours of narration from a single voice sample.
- Accessibility — Create personalized text-to-speech voices for individuals.
Pro Tips
- Use clean reference audio that contains natural speech, free of music and background noise.
- Keep the reference clip to 3-15 seconds of clear speech.
- Provide reference_text whenever possible; an exact transcript improves voice matching accuracy.
- The cloned voice works best when the target text matches the reference audio's language.
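The 3-15 second recommendation can be checked locally before uploading. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files; other formats would need a different reader. The helper names are hypothetical, and the range is treated as a guideline from this page, not a limit enforced by the service.

```python
import wave

MIN_SECONDS = 3.0   # lower bound of the recommended reference length
MAX_SECONDS = 15.0  # upper bound of the recommended reference length


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds):
    """True when the clip falls inside the recommended 3-15 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A typical use would be `is_recommended_length(wav_duration_seconds("reference.wav"))`, where `reference.wav` is a placeholder path.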
Related Models
- Qwen3-TTS Voice Design — Design custom voices using natural language descriptions instead of audio samples.
Notes
- Reference audio quality directly affects cloning quality — use high-quality recordings.
- The model preserves accent, tone, and speaking style from the reference.
- For best results, match the language parameter to your text content.