WaveSpeed.ai
audio-to-audio

Qwen3 TTS Voice Clone

wavespeed-ai/qwen3-tts/voice-clone

Qwen3 TTS Voice Clone: Clone any voice from a reference audio clip and generate speech in that voice. A ready-to-use REST inference API with high performance, no cold starts, and affordable pricing.

Your request will cost $0.02 per run.

For $1 you can run this model approximately 50 times.

Qwen3-TTS Voice Clone

Qwen3-TTS Voice Clone is a text-to-speech model that clones a voice from reference audio. Upload a short audio sample of the voice, and the model generates new speech in that voice, preserving its tone, accent, and speaking style.

Why Choose This?

  • High-fidelity voice cloning: Capture the distinctive characteristics of a voice from a short audio sample.

  • Reference transcript support: Provide the transcript of your reference audio to improve cloning accuracy.

  • Multilingual support: Generate speech in the cloned voice in 10 languages (Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian).

  • Auto language detection: Set language to "auto" and the model detects the language from your text.

Parameters

Parameter        Required   Description
audio            Yes        Reference audio file to clone (upload or URL)
text             Yes        The text to convert to speech in the cloned voice
reference_text   No         Transcript of the reference audio (improves accuracy)
language         No         auto, Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian (default: auto)
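As a rough illustration, the parameter table can be turned into a request body with a small validation step. This Python sketch assumes the parameter names are sent unchanged as JSON keys, that `audio` is passed as a URL string (uploads are not covered), and that `language` values must match the table exactly; the helper name `build_voice_clone_payload` and the example URL are hypothetical.

```python
from typing import Optional

# Values listed for `language` in the parameter table.
# ASSUMPTION: matched exactly as written (case-sensitive).
SUPPORTED_LANGUAGES = {
    "auto", "Chinese", "English", "German", "Italian", "Portuguese",
    "Spanish", "Japanese", "Korean", "French", "Russian",
}


def build_voice_clone_payload(
    audio: str,
    text: str,
    reference_text: Optional[str] = None,
    language: str = "auto",
) -> dict:
    """Assemble a request body for the voice-clone model (hypothetical helper).

    ASSUMPTION: `audio` is a URL to the reference clip; file uploads are
    not covered by this sketch.
    """
    if not audio:
        raise ValueError("audio is required")
    if not text:
        raise ValueError("text is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    payload = {"audio": audio, "text": text, "language": language}
    # reference_text is optional; include it only when it is provided.
    if reference_text:
        payload["reference_text"] = reference_text
    return payload


print(build_voice_clone_payload(
    "https://example.com/reference.wav",  # placeholder URL
    "Hello from a cloned voice.",
    reference_text="Transcript of the reference clip.",
))
```

Leaving `reference_text` out of the body when it is empty keeps the request identical to one that simply omits the optional field.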

How to Use

  1. Upload reference audio — provide a clear audio sample of the voice you want to clone (3-15 seconds recommended).
  2. Add reference transcript (optional) — enter the exact text spoken in your reference audio to improve cloning accuracy.
  3. Enter your text — write or paste the content you want to convert to speech.
  4. Select language — choose the target language or use "auto" for automatic detection.
  5. Run — submit and download your audio file.
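The steps above map onto a single HTTP request. This Python sketch only builds that request with the standard library and does not send it. The endpoint URL and the bearer-token header are assumptions (this page does not document them), as are the placeholder audio URL and API key; check them against WaveSpeed's API documentation before use.

```python
import json
import urllib.request

# ASSUMPTION: base URL and path layout are not documented on this page;
# this follows the pattern <base>/<model id>. Verify before use.
API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_ID = "wavespeed-ai/qwen3-tts/voice-clone"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for one voice-clone run."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_ID}",
        data=body,
        headers={
            # ASSUMPTION: bearer-token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = {
        "audio": "https://example.com/reference.wav",  # step 1 (placeholder URL)
        "reference_text": "Transcript of the reference clip.",  # step 2, optional
        "text": "Hello from a cloned voice.",  # step 3
        "language": "auto",  # step 4
    }
    req = make_request("YOUR_API_KEY", payload)
    print(req.get_method(), req.full_url)
    # Step 5 would send it, e.g. urllib.request.urlopen(req), then download
    # the audio; the response format is not described here, so that is omitted.
```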

Pricing

Text length          Cost
Under 1,000 chars    $0.02
1,000+ chars         $0.02 per 1,000 characters

Billing Rules

  • Minimum charge: $0.02 per request (applies to texts under 1,000 characters)
  • Texts of 1,000 characters or more: $0.02 × (character count / 1,000)
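The billing rules amount to taking the larger of the $0.02 minimum and the per-1,000-character rate. A minimal Python sketch, assuming the charge scales linearly as the formula states (the page does not say whether partial 1,000-character blocks are rounded up) and that characters are counted the way Python's `len()` counts them; `estimate_cost` is a hypothetical helper name.

```python
MINIMUM_CHARGE = 0.02   # USD, applies to texts under 1,000 characters
RATE_PER_1000 = 0.02    # USD per 1,000 characters


def estimate_cost(text: str) -> float:
    """Estimate the USD charge for one request (hypothetical helper).

    ASSUMPTION: characters are counted like len(), and the charge scales
    linearly (no rounding up to whole 1,000-character blocks).
    """
    char_count = len(text)
    return max(MINIMUM_CHARGE, RATE_PER_1000 * char_count / 1000)


print(estimate_cost("Hello"))      # short text: the minimum charge applies
print(estimate_cost("a" * 2500))   # 2,500 characters: linear rate applies
```

Under these assumptions a 2,500-character script costs 0.02 × 2,500 / 1,000 = $0.05, and short requests at $0.02 each work out to about 50 runs per $1.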

Best Use Cases

  • Personalized Voiceovers — Clone your own voice to generate content without recording.
  • Character Consistency — Maintain the same voice across multiple audio productions.
  • Localization — Clone a voice to speak in different languages while preserving identity.
  • Audiobook Production — Generate hours of narration from a single voice sample.
  • Accessibility — Create personalized text-to-speech voices for individuals.

Pro Tips

  • Use clean, noise-free reference audio for best cloning results.
  • Reference audio of 3-15 seconds with clear speech works best.
  • Always provide reference_text when possible — it significantly improves voice matching accuracy.
  • Ensure the reference audio contains natural speech without music or background noise.
  • The cloned voice works best when the target text matches the reference audio's language.
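The 3-15 second recommendation can be checked locally before uploading. This Python sketch uses the standard-library `wave` module, so it assumes an uncompressed PCM WAV reference file (the page does not list accepted formats). It checks duration only, not noise or background music. The helper names are hypothetical, and `make_silent_wav` exists only to produce test input.

```python
import io
import wave

# Recommended reference length from the tips above, in seconds.
MIN_SECONDS, MAX_SECONDS = 3.0, 15.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an uncompressed WAV file given as bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_length(data: bytes) -> bool:
    """True if the clip falls inside the recommended 3-15 second window."""
    return MIN_SECONDS <= wav_duration_seconds(data) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demonstration input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_reference_length(make_silent_wav(5)))   # within the window
print(check_reference_length(make_silent_wav(1)))   # too short
```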

Related Models

  • Qwen3-TTS Voice Design — Design custom voices using natural language descriptions instead of audio samples.

Notes

  • Reference audio quality directly affects cloning quality — use high-quality recordings.
  • The model preserves accent, tone, and speaking style from the reference.
  • For best results, match the language parameter to your text content.