WaveSpeed.ai
text-to-audio

Qwen3 TTS Voice Design

wavespeed-ai/qwen3-tts/voice-design

Qwen3 TTS Voice Design: Generate speech with custom voice characteristics described in natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

Your request will cost $0.02 per run.

With $1 you can run this model approximately 50 times.

README

Qwen3-TTS Voice Design

Qwen3-TTS Voice Design is a next-generation text-to-speech model that lets you design custom voices using natural language descriptions. Instead of selecting from preset voices, simply describe the voice you want — age, gender, tone, speaking style — and the model generates speech that matches your description.

Why Choose This?

  • Natural language voice control: Describe your ideal voice in plain text (e.g., "a warm, friendly female voice with a slight British accent") and the model creates it.

  • Unlimited voice variety: No preset limits. Create any voice character you can describe, from professional narrators to unique personas.

  • Multilingual support: Generate speech in 10 languages: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian.

  • Auto language detection: Set language to "auto" and the model detects the language from your text.

Parameters

Parameter          | Required | Description
text               | Yes      | The text to convert to speech
voice_description  | Yes      | Natural language description of the desired voice
language           | No       | auto, Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian (default: auto)
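
As a rough illustration of the parameter table above, the sketch below assembles and validates a request body. The helper name build_payload is hypothetical, and it assumes the language values are passed exactly as spelled in the table; neither point is confirmed by this page.

```python
# Allowed values copied from the parameter table; exact casing is an assumption.
ALLOWED_LANGUAGES = {
    "auto", "Chinese", "English", "German", "Italian", "Portuguese",
    "Spanish", "Japanese", "Korean", "French", "Russian",
}


def build_payload(text, voice_description, language="auto"):
    """Return a request body dict (hypothetical helper, not part of any SDK)."""
    # Both text and voice_description are marked as required in the table.
    if not text or not text.strip():
        raise ValueError("text is required")
    if not voice_description or not voice_description.strip():
        raise ValueError("voice_description is required")
    # language is optional and defaults to "auto".
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "text": text,
        "voice_description": voice_description,
        "language": language,
    }


payload = build_payload(
    "Welcome back. Let's pick up where we left off.",
    "An elderly male narrator with a deep, calm, authoritative tone",
)
print(payload)
```

Validating on the client side like this simply catches a missing field or a misspelled language before a paid request is made.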

Voice Description Examples

  • "A young female voice, energetic and cheerful, speaking quickly with enthusiasm"
  • "An elderly male narrator with a deep, calm, authoritative tone"
  • "A professional newsreader voice, neutral and clear, with perfect pronunciation"
  • "A warm, friendly customer service representative, patient and helpful"
  • "A dramatic storyteller voice with expressive intonation and theatrical pauses"

How to Use

  1. Enter your text — write or paste the content you want to convert to speech.
  2. Describe your voice — use natural language to describe the voice characteristics you want (age, gender, tone, style, accent, etc.).
  3. Select language — choose the target language or use "auto" for automatic detection.
  4. Run — submit and download your audio file.
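
For programmatic use, the steps above might translate into an HTTP request along these lines. This is only a sketch: the base URL is a hypothetical placeholder, and the bearer-token header and JSON body layout are assumptions not stated on this page, so check the API documentation before sending anything. Only the model identifier and the three parameter names come from this page.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the real API base URL from the docs.
API_BASE = "https://api.example.invalid/v1"
# Model identifier as shown on this page.
MODEL_PATH = "wavespeed-ai/qwen3-tts/voice-design"


def make_request(api_key, text, voice_description, language="auto"):
    """Build (but do not send) a POST request for the model."""
    body = json.dumps({
        "text": text,                            # step 1: the text to speak
        "voice_description": voice_description,  # step 2: the voice to design
        "language": language,                    # step 3: target language or "auto"
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Bearer authentication is an assumption.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = make_request(
    "YOUR_API_KEY",
    "Thanks for calling. How can I help you today?",
    "A warm, friendly customer service representative, patient and helpful",
)
print(req.get_method(), req.full_url)
# Step 4 (run) would be urllib.request.urlopen(req) once API_BASE is real.
```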

Pricing

Text Length        | Cost
Under 1,000 chars  | $0.02
1,000+ chars       | $0.02 per 1,000 characters

Billing Rules

  • Minimum charge: $0.02 (for texts under 1,000 characters)
  • For longer texts: $0.02 × (character count / 1,000)
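
The billing rules can be sketched as a small estimator. The function name estimate_cost is hypothetical, and the sketch applies the formula literally (proportional to character count); whether actual billing rounds up to whole 1,000-character blocks is not stated here.

```python
MIN_CHARGE = 0.02      # minimum charge in USD, from the billing rules
RATE_PER_1000 = 0.02   # USD per 1,000 characters, from the billing rules


def estimate_cost(char_count):
    """Estimate the cost in USD for a text of char_count characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    # Proportional cost, never below the minimum charge.
    return max(MIN_CHARGE, RATE_PER_1000 * char_count / 1000)


for n in (200, 1000, 2500):
    print(n, f"${estimate_cost(n):.4f}")
```

Under these assumptions, a 2,500-character text comes to about $0.05, while anything below 1,000 characters costs the $0.02 minimum.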

Best Use Cases

  • Character Voices — Create unique voices for games, animations, or audiobooks without voice actors.
  • Prototyping — Quickly test different voice styles before committing to production.
  • Localization — Generate consistent voice styles across multiple languages.
  • Accessibility — Convert text to speech with customized, natural-sounding voices.
  • Content Creation — Produce voiceovers for videos, podcasts, and presentations.

Pro Tips

  • Be specific in your voice description — include age, gender, emotional tone, speaking pace, and any accent preferences.
  • Use descriptive adjectives: "warm", "crisp", "authoritative", "playful", "soothing", etc.
  • Mention the context if relevant (e.g., "suitable for a children's audiobook" or "professional corporate presentation").
  • Test with short text first to fine-tune your voice description before generating longer content.
  • Combine multiple attributes for more nuanced voices (e.g., "middle-aged, confident but approachable").
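
One way to apply these tips consistently is to assemble the description from named attributes. The helper describe_voice below is purely illustrative; the model accepts free-form text, and nothing on this page suggests it expects this particular phrasing.

```python
def describe_voice(age=None, gender=None, tone=None, pace=None,
                   accent=None, context=None):
    """Join optional voice attributes into one description (hypothetical helper)."""
    subject = " ".join(part for part in (age, gender) if part)
    subject = f"{subject} voice" if subject else "voice"
    # Pick "A" or "An" from the first letter of the subject.
    article = "An" if subject[0].lower() in "aeiou" else "A"
    parts = [f"{article} {subject}"]
    if tone:
        parts.append(tone)                        # e.g. "warm and soothing"
    if pace:
        parts.append(f"speaking {pace}")          # e.g. "slowly"
    if accent:
        parts.append(f"with {accent}")            # e.g. "a slight British accent"
    if context:
        parts.append(f"suitable for {context}")   # e.g. "a children's audiobook"
    return ", ".join(parts)


print(describe_voice(
    age="middle-aged",
    gender="female",
    tone="confident but approachable",
    pace="at a measured pace",
    accent="a slight British accent",
    context="a professional corporate presentation",
))
```

Keeping the attributes separate makes it easy to change one at a time while testing on short text, then reuse the final description for longer content.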

Notes

  • Voice descriptions work best when they are clear and detailed.
  • The same voice description will produce consistent results across multiple generations.
  • For best quality, match the language parameter to your text content.