
text-to-audio

alibaba/qwen3-tts-flash

Alibaba Qwen3 TTS Flash is a low-latency Text-to-Speech (TTS) model that supports English and Chinese with multiple voice styles. It is intended for real-time voice interaction, product narration, and short-form video dubbing.

Your request will cost $0.01 per run.

For $1 you can run this model approximately 100 times.

Alibaba Qwen3 TTS Flash — Fast Text-to-Speech

Qwen3 TTS Flash is Alibaba's low-latency, natural-sounding Text-to-Speech model that supports English and Chinese with multiple voice styles. It is designed for real-time conversations, product narration, and short-form video dubbing.

Highlights

  • Low latency and high concurrency, suited to real-time interaction
  • Multiple languages and voice styles, with English and Chinese as the primary languages
  • Parameter control: speed, pitch, volume, speaker (voice_id), emotion
  • Production-ready: stable output, easy integration, common audio formats

Input & Parameters

  • text (string, required): The text to synthesize (recommended < 2000 characters per request)
  • voice_id (string, optional): Voice style ID (e.g., qwen-female-1, qwen-male-1; see platform docs for the full list)
  • language (string, optional): Language code (en, zh)
  • speed (number, optional): Speaking rate, default 1.0 (range 0.5–2.0)
  • pitch (number, optional): Pitch adjustment, default 0
  • volume (number, optional): Output gain, default 0
  • emotion (string, optional): Voice emotion/style, e.g., neutral, happy, sad
  • sample_rate (int, optional): Sample rate in Hz, default 22050 (e.g., 16000, 22050, 24000, 44100)
  • format (string, optional): Output format, default mp3 (supports mp3, wav, ogg)

Note: The available speakers and parameter ranges depend on the platform configuration.
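As a sketch only, the parameter list above can be checked on the client before a request is sent. The helper `build_input` and its constants are hypothetical and not part of any WaveSpeedAI SDK. It treats the recommended 2,000-character limit as a hard limit, checks only the constraints this README states (speed range, format, language), and leaves pitch, volume, emotion, and sample_rate unchecked because their valid values depend on platform configuration.

```python
# Hypothetical client-side helper: fills in the README defaults and rejects
# out-of-range values before a request is sent. The platform configuration
# remains authoritative for speakers and parameter ranges.
DEFAULTS = {"speed": 1.0, "pitch": 0, "volume": 0, "sample_rate": 22050, "format": "mp3"}
KNOWN_OPTIONS = {"voice_id", "language", "speed", "pitch", "volume", "emotion", "sample_rate", "format"}
LANGUAGES = {"en", "zh"}
FORMATS = {"mp3", "wav", "ogg"}
MAX_TEXT_CHARS = 2000  # README recommends < 2000 characters; treated here as a hard limit


def build_input(text, **options):
    """Return the 'input' object for one request, with README defaults applied."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be non-empty")
    if len(text) >= MAX_TEXT_CHARS:
        raise ValueError(f"text must be shorter than {MAX_TEXT_CHARS} characters")
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    # Caller-supplied options override the defaults.
    params = {**DEFAULTS, **options}
    if not 0.5 <= params["speed"] <= 2.0:
        raise ValueError("speed must be in the range 0.5-2.0")
    if params["format"] not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    if "language" in params and params["language"] not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    return {"text": text, **params}
```

For example, `build_input("Hello, welcome to WaveSpeedAI!", voice_id="qwen-female-1", language="en")` returns the example's input fields plus the default pitch, volume, and sample_rate, while `speed=3.0` raises `ValueError`.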

Pricing

  • Formula: total_price = base_price * text_length / 1000
  • Current base_price: 1000 (unit depends on platform configuration)
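The pricing formula can be evaluated directly. This is a minimal sketch that assumes `text_length` is the character count of `text`; the README defines neither that nor the unit of `base_price`, so the result is in whatever unit the platform uses. `total_price` is a hypothetical helper name.

```python
def total_price(text, base_price=1000):
    """Apply total_price = base_price * text_length / 1000 from the README."""
    # Assumption: text_length is the number of characters in the input text.
    text_length = len(text)
    return base_price * text_length / 1000
```

With the default base_price of 1000, the example request's text "Hello, welcome to WaveSpeedAI!" (30 characters) evaluates to 30.0.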

Example

{
  "model": "alibaba/qwen3-tts-flash",
  "input": {
    "text": "Hello, welcome to WaveSpeedAI!",
    "voice_id": "qwen-female-1",
    "language": "en",
    "speed": 1.0,
    "format": "mp3"
  }
}
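The example request carries one short string. For longer scripts, one possible approach (an assumption, not documented platform behavior) is to split the text at sentence boundaries into chunks under the recommended 2,000 characters and send one request per chunk. The helpers `split_text` and `build_requests` are hypothetical. The request bodies only mirror the shape of the JSON example; sending them (endpoint, authentication) is left out because this README does not specify it.

```python
import json
import re

# A sentence is a run of non-terminators, then English or Chinese terminators,
# then trailing whitespace; the second alternative catches a final fragment
# that has no terminator.
SENTENCE_RE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$")


def split_text(text, max_chars=1999):
    """Split text into chunks of at most max_chars, breaking at sentence ends when possible."""
    chunks, current = [], ""
    for sentence in SENTENCE_RE.findall(text):
        # Flush the current chunk if adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        # Fallback: hard-cut a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current += sentence
    if current:
        chunks.append(current)
    # Drop surrounding whitespace and any chunk that is only whitespace.
    return [c.strip() for c in chunks if c.strip()]


def build_requests(text, voice_id="qwen-female-1", language="en"):
    """Return one JSON request body per chunk, shaped like the README example."""
    return [
        json.dumps(
            {
                "model": "alibaba/qwen3-tts-flash",
                "input": {"text": chunk, "voice_id": voice_id, "language": language, "format": "mp3"},
            },
            ensure_ascii=False,  # keep Chinese text readable in the request body
        )
        for chunk in split_text(text)
    ]
```

Splitting at sentence ends avoids cutting words in half at chunk boundaries; only a single sentence longer than the limit falls back to a hard cut.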

Use Cases

  • Real-time conversational agents and voice replies
  • Short-form video, advertising, and e-commerce dubbing
  • App/IoT voice prompts and announcements
  • Education, customer service, and knowledge base narration