Alibaba Qwen3 TTS Flash | Low-Latency Text-To-Speech English/Chinese

Alibaba Qwen3 TTS Flash — Fast Text-to-Speech

Qwen3 TTS Flash is Alibaba's low-latency, natural-sounding Text-to-Speech model that supports English and Chinese with multiple voice styles. It is designed for real-time conversations, product narration, and short-form video dubbing.

Highlights

Low latency / high concurrency for real-time interaction
Multi-language / multi-style voices (English and Chinese are prioritized)
Parameter control: speed, pitch, volume, speaker (voice_id), emotion
Production-ready: stable output, easy integration, common audio formats

Input & Parameters

text (string, required): The text to synthesize (recommended < 2000 characters per request)
voice_id (string, optional): Voice style ID (e.g., qwen-female-1, qwen-male-1; see platform docs for the full list)
language (string, optional): Language code (en, zh)
speed (number, optional): Speaking rate, default 1.0 (range 0.5–2.0)
pitch (number, optional): Pitch adjustment, default 0
volume (number, optional): Output gain, default 0
emotion (string, optional): Voice emotion/style, e.g., neutral, happy, sad
sample_rate (int, optional): Sample rate in Hz, default 22050 (e.g., 16000, 22050, 24000, 44100)
format (string, optional): Output format, default mp3 (supports mp3, wav, ogg)

Note: The available speakers and parameter ranges depend on the platform configuration.
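The documented values can be checked on the client before a request is sent. The following is a minimal sketch, not part of the platform SDK: build_tts_input is a hypothetical helper, the limits are copied from the parameter list above, and the platform may enforce different ones. Pitch, volume, emotion, and sample_rate are passed through unchecked because this page gives no complete range for them.

```python
import warnings
from typing import Any

# Values copied from the parameter list; the platform may enforce different ones.
KNOWN_OPTIONS = {
    "voice_id", "language", "speed", "pitch",
    "volume", "emotion", "sample_rate", "format",
}
SUPPORTED_LANGUAGES = {"en", "zh"}
SUPPORTED_FORMATS = {"mp3", "wav", "ogg"}
SPEED_MIN, SPEED_MAX = 0.5, 2.0
RECOMMENDED_MAX_CHARS = 2000  # a recommendation, not a documented hard limit


def build_tts_input(text: str, **options: Any) -> dict[str, Any]:
    """Hypothetical helper: validate options and return the "input" object."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    if len(text) >= RECOMMENDED_MAX_CHARS:
        # Only warn, because the page states this as a recommendation.
        warnings.warn("text exceeds the recommended length per request")

    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    language = options.get("language")
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    speed = options.get("speed")
    if speed is not None and not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")

    fmt = options.get("format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    return {"text": text, **options}
```

Omitted options are left out of the result so that the platform defaults apply.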

Pricing

Formula: total_price = base_price * text_length / 1000
Current base_price: 1000 (unit depends on platform configuration)
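As a worked example of the formula, the sketch below estimates the price of a request. It assumes that text_length is the number of characters in the text, which this page does not state explicitly, and estimate_price is a hypothetical helper name. The result is in the same platform-defined unit as base_price.

```python
BASE_PRICE = 1000  # current base_price from this page; the unit is platform-defined


def estimate_price(text: str, base_price: float = BASE_PRICE) -> float:
    """Apply total_price = base_price * text_length / 1000.

    Assumption: text_length is the character count of the text.
    """
    return base_price * len(text) / 1000


# A 30-character sentence at base_price 1000 costs 30.0 units.
print(estimate_price("Hello, welcome to WaveSpeedAI!"))  # 30.0
```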

Example

{
  "model": "alibaba/qwen3-tts-flash",
  "input": {
    "text": "Hello, welcome to WaveSpeedAI!",
    "voice_id": "qwen-female-1",
    "language": "en",
    "speed": 1.0,
    "format": "mp3"
  }
}
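A request body like this one could be prepared for a REST call roughly as follows. This is only a sketch using the Python standard library: API_URL, API_KEY, the Bearer authorization header, and the make_request helper are all placeholders, since this page does not give the endpoint or the authentication scheme. Consult the platform docs for the real values.

```python
import json
import urllib.request

# Placeholders: the real endpoint and auth scheme are not given on this page.
API_URL = "https://api.example.com/v1/tts"
API_KEY = "YOUR_API_KEY"


def make_request(payload: dict, url: str = API_URL,
                 api_key: str = API_KEY) -> urllib.request.Request:
    """Hypothetical helper: wrap the JSON payload in a POST request object."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


payload = {
    "model": "alibaba/qwen3-tts-flash",
    "input": {
        "text": "Hello, welcome to WaveSpeedAI!",
        "voice_id": "qwen-female-1",
        "language": "en",
        "speed": 1.0,
        "format": "mp3",
    },
}
request = make_request(payload)

# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.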

Use Cases

Real-time conversational agents / voice replies
Short-form video, advertising, and e-commerce dubbing
App/IoT voice prompts and announcements
Education, customer service, and knowledge base narration

Alibaba Qwen3 TTS Flash provides low-latency Text-to-Speech for English and Chinese with multiple voices, which makes it well suited to real-time dialogue. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.
