
Each run costs $0.12.
For $10, you can run this model approximately 83 times.
Microsoft VibeVoice is an advanced multi-speaker text-to-speech model that generates natural conversations between up to 4 speakers. Assign different voices to speakers in your script and the model produces realistic dialogue with natural turn-taking and expression.
- Multi-speaker conversations: Support for up to 4 distinct speakers in a single generation.
- Natural dialogue: Realistic turn-taking and conversational flow between speakers.
- Multilingual voices: 9 preset voices across English, Chinese, and Indian languages.
- Expression control: Adjust voice expressiveness with the scale parameter.
- Prompt Enhancer: Built-in tool to automatically improve your scripts.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Conversation script with speaker labels |
| speaker_1 | No | Voice for Speaker 0 (default: en-Alice_woman) |
| speaker_2 | No | Voice for Speaker 1 |
| speaker_3 | No | Voice for Speaker 2 |
| speaker_4 | No | Voice for Speaker 3 |
| scale | No | Voice expressiveness (default: 1.3) |
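As an illustration of how the parameters above fit together, here is a minimal Python sketch that assembles them into a dictionary. The helper name `build_payload` is hypothetical, and the sketch deliberately stops before any network call because this page does not document an endpoint or authentication scheme. Optional parameters are left out when not supplied, on the assumption that the service then applies the defaults listed in the table.

```python
from typing import List, Optional

MAX_SPEAKERS = 4  # the page states a limit of 4 speakers per generation


def build_payload(
    prompt: str,
    voices: Optional[List[str]] = None,
    scale: Optional[float] = None,
) -> dict:
    """Hypothetical helper: collect the documented parameters into one dict."""
    if not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    voices = voices or []
    if len(voices) > MAX_SPEAKERS:
        raise ValueError(f"at most {MAX_SPEAKERS} voices can be assigned")

    payload = {"prompt": prompt}
    # Voices are assigned in order to speaker_1 ... speaker_4.
    for index, voice in enumerate(voices, start=1):
        payload[f"speaker_{index}"] = voice
    # Omit scale unless given (assumption: the documented default of 1.3 then applies).
    if scale is not None:
        payload["scale"] = scale
    return payload


payload = build_payload(
    "Speaker 1: Hello there.\nSpeaker 2: Hi!",
    voices=["en-Alice_woman", "en-Carter_man"],
    scale=1.3,
)
print(payload)
```

Keeping the payload construction separate from the transport makes it easy to validate inputs locally before spending a paid run.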

| Voice | Language | Gender |
|---|---|---|
| en-Alice_woman | English | Female |
| en-Carter_man | English | Male |
| en-Frank_man | English | Male |
| en-Mary_woman_bgm | English | Female |
| en-Maya_woman | English | Female |
| in-Samuel_man | Indian | Male |
| zh-Anchen_man_bgm | Chinese | Male |
| zh-Bowen_man | Chinese | Male |
| zh-Xinran_woman | Chinese | Female |
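The voice identifiers in the table appear to follow a `<language>-<Name>_<gender>` pattern, sometimes with a trailing `_bgm`. The sketch below validates an identifier against the nine listed presets and splits it into parts. The pattern is inferred from the table rather than documented, the meaning of the `_bgm` suffix is not explained on this page (the code only records its presence), and `parse_voice` and `VoiceInfo` are hypothetical names.

```python
from typing import NamedTuple

# The nine preset voices listed in the table above.
PRESET_VOICES = [
    "en-Alice_woman",
    "en-Carter_man",
    "en-Frank_man",
    "en-Mary_woman_bgm",
    "en-Maya_woman",
    "in-Samuel_man",
    "zh-Anchen_man_bgm",
    "zh-Bowen_man",
    "zh-Xinran_woman",
]


class VoiceInfo(NamedTuple):
    voice_id: str
    language_code: str
    name: str
    gender: str
    has_bgm_suffix: bool


def parse_voice(voice_id: str) -> VoiceInfo:
    """Hypothetical helper: check a voice ID and split it into its parts."""
    if voice_id not in PRESET_VOICES:
        raise ValueError(f"unknown voice: {voice_id!r}")
    language_code, rest = voice_id.split("-", 1)
    parts = rest.split("_")
    name, gender = parts[0], parts[1]
    # Only record whether the suffix is present; its meaning is an open question here.
    return VoiceInfo(voice_id, language_code, name, gender, "bgm" in parts[2:])


info = parse_voice("en-Mary_woman_bgm")
print(info)
```

Checking the identifier locally catches typos such as a wrong language prefix before a request is submitted.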
Write conversations using speaker labels. Each line starts with "Speaker N:" followed by the dialogue:
Speaker 1: Hey, have you tried the new VibeVoice model on WaveSpeedAI yet?
Speaker 2: Not yet! What's so special about it?
Speaker 1: It can generate really natural multi-speaker conversations like this one.
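A script in this format can also be generated from structured data. The following sketch turns a list of `(speaker_number, text)` pairs into one "Speaker N:" line per turn. It assumes 1-based speaker numbers, as in the sample script; the parameter table describes the speakers as 0 through 3, so the numbering the service expects should be confirmed before relying on this. `build_script` is a hypothetical helper name.

```python
from typing import List, Tuple

MAX_SPEAKERS = 4  # the page states a limit of 4 speakers per generation


def build_script(turns: List[Tuple[int, str]]) -> str:
    """Hypothetical helper: format dialogue turns as 'Speaker N: text' lines."""
    lines = []
    for speaker, text in turns:
        # Assumption: labels run from 1 to 4, matching the sample script.
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker must be between 1 and {MAX_SPEAKERS}")
        # Collapse internal whitespace so each turn stays on a single line.
        text = " ".join(text.split())
        if not text:
            raise ValueError("each turn needs some dialogue text")
        lines.append(f"Speaker {speaker}: {text}")
    return "\n".join(lines)


script = build_script([
    (1, "Hey, have you tried the new VibeVoice model on WaveSpeedAI yet?"),
    (2, "Not yet! What's so special about it?"),
    (1, "It can generate really natural multi-speaker conversations like this one."),
])
print(script)
```

The resulting string is what would be passed as the `prompt` parameter.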
| Output | Cost |
|---|---|
| Per generation | $0.12 |
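For budgeting, the flat per-generation price makes the arithmetic straightforward. The sketch below shows how the figure of roughly 83 runs for $10 follows from the $0.12 price, using `Decimal` to avoid floating-point rounding. The function names are hypothetical, and the sketch assumes the price is flat regardless of script length, which is how the table presents it.

```python
from decimal import Decimal

COST_PER_RUN = Decimal("0.12")  # price per generation from the table above


def runs_for_budget(budget: str) -> int:
    """Number of whole generations a budget (in USD) covers."""
    return int(Decimal(budget) // COST_PER_RUN)


def cost_for_runs(runs: int) -> Decimal:
    """Total cost in USD for a given number of generations."""
    return COST_PER_RUN * runs


print(runs_for_budget("10"))  # 83
print(cost_for_runs(83))      # 9.96
```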