Google Gemini 2.5 Pro Text-to-Speech delivers natural multi-speaker voice synthesis with 30+ voices across 24 languages. It is well suited to dialogues, conversations, and multilingual content, and is available through a ready-to-use REST inference API with no cold starts and pricing based on input length.
$0.08 per run · ~12 runs per $1
Gemini 2.5 Pro Text-to-Speech is Google's advanced multi-speaker speech synthesis model that turns written dialogue into natural, expressive audio. It supports multiple speakers with distinct voices in a single generation, making it ideal for podcasts, conversations, audiobooks, and any content that needs realistic multi-voice narration.
Multi-speaker dialogue: Assign different voices to different speakers and generate a natural-sounding conversation in one pass, with no need to stitch separate audio clips together.
Expressive, natural voices: Powered by Gemini 2.5 Pro, the voices carry natural intonation, pacing, and emotional range for lifelike results.
Multi-language support: Supports a wide range of languages, including Arabic (Egypt), Bangla (Bangladesh), Dutch (Netherlands), English (India), English (United States), French (France), German (Germany), Hindi (India), Indonesian (Indonesia), and more.
Flexible speaker setup: Add as many speakers as your script needs, each with its own named voice. Write the dialogue with speaker labels and the model handles the rest.
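
As a rough illustration of the speaker-label format, the sketch below joins a list of turns into "Speaker: line" text. The helper name `build_script` is hypothetical, and putting one turn per line is an assumption; the page only specifies the "Speaker: line" pattern.

```python
def build_script(turns):
    """Join (speaker, line) pairs into 'Speaker: line' text.

    Assumption: each turn goes on its own line. The source only
    specifies the 'Speaker: line' pattern, not the separator.
    """
    lines = []
    for speaker, line in turns:
        speaker = speaker.strip()
        # Collapse stray newlines and repeated spaces inside a turn.
        line = " ".join(line.split())
        if not speaker or not line:
            raise ValueError("each turn needs a speaker name and a non-empty line")
        lines.append(f"{speaker}: {line}")
    return "\n".join(lines)


script = build_script([
    ("Alice", "Did you catch the launch?"),
    ("Bob", "I did. It went smoothly."),
])
print(script)
```

The speaker names used here would need to match the names given in the speaker setup.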
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The script or dialogue text. Use "Speaker: line" format for multi-speaker content. |
| language | Yes | Language and locale for synthesis (e.g., English (United States), French (France)). |
| speakers | Yes | A list of speaker entries, each with a speaker name and a voice selection. |
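
The three required parameters could be assembled into a request body along these lines. This is a sketch only: the top-level keys `text`, `language`, and `speakers` come from the table above, but the per-entry keys `speaker` and `voice`, the helper name `build_tts_payload`, and the voice values are assumptions made for illustration. The endpoint URL and authentication are not given on this page, so sending the request is left out.

```python
import json


def build_tts_payload(text, language, speakers):
    """Assemble the three required parameters into a JSON-serializable dict."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if not language:
        raise ValueError("language is required")
    if not speakers:
        raise ValueError("at least one speaker entry is required")
    # 'speaker' and 'voice' are hypothetical key names for each entry.
    entries = [{"speaker": name, "voice": voice} for name, voice in speakers.items()]
    return {"text": text, "language": language, "speakers": entries}


payload = build_tts_payload(
    text="Alice: Did you catch the launch?\nBob: I did. It went smoothly.",
    language="English (United States)",
    # Placeholder voice names; substitute voices the service actually offers.
    speakers={"Alice": "VOICE_A", "Bob": "VOICE_B"},
)
print(json.dumps(payload, indent=2))
```

Validating the required fields locally avoids paying for a round trip that would be rejected anyway.
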
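$0.08 per 1,000 characters of input text. The examples below are consistent with billing in whole 1,000-character blocks, rounded up: 500 characters is charged as one block, and 2,500 characters as three.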
| Text Length | Cost |
|---|---|
| 500 characters | $0.08 |
| 1,000 characters | $0.08 |
| 2,500 characters | $0.24 |
| 5,000 characters | $0.40 |
| 10,000 characters | $0.80 |
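
The figures in the table can be reproduced with a small estimator. It assumes billing in whole 1,000-character blocks rounded up, which matches every row above but is not stated outright; the function name `estimate_cost` is hypothetical, and whether speaker labels count toward the character total is not specified.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.08")  # USD per 1,000 characters, from the pricing line
BLOCK_SIZE = 1000


def estimate_cost(char_count):
    """Estimate cost in USD, assuming partial blocks are rounded up."""
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    # Integer ceiling division avoids floating-point rounding.
    blocks = -(-char_count // BLOCK_SIZE)
    return PRICE_PER_BLOCK * blocks


for n in (500, 1000, 2500, 5000, 10000):
    print(f"{n:>6} characters -> ${estimate_cost(n)}")
```

`Decimal` is used so that the currency amounts stay exact rather than picking up binary floating-point error.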