
Your request will cost $0.04 per run.
For $1 you can run this model approximately 25 times.
Gemini 2.5 Flash Text-to-Speech is Google's fast, cost-efficient multi-speaker speech synthesis model. It turns written dialogue into natural, expressive audio with support for multiple speakers and distinct voices in a single generation — at half the cost of the Pro version. Ideal for high-volume TTS workflows like podcasts, conversations, audiobooks, and voiceover production.
- **Fast and affordable:** Optimized for speed and cost-efficiency, delivering natural speech at half the price of Gemini 2.5 Pro TTS.
- **Multi-speaker dialogue:** Assign different voices to different speakers and generate a natural-sounding conversation in one pass, with no need to stitch separate audio clips together.
- **Expressive, natural voices:** The voices carry natural intonation, pacing, and emotional range for lifelike results.
- **Multi-language support:** Supports a wide range of languages including Arabic (Egypt), Bangla (Bangladesh), Dutch (Netherlands), English (India), English (United States), French (France), German (Germany), Hindi (India), Indonesian (Indonesia), and more.
- **Flexible speaker setup:** Add as many speakers as your script needs, each with their own named voice. Simply write dialogue with speaker labels and the model handles the rest.
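To illustrate the speaker-label convention, here is a minimal Python sketch that splits a script into turns and lists its speakers before submission. The helper names (`split_script`, `speaker_names`) and the label pattern are assumptions made for this example, not part of the model's API; the model's own parsing rules may differ.

```python
import re

# Assumed label pattern: a short name at the start of a line, then a colon.
# This is a heuristic; a line such as "At 10:30 we begin" would be misread.
LABEL = re.compile(r"^\s*([^:\n]{1,40}):\s*(.+)$")


def split_script(script: str) -> list[tuple[str, str]]:
    """Return (speaker, line) pairs for every labelled line in the script."""
    turns = []
    for raw in script.splitlines():
        match = LABEL.match(raw)
        if match:
            turns.append((match.group(1).strip(), match.group(2).strip()))
    return turns


def speaker_names(script: str) -> list[str]:
    """Return unique speaker labels in order of first appearance."""
    return list(dict.fromkeys(name for name, _ in split_script(script)))


script = """Alice: Welcome back to the show.
Bob: Thanks, glad to be here.
Alice: Let's get started."""

turns = split_script(script)
names = speaker_names(script)
print(names)
```

A check like this can confirm that every label in the script has a matching speaker entry before a paid run is started.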
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The script or dialogue text. Use "Speaker: line" format for multi-speaker content. |
| language | Yes | Language and locale for synthesis (e.g., English (United States), French (France)). |
| speakers | Yes | A list of speaker entries, each with a speaker name and a voice selection. |
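As a rough sketch of how the three required parameters fit together, the Python snippet below assembles them into a single dictionary. The top-level keys follow the table above, but the keys inside each speaker entry (`speaker`, `voice`), the helper name `build_request`, and the voice names `voice_a` and `voice_b` are placeholders assumed for illustration; check the actual API schema and voice list before use.

```python
import json


def build_request(text: str, language: str, voices: dict[str, str]) -> dict:
    """Assemble the required parameters into one payload.

    `voices` maps a speaker label used in the script to a voice name.
    The inner keys "speaker" and "voice" are assumed, not confirmed.
    """
    if not text.strip():
        raise ValueError("text is required")
    if not language:
        raise ValueError("language is required")
    if not voices:
        raise ValueError("at least one speaker entry is required")
    return {
        "text": text,
        "language": language,
        "speakers": [
            {"speaker": name, "voice": voice} for name, voice in voices.items()
        ],
    }


payload = build_request(
    text="Alice: Welcome back to the show.\nBob: Thanks, glad to be here.",
    language="English (United States)",
    voices={"Alice": "voice_a", "Bob": "voice_b"},  # placeholder voice names
)
print(json.dumps(payload, indent=2))
```

The sketch stops at building the payload because the endpoint and authentication details are not given here.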
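Pricing is $0.04 per 1,000 characters of input text. The examples below imply that usage is rounded up to the next 1,000 characters: 500 characters cost the same as 1,000, and 2,500 characters are billed as 3,000.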
| Text Length | Cost |
|---|---|
| 500 characters | $0.04 |
| 1,000 characters | $0.04 |
| 2,500 characters | $0.12 |
| 5,000 characters | $0.20 |
| 10,000 characters | $0.40 |
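The figures in the table can be reproduced with a short Python sketch. It assumes billing in whole 1,000-character blocks rounded up, which is inferred from the table rather than stated as an official rule; the function name `estimate_cost` is hypothetical.

```python
PRICE_CENTS_PER_BLOCK = 4  # $0.04 per block, kept in cents to avoid float drift
BLOCK_SIZE = 1000          # characters per billing block (inferred from the table)


def estimate_cost(num_chars: int) -> float:
    """Estimate the cost in dollars for a given input length in characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    blocks = -(-num_chars // BLOCK_SIZE)  # integer ceiling division
    return blocks * PRICE_CENTS_PER_BLOCK / 100


for length in (500, 1000, 2500, 5000, 10000):
    print(f"{length:>6} characters: ${estimate_cost(length):.2f}")
```

Under this assumption, trimming a 2,100-character script to 2,000 characters would drop the estimate from $0.12 to $0.08.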