
text-to-audio
This request costs $0.08 per run.
With $1, you can run this model about 12 times.
Gemini 2.5 Pro Text-to-Speech is Google's advanced multi-speaker speech synthesis model that turns written dialogue into natural, expressive audio. It supports multiple speakers with distinct voices in a single generation, making it ideal for podcasts, conversations, audiobooks, and any content that needs realistic multi-voice narration.
Multi-speaker dialogue: Assign different voices to different speakers and generate a natural-sounding conversation in one pass, with no need to stitch separate audio clips together.
Expressive, natural voices: Powered by Gemini 2.5 Pro, the voices carry natural intonation, pacing, and emotional range for lifelike results.
Multi-language support: Supports a wide range of languages, including Arabic (Egypt), Bangla (Bangladesh), Dutch (Netherlands), English (India), English (United States), French (France), German (Germany), Hindi (India), Indonesian (Indonesia), and more.
Flexible speaker setup: Add as many speakers as your script needs, each with their own named voice. Write the dialogue with speaker labels and the model handles the rest.
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The script or dialogue text. Use "Speaker: line" format for multi-speaker content. |
| language | Yes | Language and locale for synthesis (e.g., English (United States), French (France)). |
| speakers | Yes | A list of speaker entries, each with a speaker name and a voice selection. |
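
As an illustration of how the three parameters fit together, the sketch below assembles a request body from a "Speaker: line" script. The top-level field names (`text`, `language`, `speakers`) come from the table above; the per-entry keys `speaker` and `voice`, the helper `build_tts_payload`, and the voice names are assumptions made for this example, so check them against the platform's actual schema and voice list before use.

```python
import json
import re


def build_tts_payload(script, language, voices):
    """Build a request body from a "Speaker: line" script.

    `voices` maps each speaker label to a voice name. The entry keys
    "speaker" and "voice" are assumed, not confirmed by the source.
    """
    labels = []
    for line in script.splitlines():
        # Naive label detection: everything before the first colon.
        match = re.match(r"^\s*([^:\n]+?)\s*:\s*\S", line)
        if match and match.group(1) not in labels:
            labels.append(match.group(1))

    # Every speaker that appears in the script needs a voice.
    missing = [name for name in labels if name not in voices]
    if missing:
        raise ValueError(f"no voice assigned for: {', '.join(missing)}")

    return {
        "text": script,
        "language": language,
        "speakers": [{"speaker": name, "voice": voices[name]} for name in labels],
    }


script = (
    "Host: Welcome back to the show.\n"
    "Guest: Thanks for having me.\n"
    "Host: Let's get started."
)
# Voice names here are placeholders for illustration.
payload = build_tts_payload(
    script, "English (United States)", {"Host": "Kore", "Guest": "Puck"}
)
print(json.dumps(payload, indent=2))
```

The label detection is deliberately simple: any text before the first colon on a line is treated as a speaker name, so a line such as "Meet at 3:00" would be misread. A stricter check against a known list of speakers would avoid that.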
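$0.08 per 1,000 characters of input text. The examples below are consistent with billing in whole 1,000-character blocks, rounded up, so any text up to 1,000 characters costs $0.08.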
| Text Length | Cost |
|---|---|
| 500 characters | $0.08 |
| 1,000 characters | $0.08 |
| 2,500 characters | $0.24 |
| 5,000 characters | $0.40 |
| 10,000 characters | $0.80 |
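
The table rows can be reproduced with a small estimator. This is a sketch that assumes billing rounds up to the next whole 1,000-character block, which matches every row above but is inferred from the table, not stated as an official rule; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.08")  # USD per 1,000 characters (from the table)
BLOCK_SIZE = 1000


def estimate_cost(char_count):
    """Estimate the cost in USD, assuming partial blocks are rounded up."""
    if char_count <= 0:
        return Decimal("0.00")
    # Integer ceiling division avoids floating-point rounding.
    blocks = (char_count + BLOCK_SIZE - 1) // BLOCK_SIZE
    return PRICE_PER_BLOCK * blocks


for n in (500, 1000, 2500, 5000, 10000):
    print(f"{n:>6} characters -> ${estimate_cost(n)}")
```

`Decimal` is used so the amounts stay exact to the cent; with floats, products such as 0.08 × 3 can pick up small rounding errors.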