Google Gemini 2.5 Flash Text To Speech

Google Gemini 2.5 Flash Text-to-Speech delivers fast, natural multi-speaker voice synthesis with 30+ voices across 24 languages, at a lower cost than the Pro model. It suits dialogues, conversations, and multilingual content. The model is served through a ready-to-use REST inference API with no cold starts.

Features

Gemini 2.5 Flash Text-to-Speech

Gemini 2.5 Flash Text-to-Speech is Google’s fast, cost-efficient multi-speaker speech synthesis model. It turns written dialogue into natural, expressive audio with support for multiple speakers and distinct voices in a single generation — at half the cost of the Pro version. Ideal for high-volume TTS workflows like podcasts, conversations, audiobooks, and voiceover production.

Need higher quality? Try Gemini 2.5 Pro Text-to-Speech

Why Choose This?

Fast and affordable: Optimized for speed and cost-efficiency, delivering natural speech at half the price of Gemini 2.5 Pro TTS.
Multi-speaker dialogue: Assign different voices to different speakers and generate a natural-sounding conversation in one pass, with no need to stitch separate audio clips together.
Expressive, natural voices: The voices carry natural intonation, pacing, and emotional range for lifelike results.
Multi-language support: Supports 24 language locales, including Arabic (Egypt), Bangla (Bangladesh), Dutch (Netherlands), English (India), English (United States), French (France), German (Germany), Hindi (India), Indonesian (Indonesia), and more.
Flexible speaker setup: Add one or two speakers (the API accepts 1 to 2 speaker entries), each with its own named voice. Write the dialogue with speaker labels and the model handles the rest.

Parameters

Parameter	Required	Description
text	Yes	The script or dialogue text. Use “Speaker: line” format for multi-speaker content.
language	Yes	Language and locale for synthesis (e.g., English (United States), French (France)).
speakers	Yes	A list of speaker entries, each with a speaker name and a voice selection.

How to Use

1. Write your script in the text field using the “Speaker: dialogue” format (e.g., “Rose: Welcome back to Tech Talk!”).
2. Select the language from the dropdown.
3. Add speakers: for each speaker in your script, add an entry with the speaker name and choose a voice.
4. Run: the model generates a single audio file with all speakers voiced naturally.
5. Download the output audio.
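
When calling the API instead of using the form, the same steps amount to assembling a JSON request body. The following is a minimal sketch, assuming the field names from the parameter table (text, language, speakers, with speaker and voice inside each entry). The helper name build_payload is hypothetical, the second voice name ("Kore") is an assumption that should be checked against the voice list, and separating dialogue lines with newlines is also an assumption.

```python
import json


def build_payload(text, language, speakers):
    """Assemble a request body from a script, a language, and (name, voice) pairs."""
    # The API reference lists 1 to 2 items for the speakers array.
    if not 1 <= len(speakers) <= 2:
        raise ValueError("expected 1 or 2 speaker entries")
    return {
        "text": text,
        "language": language,
        "speakers": [{"speaker": name, "voice": voice} for name, voice in speakers],
    }


# One "Speaker: dialogue" line per turn (newline separation is an assumption).
script = "Rose: Welcome back to Tech Talk!\nSam: Glad to be here."

payload = build_payload(
    script,
    "English (United States)",
    [("Rose", "Achernar"), ("Sam", "Kore")],  # "Kore" is an assumed voice name
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be used as the body of the submit request.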

Pricing

$0.04 per 1,000 characters of input text.

Billing Rules

Billed by text length, rounded up to the next multiple of 1,000 characters
Minimum charge is $0.04 (for texts up to 1,000 characters)

Examples

Text Length	Cost
500 characters	$0.04
1,000 characters	$0.04
2,500 characters	$0.12
5,000 characters	$0.20
10,000 characters	$0.40
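
The billing rule can be expressed as a small calculation. This is a sketch of the arithmetic only, assuming the published rate of $0.04 per started block of 1,000 characters; the function name estimate_cost_usd is hypothetical, and treating an empty text as one billable block is an assumption.

```python
import math

PRICE_CENTS_PER_BLOCK = 4  # $0.04 per 1,000 characters
BLOCK_SIZE = 1000          # characters per billing block


def estimate_cost_usd(char_count):
    """Round the length up to whole 1,000-character blocks, with a one-block minimum."""
    blocks = max(1, math.ceil(char_count / BLOCK_SIZE))
    return blocks * PRICE_CENTS_PER_BLOCK / 100


for n in (500, 1000, 2500, 5000, 10000):
    print(f"{n:>6} characters -> ${estimate_cost_usd(n):.2f}")
```

For 2,500 characters the length rounds up to three blocks, which gives 3 × $0.04 = $0.12, matching the table.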

Best Use Cases

Podcasts & Talk Shows — Generate multi-host audio content with distinct voices for each speaker.
Audiobooks & Narration — Bring stories to life with different character voices in a single generation.
E-learning & Training — Create engaging instructional audio with conversational dialogue.
Content Localization — Produce voiceovers in multiple languages for global audiences.
High-volume Production — Cost-efficient TTS for large-scale audio content pipelines.

Pro Tips

Use the “Speaker: dialogue” format consistently throughout your script to ensure correct voice assignment.
Make sure each speaker name in the text exactly matches the speaker name in the speakers list.
Keep dialogue natural — the model handles pacing and intonation best with conversational writing.
For long scripts, break content into logical segments to review quality before generating the full piece.
Choose Flash for speed and volume; upgrade to Pro when you need maximum voice quality.
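
The tip about matching speaker names can be checked before submitting. The sketch below is a hypothetical pre-flight check, not part of the API: it assumes every dialogue line starts with a “Speaker:” label and treats the text before the first colon on each line as the label.

```python
def script_labels(text):
    """Collect the label before the first colon on every line that has one."""
    labels = set()
    for line in text.splitlines():
        name, sep, _ = line.partition(":")
        if sep and name.strip():
            labels.add(name.strip())
    return labels


def find_label_mismatches(text, speakers):
    """Return (labels missing from the speakers list, speakers never used in the text)."""
    labels = script_labels(text)
    declared = {entry["speaker"] for entry in speakers}
    return sorted(labels - declared), sorted(declared - labels)


script = "Rose: Welcome back!\nSam: Thanks, Rose.\nrose: This label has a typo."
speakers = [
    {"speaker": "Rose", "voice": "Achernar"},
    {"speaker": "Sam", "voice": "Achernar"},
]
unknown, unused = find_label_mismatches(script, speakers)
print("labels not in speakers list:", unknown)
print("speakers never used:", unused)
```

The comparison is case-sensitive on purpose, since the names are expected to match exactly. A line that contains a colon but no speaker label would be reported as an unknown label, so the check works best on scripts that follow the format consistently.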

Notes

The number of available voices may vary by language. Experiment with different voice options to find the best fit for your content.
Please ensure your content complies with Google’s usage policies.

Related Models

Gemini 2.5 Pro Text-to-Speech: Higher-quality multi-speaker TTS at $0.08 per 1,000 characters for premium voice output.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/google/gemini-2.5-flash/text-to-speech" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Rose: Welcome back to Tech Talk!",
    "language": "English (United States)",
    "speakers": [
        {
            "speaker": "Rose",
            "voice": "Achernar"
        }
    ]
}'

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
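
The same two calls can be made from Python with the standard library. This is a minimal sketch that reuses only the URLs and headers from the curl commands above; the helper names (submit_request, result_request, send) are hypothetical, and error handling is left out.

```python
import json
import urllib.request

BASE_URL = "https://api.wavespeed.ai/api/v3"


def submit_request(payload, api_key):
    """Build the POST request that submits a text-to-speech task."""
    return urllib.request.Request(
        f"{BASE_URL}/google/gemini-2.5-flash/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def result_request(request_id, api_key):
    """Build the GET request that fetches the result for a task ID."""
    return urllib.request.Request(
        f"{BASE_URL}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical flow would call send(submit_request(payload, api_key)), read data.id from the response, and then call send(result_request(task_id, api_key)) until the task finishes. Building the requests separately from sending them keeps the code easy to inspect without touching the network.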

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
text	string	Yes	-	-	The script or dialogue text to synthesize. Use the “Speaker: line” format for multi-speaker content. Must be less than or equal to 8,000 bytes.
language	string	Yes	English (United States)	Arabic (Egypt), Bangla (Bangladesh), Dutch (Netherlands), English (India), English (United States), French (France), German (Germany), Hindi (India), Indonesian (Indonesia), Italian (Italy), Japanese (Japan), Korean (South Korea), Marathi (India), Polish (Poland), Portuguese (Brazil), Romanian (Romania), Russian (Russia), Spanish (Spain), Tamil (India), Telugu (India), Thai (Thailand), Turkish (Turkey), Ukrainian (Ukraine), Vietnamese (Vietnam)	Language spoken in the audio.
speakers	array	Yes	[{"speaker":"","voice":"Achernar"}]	1 ~ 2 items	Array of speaker entries, each with a speaker name and a voice.
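
Because the text field is limited to 8,000 bytes, a long script has to be split across several requests. The sketch below is one hypothetical way to do that: it splits at line boundaries so no dialogue turn is cut in half, and it assumes the limit is measured on the UTF-8 encoding of the text.

```python
MAX_BYTES = 8000  # limit from the request parameter table


def split_script(text, max_bytes=MAX_BYTES):
    """Group whole lines into segments whose UTF-8 size stays within max_bytes."""
    segments, current, size = [], [], 0
    for line in text.splitlines():
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > max_bytes:
            raise ValueError("a single line exceeds the byte limit")
        # Joining onto an existing segment costs one extra byte for the newline.
        extra = line_bytes + (1 if current else 0)
        if current and size + extra > max_bytes:
            segments.append("\n".join(current))
            current, size = [line], line_bytes
        else:
            current.append(line)
            size += extra
    if current:
        segments.append("\n".join(current))
    return segments


long_script = "\n".join(f"Rose: This is line number {i}." for i in range(1000))
parts = split_script(long_script)
print(len(parts), [len(p.encode("utf-8")) for p in parts])
```

Each segment is billed separately, so the 1,000-character rounding applies per request rather than to the script as a whole.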

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID passed in the request)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
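
The status and outputs fields above suggest a simple polling loop. This is a sketch under assumptions: fetch_result stands for any callable that returns the decoded result response, the function name wait_for_outputs and the polling interval are hypothetical, and the canned responses with the example.com URL are placeholders rather than real API output.

```python
import time


def wait_for_outputs(fetch_result, interval_s=1.0, max_attempts=60):
    """Poll until the task is completed or failed, then return data.outputs."""
    for _ in range(max_attempts):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # "created" and "processing" mean the task is still running.
        time.sleep(interval_s)
    raise TimeoutError("task did not finish within the polling budget")


# Canned responses standing in for real API calls.
responses = iter([
    {"code": 200, "message": "success",
     "data": {"status": "processing", "outputs": []}},
    {"code": 200, "message": "success",
     "data": {"status": "completed", "outputs": ["https://example.com/audio.wav"]}},
])
outputs = wait_for_outputs(lambda: next(responses), interval_s=0)
print(outputs)
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client, so it can be exercised with canned data before it is pointed at the result endpoint.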
