text-to-audio

ElevenLabs Flash V2 | Text To Speech Model, REST API Endpoint | WaveSpeedAI

elevenlabs/flash-v2

ElevenLabs Flash V2 is a text-to-speech model that converts text into spoken audio using the ElevenLabs Flash V2 engine. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Each run costs $0.05 at the minimum billed length, so $1 covers approximately 20 runs.

ElevenLabs — Flash V2 Text-to-Speech

Flash V2 turns written text into natural-sounding speech with crisp pronunciation, smooth pacing, and expressive tone, which makes it well suited to voiceovers, narration, tutorials, podcasts, and digital content. It offers a large library of multilingual voices and low-latency generation for fast workflows. See the voice list here.

Key Features

  • Natural prosody with clear, humanlike articulation
  • Multilingual support with strong English numeral/date reading
  • Fine control via similarity and stability sliders
  • Speaker Boost to enhance English number and unit delivery

Pricing

  • $0.05 per 1,000 characters
  • Inputs shorter than 1,000 characters are billed as 1,000 characters.
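The billing rule above can be sketched as a small estimator. It goes beyond what the page states in one respect: it assumes that inputs longer than 1,000 characters are billed proportionally per character. The page does not say whether longer inputs are instead rounded up to the next 1,000-character block, so treat the result for long inputs as an estimate. It also counts characters with Python's `len()` (code points), which may not match how the service counts.

```python
from decimal import Decimal

PRICE_PER_1000_CHARS = Decimal("0.05")  # USD, from the pricing above
MIN_BILLED_CHARS = 1000                 # shorter inputs are billed as 1,000 characters


def estimate_cost(text: str) -> Decimal:
    """Estimate the charge in USD for one request.

    Assumption: above the 1,000-character minimum, cost scales linearly
    with character count (no rounding up to 1,000-character blocks).
    Characters are counted as Python code points via len().
    """
    billed_chars = max(len(text), MIN_BILLED_CHARS)
    return PRICE_PER_1000_CHARS * billed_chars / 1000


if __name__ == "__main__":
    print(estimate_cost("Hello, world!"))  # short input pays the minimum
    print(estimate_cost("a" * 2500))       # linear billing is an assumption
```

`Decimal` avoids binary floating-point drift when summing many small charges. At the minimum charge, $1 / $0.05 = 20 short requests, which is where the "approximately 20 runs" figure comes from.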

How to Use

  1. Enter your script in the text field.
  2. Choose a voice_id (for example: Gigi, Callum, Alice; see the voice list for more).
  3. Optionally adjust the controls:
     • similarity: 0–1 (higher = closer to the base voice timbre)
     • stability: 0–1 (higher = more consistent delivery)
     • use_speaker_boost: improves English number and unit reading
  4. Run to synthesize and preview your audio.
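The steps above correspond to the fields of a request. The sketch below assembles and validates a JSON body before sending it. The field names `text`, `voice_id`, `similarity`, `stability`, and `use_speaker_boost` come from this page. The endpoint URL, the bearer-token `Authorization` header, and the helper names `build_payload` / `build_request` are hypothetical placeholders, so check the API reference for the actual request schema before using it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder: substitute the endpoint from the API reference.
ENDPOINT = "https://example.invalid/elevenlabs/flash-v2"


def build_payload(text: str, voice_id: str,
                  similarity: Optional[float] = None,
                  stability: Optional[float] = None,
                  use_speaker_boost: Optional[bool] = None) -> dict:
    """Validate inputs and return the request body (hypothetical helper).

    Optional controls left as None are omitted so the service default applies.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not voice_id:
        raise ValueError("voice_id is required (see the voice list)")
    payload = {"text": text, "voice_id": voice_id}
    for name, value in (("similarity", similarity), ("stability", stability)):
        if value is None:
            continue  # leave unset rather than guess a default
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        payload[name] = value
    if use_speaker_boost is not None:
        payload["use_speaker_boost"] = use_speaker_boost
    return payload


def build_request(payload: dict, api_key: str,
                  url: str = ENDPOINT) -> urllib.request.Request:
    """Wrap the payload in a POST request; the auth header scheme is assumed."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_payload("Your balance is $1,250.75 as of 3:30 PM.",
                         voice_id="Alice", stability=0.7, use_speaker_boost=True)
    req = build_request(body, api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
    # urllib.request.urlopen(req) would send it; deliberately not called here.
```

Leaving an optional control as `None` keeps it out of the body, so the service's own default stays in effect instead of a guessed value.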

Notes

  • For best prosody, keep sentences clear and use punctuation; split very long text into smaller chunks.
  • Ensure the voice_id is valid; use the official list linked above.
  • Speaker Boost is especially helpful for finance, time, and measurement scripts.
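The first note recommends splitting very long text into smaller chunks. One way to do that is to pack whole sentences greedily into chunks up to a character limit, as sketched below. The 1,000-character default is an assumption chosen to match the billing minimum above, so that chunks stay near the minimum instead of each being billed at 1,000 characters while using a fraction of it. The regular-expression sentence split is a simple heuristic (it will also split after abbreviations such as "Dr."), and `chunk_text` is a hypothetical helper, not part of the service.

```python
import re
from typing import List

# Split after ., ! or ? when followed by whitespace (a heuristic, not a parser).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _split_long(sentence: str, max_chars: int) -> List[str]:
    """Break an over-long sentence at spaces, hard-slicing any word longer than max_chars."""
    pieces, current = [], ""
    for word in sentence.split():
        while len(word) > max_chars:  # a single word that cannot fit in any chunk
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        parts = [sentence] if len(sentence) <= max_chars else _split_long(sentence, max_chars)
        for part in parts:
            candidate = f"{current} {part}" if current else part
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = part
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Welcome to the quarterly report. Revenue rose 12% to $4.2 million. " * 40
    for i, chunk in enumerate(chunk_text(script), start=1):
        print(i, len(chunk))
```

Splitting on sentence boundaries keeps each chunk ending on punctuation, which the notes above identify as important for prosody; only sentences longer than the limit are broken mid-sentence.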