
Inworld Realtime TTS-2


Inworld Realtime TTS-2 converts text into natural-sounding speech with low latency and exposes the official TTS-2 controls for delivery mode, language, timestamps, text normalization, and audio output settings. It is available through a ready-to-use REST inference API with no cold starts.

text-to-audio
Input
If set to true, the function will wait for the result to be generated and uploaded before returning the response. It allows you to get the result directly in the response. This property is only available through the API.


Your request costs $0.035 per run.

For $1 you can run this model approximately 28 times.


README

Inworld Realtime TTS 2

Inworld Realtime TTS 2 converts text into natural-sounding speech with low-latency generation and flexible voice controls. It supports multiple output audio formats and lets you adjust speaking rate and temperature for different delivery styles.

Why Choose This?

  • Low-latency text-to-speech: Generate speech quickly for interactive apps, assistants, and real-time voice experiences.

  • Natural voice output: Create smooth, human-like speech from plain text with selectable voices.

  • Flexible voice controls: Adjust speaking rate and temperature to better match tone, pacing, and delivery style.

  • Multiple output formats: Export audio in MP3, LINEAR16, OGG_OPUS, FLAC, or WAV depending on your workflow.

  • Production-ready API: Access the model through a realtime-friendly API for apps, agents, games, and voice products.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| text | Yes | Input text to convert into speech. |
| voice_id | No | Voice selection for the generated speech, such as Julia. |
| speaking_rate | No | Controls how fast the voice speaks. Default: 1. |
| temperature | No | Controls variation and expressiveness in the generated speech. Default: 1. |
| output_format | No | Output audio format: MP3, LINEAR16, OGG_OPUS, FLAC, or WAV. |
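
The parameters above can be assembled into a request body along the following lines. This is a minimal sketch, not the official client: the helper name `build_tts_payload` is hypothetical, and it assumes the API accepts a JSON object keyed by the parameter names in the table. The page does not state valid ranges for speaking_rate or temperature, so the sketch does not validate them.

```python
from typing import Any, Dict, Optional

# Format names are taken from the parameter table; the rest is illustrative.
SUPPORTED_FORMATS = {"MP3", "LINEAR16", "OGG_OPUS", "FLAC", "WAV"}


def build_tts_payload(
    text: str,
    voice_id: Optional[str] = None,
    speaking_rate: Optional[float] = None,
    temperature: Optional[float] = None,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body using the field names from the parameter table."""
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    if output_format is not None and output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    payload: Dict[str, Any] = {"text": text}
    # Optional fields are omitted when unset so the service can apply its defaults.
    optional = {
        "voice_id": voice_id,
        "speaking_rate": speaking_rate,
        "temperature": temperature,
        "output_format": output_format,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_tts_payload("Hello", voice_id="Julia", output_format="MP3")` produces a dictionary with just those three keys, leaving speaking_rate and temperature to their documented defaults.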

How to Use

  1. Enter your text — paste or type the content you want to convert into speech.
  2. Choose a voice — select the voice that best fits your use case.
  3. Adjust speaking rate and temperature (optional) — fine-tune pacing and expressiveness.
  4. Choose output format — select MP3, LINEAR16, OGG_OPUS, FLAC, or WAV.
  5. Submit — generate and download the audio output.
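
For programmatic use, the same steps might be expressed as an HTTP request. The sketch below only constructs the request and does not send it. The endpoint URL and the bearer-token header are placeholders, since this page does not state the real endpoint or authentication scheme. The function name `make_request` is also hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholder: the page does not give the endpoint URL.
API_URL = "https://api.example.com/inworld/realtime-tts-2"


def make_request(
    api_key: str,
    text: str,
    voice_id: str = "Julia",
    output_format: str = "MP3",
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the TTS parameters."""
    body = json.dumps(
        {"text": text, "voice_id": voice_id, "output_format": output_format}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the provider's API documentation.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Once the real endpoint and credentials are substituted, the request could be sent with `urllib.request.urlopen(req)`. The response shape is not described on this page, so handling of the returned audio is left out.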

Example Input

Welcome to our product demo. Today we will walk through the key features, explain how the workflow operates, and show how quickly you can integrate voice output into your application.

Pricing

| Text Length | Cost |
| --- | --- |
| 1–1000 chars | $0.035 |
| 1001–2000 chars | $0.070 |
| 2001–3000 chars | $0.105 |
| 3001–4000 chars | $0.140 |
| 4001–5000 chars | $0.175 |

Billing Rules

  • Pricing is based on the length of text.
  • Character count is rounded up to the next 1,000-character block.
  • Each additional started 1,000 characters adds $0.035.
  • voice_id, speaking_rate, temperature, and output_format do not affect pricing.
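
The billing rules above amount to a ceiling division by 1,000 multiplied by the block price. Here is a small sketch for estimating cost before submitting, assuming characters are counted the way Python's `len` counts them; the page does not specify how whitespace or multi-byte characters are counted, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.035")  # USD per started 1,000 characters
BLOCK_SIZE = 1000


def estimate_cost(text: str) -> Decimal:
    """Estimate the price in USD for one request, rounding up to whole blocks."""
    n = len(text)
    if n == 0:
        raise ValueError("text is required")
    blocks = -(-n // BLOCK_SIZE)  # ceiling division without floats
    return blocks * PRICE_PER_BLOCK
```

`Decimal` is used instead of `float` so that results match the pricing table exactly: a 1,001-character input falls in the second block and is estimated at $0.070.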

Best Use Cases

  • Realtime voice agents — Generate spoken responses for assistants, NPCs, and conversational interfaces.
  • Interactive applications — Add live voice output to games, education tools, and customer-facing apps.
  • Accessibility features — Turn written content into audio for more accessible user experiences.
  • Content narration — Create voiceovers for guides, product demos, and short-form content.
  • Prototype voice experiences — Quickly test different voices, pacing, and formats in development workflows.

Pro Tips

  • Keep input text clean and well-punctuated for more natural speech rhythm.
  • Split very long content into smaller sections when you want tighter pacing control.
  • Use speaking_rate to match the use case, such as slower for tutorials and faster for assistants.
  • Adjust temperature when you want more variation in delivery style.
  • Choose MP3 for broad compatibility, and use lossless formats like WAV or FLAC when audio quality matters more.
  • Reuse the same voice and settings across related clips for a more consistent user experience.
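
The tip about splitting long content can be sketched as a simple sentence-based chunker. This is one possible approach rather than anything the service prescribes: `split_text` is a hypothetical helper, the sentence detection is a rough punctuation heuristic, and the 1,000-character default is chosen only because it matches the billing block size.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after '.', '!' or '?' when followed by whitespace (rough heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is hard-split as a fallback.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

If each chunk is submitted as its own request, each one would presumably be rounded up to a 1,000-character block on its own, so splitting into many short chunks may cost more than sending the same text in one request.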

Notes

  • text is the only required field.
  • Supported output formats are MP3, LINEAR16, OGG_OPUS, FLAC, and WAV.
  • Pricing depends only on text length.
  • Audio format and voice settings do not change the price.

Related Models

  • Other Inworld speech and voice generation models may be useful when you need different latency, quality, or voice configuration options.