Inworld 1.5 Max Text To Speech

Inworld 1.5 Max delivers premium text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and high-fidelity, natural-sounding audio output. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Inworld 1.5 Max is a high-quality text-to-speech model that converts written text into natural, expressive speech. Choose from a variety of voice presets, fine-tune speaking rate and expressiveness with simple controls, and generate professional-grade audio in seconds — ideal for IVR systems, voiceovers, content creation, and accessibility.


Why Choose This?

  • Natural-sounding voices: Multiple voice presets with realistic intonation, pacing, and emotion for lifelike speech output.

  • Voice selection: Choose from a library of distinct voice identities to match your brand, character, or use case.

  • Speaking rate control: Adjust the speed of speech to suit narration, dialogue, announcements, or any delivery style.

  • Temperature control: Fine-tune expressiveness. Lower values give consistent, predictable delivery; higher values give more dynamic, varied speech.

  • Ultra-low cost: Just $0.01 per 1,000 characters, affordable even at scale.


Parameters

| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text content to convert to speech |
| voice_id | No | Voice preset to use (e.g., Elizabeth) |
| speaking_rate | No | Speed of speech (default: 1) |
| temperature | No | Expressiveness level (default: 1) |

How to Use

  1. Enter your text — type or paste the content you want converted to speech.
  2. Select a voice — choose a voice preset from the voice_id dropdown.
  3. Adjust speaking rate — slide to control how fast or slow the speech is delivered.
  4. Adjust temperature — slide to control the expressiveness and variation in delivery.
  5. Run — submit and download the generated audio.

Pricing

| Characters | Cost |
|---|---|
| Up to 1,000 | $0.01 |
| Up to 2,000 | $0.02 |
| Up to 5,000 | $0.05 |
| Up to 10,000 | $0.10 |

Billing Rules

  • Rate: $0.01 per 1,000 characters
  • Rounding: character count is rounded up to the next 1,000
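
The billing rules above can be sketched as a small cost estimator. This is a minimal illustration, not the platform's billing code: the function name `estimate_cost` is hypothetical, and it assumes characters are counted the way Python's `len()` counts them (Unicode code points), which the source does not specify.

```python
from decimal import Decimal

CHARS_PER_BLOCK = 1000
PRICE_PER_BLOCK = Decimal("0.01")  # USD per started block of 1,000 characters


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD cost of a single request for `text`."""
    if not text:
        raise ValueError("text is required and must not be empty")
    # Ceiling division: a partially filled block is billed as a full block.
    blocks = -(-len(text) // CHARS_PER_BLOCK)
    return blocks * PRICE_PER_BLOCK


# Print the estimate for a few lengths around the block boundaries.
for n in (1, 1000, 1001, 5000, 10000):
    print(n, estimate_cost("a" * n))
```

Because rounding applies to each request, a 1,001-character text is estimated at the same price as a 2,000-character one.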

Best Use Cases

  • IVR & Phone Systems — Generate professional call menus, hold messages, and automated responses.
  • Video Voiceovers — Add narration to marketing videos, tutorials, and presentations.
  • Content Creation — Convert blog posts, articles, or scripts into audio for podcasts and social media.
  • Accessibility — Provide audio versions of written content for visually impaired users.
  • Game & App Dialogue — Create character voices for interactive experiences and virtual assistants.

Pro Tips

  • Keep speaking_rate around 1 for natural-sounding narration; lower for dramatic reads, higher for fast announcements.
  • Use lower temperature for consistent, predictable voiceovers (e.g., IVR); higher temperature for more expressive character dialogue.
  • Break long texts into logical paragraphs for better pacing and natural pauses.
  • Test different voice_id options to find the best match for your brand or character.
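
One way to apply these tips is to keep named presets per use case. The sketch below is illustrative only: the preset names, the helper `payload_for`, and every numeric value are assumptions picked inside the ranges listed in the request parameter table (speaking_rate 0.5 ~ 1.5, temperature 0.7 ~ 1.5), not recommendations from the provider.

```python
# Illustrative presets: all values are assumptions, chosen only to follow the
# direction of the tips (lower temperature for IVR, higher for dialogue).
PRESETS = {
    "ivr": {"speaking_rate": 1.0, "temperature": 0.7},
    "narration": {"speaking_rate": 1.0, "temperature": 1.0},
    "dramatic_read": {"speaking_rate": 0.85, "temperature": 1.2},
    "fast_announcement": {"speaking_rate": 1.25, "temperature": 1.0},
    "character_dialogue": {"speaking_rate": 1.0, "temperature": 1.3},
}


def payload_for(use_case: str, text: str, voice_id: str = "Alex") -> dict:
    """Build a request body from a named preset (hypothetical helper)."""
    settings = PRESETS[use_case]
    return {"text": text, "voice_id": voice_id, **settings}


print(payload_for("ivr", "Press 1 for sales."))
```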

Notes

  • Text is the only required field.
  • Billing is based on character count, rounded up to the next 1,000.
  • Very long texts may take slightly longer to process.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/inworld/1.5-max/text-to-speech" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "text": "Hello, welcome to WaveSpeedAI.",
    "voice_id": "Alex",
    "speaking_rate": 1,
    "temperature": 1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
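
The same submit-then-query flow can be sketched in Python with only the standard library. The two URLs and the Bearer header come from the curl commands above; the function names, the 30-second timeout, the polling interval, and the poll limit are assumptions for illustration. The `send` parameter is a hypothetical hook that lets the HTTP call be replaced in tests.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/inworld/1.5-max/text-to-speech"


def http_json(method, url, api_key, body=None):
    """Send one request with urllib and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def synthesize(payload, api_key, send=http_json, poll_interval=1.0, max_polls=60):
    """Submit a task, then poll its result endpoint until it finishes."""
    task = send("POST", SUBMIT_URL, api_key, payload)["data"]
    result_url = f"{API_BASE}/predictions/{task['id']}/result"
    for _ in range(max_polls):
        data = send("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]  # list of URLs to the generated audio
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(poll_interval)  # status is still created or processing
    raise TimeoutError(f"task {task['id']} did not finish in time")


# Example call (needs a real key and network access, so it is left commented):
# import os
# urls = synthesize({"text": "Hello.", "voice_id": "Alex"},
#                   os.environ["WAVESPEED_API_KEY"])
```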

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | The text content to convert to speech. |
| voice_id | string | No | Alex | Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Yichen, Xiaoyin, Xinyi, Jing, Erik, Katrien, Lennart, Lore, Alain, Hélène, Mathieu, Étienne, Johanna, Josef, Gianni, Orietta, Asuka, Satoshi, Hyunwoo, Minji, Seojun, Yoona, Szymon, Wojciech, Heitor, Maitê, Diego, Lupita, Miguel, Rafael, Svetlana, Elena, Dmitry, Nikolai, Riya, Manoj, Yael, Oren, Nour, Omar | The voice to use for speech generation. |
| speaking_rate | number | No | 1 | 0.5 ~ 1.5 | The speed of speaking. |
| temperature | number | No | 1 | 0.7 ~ 1.5 | The temperature to use for the generation. A higher value means more randomness in the output. |
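
A request body can be checked against these defaults and ranges before it is sent. This is a minimal sketch: `build_payload` is a hypothetical helper, it assumes the range bounds are inclusive, and it does not check `voice_id` against the voice list, since that list may change.

```python
# Documented numeric ranges; treating the bounds as inclusive is an assumption.
RANGES = {
    "speaking_rate": (0.5, 1.5),
    "temperature": (0.7, 1.5),
}


def build_payload(text, voice_id="Alex", speaking_rate=1.0, temperature=1.0):
    """Return a request body, raising ValueError on out-of-range input."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required")
    values = {"speaking_rate": speaking_rate, "temperature": temperature}
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return {"text": text, "voice_id": voice_id, **values}


print(build_payload("Hello there.", voice_id="Elizabeth", speaking_rate=1.2))
```

Validating locally avoids spending a round trip on a request that would be rejected.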

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
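
The result fields above can be mapped onto a small typed record so that calling code does not index into nested dictionaries. The sketch below is an assumption-laden illustration: the `Prediction` class and its attribute names are hypothetical, and the sample response is made up to match the documented field names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prediction:
    """Subset of the documented result fields (hypothetical container)."""
    id: str
    model: str
    status: str
    outputs: List[str] = field(default_factory=list)
    error: str = ""
    inference_ms: Optional[int] = None

    @property
    def done(self) -> bool:
        # completed and failed are the two terminal statuses in the table.
        return self.status in ("completed", "failed")

    @classmethod
    def from_response(cls, response: dict) -> "Prediction":
        data = response["data"]
        timings = data.get("timings") or {}
        return cls(
            id=data["id"],
            model=data["model"],
            status=data["status"],
            outputs=list(data.get("outputs") or []),
            error=data.get("error") or "",
            inference_ms=timings.get("inference"),
        )


# Made-up response shaped like the table above.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "task-123",
        "model": "inworld/1.5-max/text-to-speech",
        "status": "completed",
        "outputs": ["https://example.com/audio.wav"],
        "error": "",
        "timings": {"inference": 850},
    },
}
print(Prediction.from_response(sample))
```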
© 2025 WaveSpeedAI. All rights reserved.