Introducing Inworld 1.5 Mini Text-to-Speech on WaveSpeedAI

Voice is becoming the default interface for AI applications. From conversational agents to interactive games, the ability to turn text into natural-sounding speech — instantly and affordably — is no longer a nice-to-have. It’s a requirement. WaveSpeedAI is excited to announce the availability of Inworld 1.5 Mini, an ultra-fast, ultra-affordable text-to-speech model that delivers natural multilingual speech synthesis at just $0.005 per 1,000 characters.

Built by Inworld AI — the team behind the #1 ranked model on the Artificial Analysis TTS Leaderboard — Inworld 1.5 Mini brings production-grade voice synthesis to developers who need speed and scale without breaking the budget.

What is Inworld 1.5 Mini?

Inworld 1.5 Mini is the lightweight variant of Inworld’s TTS-1.5 family, purpose-built for latency-sensitive and high-volume applications. While its sibling, Inworld 1.5 Max, optimizes for maximum naturalness and expressiveness, Mini prioritizes response time: it delivers a P90 time-to-first-audio under 130 ms (90% of requests begin returning audio within 130 ms), 4x faster than previous-generation Inworld models.

Despite its compact architecture, Mini doesn’t sacrifice quality. The TTS-1.5 generation delivers 30% greater expressiveness and a 40% reduction in word error rates compared to earlier Inworld models. The result is a model that sounds remarkably natural while responding almost instantaneously — making it ideal for real-time interactive experiences where every millisecond counts.

Key Features

Ultra-Low Latency

Sub-130ms P90 time-to-first-audio — among the fastest TTS models available today
4x faster than prior Inworld generations
Optimized for real-time conversational pipelines and interactive applications

65+ Multilingual Voices Across 15 Languages

Inworld 1.5 Mini ships with a diverse voice library spanning:

English — 25 distinct voices ranging from professional narrators to expressive character voices
Chinese — 4 voices including calm, energetic, and narrative styles
Japanese, Korean — Native-speaking voices with natural intonation
European — French, German, Spanish, Portuguese, Italian, Dutch, Polish, Russian
South Asian & Middle Eastern — Hindi, Hebrew, Arabic

Each voice has its own personality: Blake’s rich, intimate tone suits audiobooks, Dominus’s menacing robotic quality fits game villains, and Luna’s calming cadence works well for meditation content.

Fine-Grained Control

Speaking rate adjustment — Speed up for announcements, slow down for dramatic narration
Temperature control — Lower values for consistent, predictable output; higher values for more dynamic, expressive delivery
Simple parameter set — Just text, voice, rate, and temperature. No complex configuration required.
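The control parameters above map directly onto the request fields used in the Getting Started example later in this post (`text`, `voice_id`, `speaking_rate`, `temperature`). Below is a small sketch that bundles rate and temperature into named delivery presets. The preset names and values are illustrative assumptions, not Inworld recommendations, and this post does not state the accepted parameter ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    speaking_rate: float  # 1.0 = natural pacing; lower is slower, higher is faster
    temperature: float    # lower = more consistent, higher = more expressive


# Illustrative starting points only (assumed values, not official recommendations).
PRESETS = {
    "natural": Delivery(speaking_rate=1.0, temperature=1.0),
    "announcement": Delivery(speaking_rate=1.2, temperature=0.6),
    "dramatic_narration": Delivery(speaking_rate=0.85, temperature=1.0),
    "automated_system": Delivery(speaking_rate=1.0, temperature=0.5),
}


def build_input(text: str, voice_id: str, preset: str = "natural") -> dict:
    """Build a request payload with the field names from the Getting Started example."""
    delivery = PRESETS[preset]  # raises KeyError for an unknown preset name
    return {
        "text": text,
        "voice_id": voice_id,
        "speaking_rate": delivery.speaking_rate,
        "temperature": delivery.temperature,
    }
```

For example, `build_input("Doors are closing.", "Carter", "announcement")` returns a dict that can be passed as the second argument to `wavespeed.run` in place of the hand-written payload.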

Unbeatable Pricing

At $0.005 per 1,000 characters, Inworld 1.5 Mini is one of the most affordable TTS solutions on the market, up to 25x more affordable than competing models at comparable quality levels. Billing is transparent and predictable: the character count is rounded up to the next multiple of 1,000, so up to 1,000 characters costs $0.005, up to 2,000 costs $0.010, and so on.

Characters	Cost
Up to 1,000	$0.005
Up to 5,000	$0.025
Up to 10,000	$0.050
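The rounding rule is easy to overlook when estimating a budget. Here is a minimal sketch of the arithmetic. It assumes that rounding is applied to each request separately and that one billed character equals one Python string character (a Unicode code point). This post confirms neither detail, and `cost_for_chars` and `estimate_cost` are hypothetical helper names.

```python
from decimal import Decimal
from typing import Iterable

PRICE_PER_BLOCK = Decimal("0.005")  # USD per 1,000-character block
BLOCK_SIZE = 1_000


def cost_for_chars(n_chars: int) -> Decimal:
    """Cost of one request: character count rounded up to the next 1,000."""
    if n_chars < 0:
        raise ValueError("character count cannot be negative")
    blocks = -(-n_chars // BLOCK_SIZE)  # ceiling division on integers
    return blocks * PRICE_PER_BLOCK


def estimate_cost(texts: Iterable[str]) -> Decimal:
    """Total cost assuming each text is sent as its own request (an assumption)."""
    # len() counts code points; the service may count characters differently.
    return sum((cost_for_chars(len(t)) for t in texts), Decimal("0"))


print(cost_for_chars(1_000))  # 0.005
print(cost_for_chars(1_001))  # 0.010
```

Under the per-request assumption, ten 100-character requests (`estimate_cost(["x" * 100] * 10)`) cost $0.050, ten times the $0.005 of a single 1,000-character request. At volume, it can pay to combine short strings into fuller requests.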

Real-World Use Cases

Conversational AI and Voice Agents

Inworld 1.5 Mini’s sub-130ms latency makes it the natural choice for voice-enabled chatbots, customer service agents, and virtual assistants. Users experience fluid, natural conversations without the awkward silences that plague slower TTS systems. The multilingual voice library means you can deploy globally from day one.

Gaming and Interactive Entertainment

Power NPC dialogue, in-game narration, and character voices with instant, expressive speech synthesis. With voices like Hades (commanding and gruff), Pixie (high-pitched and playful), and Edward (fast-talking and streetwise), game developers have a ready-made cast of characters at their disposal — no voice actors required for prototyping or indie production.

High-Volume Content Production

Need to generate thousands of audio clips for an e-learning platform, automated news service, or accessibility layer? Mini’s combination of low cost and fast processing makes batch audio generation economically viable at scale. Use it for drafting and iteration, then switch to Inworld 1.5 Max for final production when maximum quality matters.
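For batch jobs, requests for independent clips can run concurrently. The following is a minimal sketch using a thread pool. It assumes the `wavespeed.run` call and `output["outputs"][0]` response shape shown in the Getting Started section below. `wavespeed_synthesize` and `synthesize_batch` are hypothetical helper names, and `max_workers=4` is an arbitrary assumption, since this post does not state WaveSpeedAI's concurrency or rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

MODEL_ID = "inworld/inworld-1.5-mini/text-to-speech"


def wavespeed_synthesize(text: str, voice_id: str) -> str:
    """Synthesize one clip with the SDK call shown in this post; returns the audio URL."""
    import wavespeed  # imported lazily so the batch helper works without the SDK installed

    output = wavespeed.run(
        MODEL_ID,
        {"text": text, "voice_id": voice_id, "speaking_rate": 1, "temperature": 1},
    )
    return output["outputs"][0]


def synthesize_batch(
    texts: Sequence[str],
    synthesize: Callable[[str, str], str] = wavespeed_synthesize,
    voice_id: str = "Olivia",
    max_workers: int = 4,  # assumed limit; tune to your account's actual limits
) -> List[str]:
    """Call synthesize() for every text concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map re-raises the first exception from any call while collecting results.
        return list(pool.map(lambda text: synthesize(text, voice_id), texts))
```

Passing the synthesis function in as a parameter keeps the batching logic independent of the SDK, so you can test it with a stub or swap in Inworld 1.5 Max later.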

Multilingual Content Delivery

Create audio content in 15 languages from a single API endpoint. Whether you’re localizing an app, producing multilingual podcasts, or building a translation pipeline, Mini handles it all with native-quality pronunciation and intonation per language.

Accessibility

Convert written content — articles, documentation, notifications — into spoken audio affordably, making your products accessible to visually impaired users or anyone who prefers listening over reading.

Getting Started on WaveSpeedAI

Using Inworld 1.5 Mini on WaveSpeedAI takes just a few lines of code:

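```python
import wavespeed

# Submit a text-to-speech request to Inworld 1.5 Mini on WaveSpeedAI.
output = wavespeed.run(
    "inworld/inworld-1.5-mini/text-to-speech",  # model ID
    {
        "text": "Welcome to WaveSpeedAI. The fastest way to bring AI to production.",
        "voice_id": "Olivia",  # voice preset name
        "speaking_rate": 1,    # 1 = natural pacing
        "temperature": 1,      # lower = more consistent, higher = more expressive
    },
)

print(output["outputs"][0])  # URL of the generated audio
```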

Step-by-Step

Prepare your text — Type or paste the content you want converted to speech
Choose a voice — Select from 65+ voice presets across 15 languages (e.g., Ashley for warm and natural, Carter for radio announcer energy, Asuka for friendly Japanese)
Adjust delivery — Set speaking_rate for pacing and temperature for expressiveness
Generate — Submit your request and receive a downloadable audio file
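The final step returns a downloadable audio file. The following is a minimal sketch for saving it locally. It assumes, as the Getting Started example suggests, that `output["outputs"][0]` is a plain URL that can be fetched directly. `download_audio` is a hypothetical helper name. This post does not name the audio format, so the caller chooses the file name and extension.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def download_audio(url: str, dest: Union[str, Path]) -> Path:
    """Stream the file at url to dest, creating parent directories, and return the path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, "wb") as f:
        shutil.copyfileobj(response, f)  # copy in chunks instead of loading it all in memory
    return dest
```

Usage: `download_audio(output["outputs"][0], "audio/welcome.wav")`, where the `.wav` extension is only a placeholder.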

Pro Tips

Keep speaking_rate around 1.0 for natural pacing — go lower for dramatic reads, higher for quick announcements
Lower temperature produces more consistent, predictable output — ideal for automated systems
Break long texts into logical paragraphs for better pacing and natural pauses
Always match voice language to your text language for the best pronunciation
Start with Mini for rapid prototyping, then upgrade to Inworld 1.5 Max for final production audio
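You can apply the paragraph tip automatically before sending long documents. In this minimal sketch, `split_for_tts` (a hypothetical helper) packs whole paragraphs into chunks of at most `max_chars`. When a paragraph is too long, it falls back to sentence boundaries, and it only hard-cuts a single sentence that exceeds the budget. The 1,000-character default is an assumption chosen to line up with the billing block. This post does not state a maximum input length per request.

```python
import re
from typing import List

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _split_long_paragraph(para: str, max_chars: int) -> List[str]:
    """Pack sentences into pieces of at most max_chars; hard-cut oversized sentences."""
    pieces: List[str] = []
    current = ""
    for sentence in SENTENCE_END.split(para):
        # A single sentence longer than the limit is cut into fixed-size slices.
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces


def split_for_tts(text: str, max_chars: int = 1_000) -> List[str]:
    """Split text at blank-line paragraph breaks, packing paragraphs into chunks <= max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        if len(para) <= max_chars:
            pieces = [para]
        else:
            pieces = _split_long_paragraph(para, max_chars)
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. If rounding applies per request, chunks filled close to the budget waste less of each billed 1,000-character block than many short requests.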

Why WaveSpeedAI?

Running Inworld 1.5 Mini through WaveSpeedAI gives you more than just model access:

No Cold Starts — Requests are served immediately with zero initialization delay
Best Performance — Optimized infrastructure delivers consistently fast response times
Affordable Pricing — Transparent pay-per-use billing with no hidden fees
Simple REST API — Integrate into any application with a straightforward inference endpoint
Production-Ready — Built for reliability at scale with high availability

Conclusion

Inworld 1.5 Mini hits the sweet spot developers have been looking for: a text-to-speech model fast enough for real-time applications, affordable enough for high-volume production, and versatile enough to cover 15 languages with 65+ expressive voices. Built by the team behind the #1 ranked model on the Artificial Analysis TTS Leaderboard and delivered through WaveSpeedAI’s zero-cold-start infrastructure, it’s the most practical path to adding natural voice to your applications.

Whether you’re building voice agents, generating game dialogue, producing multilingual content, or making your products more accessible, Inworld 1.5 Mini on WaveSpeedAI delivers the speed, quality, and affordability to make it happen.

Try Inworld 1.5 Mini on WaveSpeedAI today and start building with production-grade voice synthesis at a fraction of the cost.