Introducing Inworld 1.5 Mini Text-to-Speech on WaveSpeedAI
Voice is becoming the default interface for AI applications. From conversational agents to interactive games, the ability to turn text into natural-sounding speech — instantly and affordably — is no longer a nice-to-have. It’s a requirement. WaveSpeedAI is excited to announce the availability of Inworld 1.5 Mini, an ultra-fast, ultra-affordable text-to-speech model that delivers natural multilingual speech synthesis at just $0.005 per 1,000 characters.
Built by Inworld AI — the team behind the #1 ranked model on the Artificial Analysis TTS Leaderboard — Inworld 1.5 Mini brings production-grade voice synthesis to developers who need speed and scale without breaking the budget.
What is Inworld 1.5 Mini?
Inworld 1.5 Mini is the lightweight variant of Inworld’s TTS-1.5 family, purpose-built for latency-sensitive and high-volume applications. While its sibling, Inworld 1.5 Max, optimizes for maximum naturalness and expressiveness, Mini prioritizes blazing-fast response times — achieving sub-130ms P90 time-to-first-audio latency, which is 4x faster than previous-generation models.
Despite its compact architecture, Mini doesn’t sacrifice quality. The TTS-1.5 generation delivers 30% greater expressiveness and a 40% lower word error rate than earlier Inworld models. The result sounds remarkably natural while responding almost instantaneously, which makes it ideal for real-time interactive experiences where every millisecond counts.
Key Features
Ultra-Low Latency
- Sub-130ms P90 time-to-first-audio — among the fastest TTS models available today
- 4x faster than prior Inworld generations
- Optimized for real-time conversational pipelines and interactive applications
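For readers less familiar with the metric, a P90 figure means that 90% of requests begin returning audio within the stated time. The following is a minimal sketch of how you might compute P90 from your own measurements, using the nearest-rank method. The sample values are invented for illustration and are not Inworld or WaveSpeedAI benchmark data.

```python
def p90(samples_ms):
    """Return the 90th percentile of latency samples (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest rank = ceil(0.9 * n), computed with integers to avoid float error.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


# Hypothetical time-to-first-audio measurements in milliseconds.
measurements = [95, 102, 110, 98, 121, 105, 99, 240, 101, 108]
print(p90(measurements))
```

A single slow outlier (240 ms here) does not move the P90, which is why the metric is often preferred over the maximum when describing interactive latency.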
65+ Multilingual Voices Across 15 Languages
Inworld 1.5 Mini ships with a diverse voice library spanning:
- English — 25 distinct voices ranging from professional narrators to expressive character voices
- Chinese — 4 voices including calm, energetic, and narrative styles
- Japanese, Korean — Native-speaking voices with natural intonation
- European — French, German, Spanish, Portuguese, Italian, Dutch, Polish, Russian
- South Asian & Middle Eastern — Hindi, Hebrew, Arabic
Each voice has its own personality — from Blake’s rich, intimate tone ideal for audiobooks to Dominus’s menacing robotic quality perfect for game villains, to Luna’s calming cadence suited for meditation content.
Fine-Grained Control
- Speaking rate adjustment — Speed up for announcements, slow down for dramatic narration
- Temperature control — Lower values for consistent, predictable output; higher values for more dynamic, expressive delivery
- Simple parameter set — Just text, voice, rate, and temperature. No complex configuration required.
Unbeatable Pricing
At $0.005 per 1,000 characters, Inworld 1.5 Mini is one of the most affordable TTS solutions on the market, up to 25x more affordable than competing models at comparable quality levels. Billing rounds the character count up to the next 1,000, so charges stay transparent and predictable.
| Characters | Cost |
|---|---|
| Up to 1,000 | $0.005 |
| Up to 5,000 | $0.025 |
| Up to 10,000 | $0.050 |
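The rounding rule in the table can be sketched as a small estimator. This is an illustration under two assumptions that the pricing text does not confirm: that rounding is applied per request, and that Python's `len` matches how the service counts characters. The helper name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.005")  # USD per 1,000 characters, from the table
BLOCK_SIZE = 1000


def estimate_cost(text: str) -> Decimal:
    """Estimate the charge for one request, rounding up to whole 1,000-character blocks."""
    # Ceiling division: 1..1000 chars -> 1 block, 1001..2000 -> 2 blocks, and so on.
    blocks = -(-len(text) // BLOCK_SIZE)
    return blocks * PRICE_PER_BLOCK


print(estimate_cost("a" * 1000))  # one block
print(estimate_cost("a" * 1001))  # rounds up to two blocks
```

`Decimal` is used instead of `float` so that summing many small charges does not accumulate binary rounding error.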
Real-World Use Cases
Conversational AI and Voice Agents
Inworld 1.5 Mini’s sub-130ms latency makes it the natural choice for voice-enabled chatbots, customer service agents, and virtual assistants. Users experience fluid, natural conversations without the awkward silences that plague slower TTS systems. The multilingual voice library means you can deploy globally from day one.
Gaming and Interactive Entertainment
Power NPC dialogue, in-game narration, and character voices with instant, expressive speech synthesis. With voices like Hades (commanding and gruff), Pixie (high-pitched and playful), and Edward (fast-talking and streetwise), game developers have a ready-made cast of characters at their disposal — no voice actors required for prototyping or indie production.
High-Volume Content Production
Need to generate thousands of audio clips for an e-learning platform, automated news service, or accessibility layer? Mini’s combination of low cost and fast processing makes batch audio generation economically viable at scale. Use it for drafting and iteration, then switch to Inworld 1.5 Max for final production when maximum quality matters.
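One way to approach batch generation is to fan requests out over a small thread pool. The sketch below reuses the `wavespeed.run` call from the Getting Started section. The helper names `synthesize` and `synthesize_batch` are hypothetical, the worker count is arbitrary, and whether your account's rate limits allow this level of concurrency is an assumption you should check.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL = "inworld/inworld-1.5-mini/text-to-speech"


def synthesize(text, voice_id="Olivia"):
    """Call the model once and return the audio URL."""
    import wavespeed  # imported lazily so the batching helper can be exercised offline

    output = wavespeed.run(
        MODEL,
        {"text": text, "voice_id": voice_id, "speaking_rate": 1, "temperature": 1},
    )
    return output["outputs"][0]


def synthesize_batch(texts, synth=synthesize, max_workers=4):
    """Run synth over texts concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synth, texts))
```

Passing the synthesis function as a parameter lets you swap in a stub for dry runs, or a Max-based function for final production, without changing the batching logic.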
Multilingual Content Delivery
Create audio content in 15 languages from a single API endpoint. Whether you’re localizing an app, producing multilingual podcasts, or building a translation pipeline, Mini handles it all with native-quality pronunciation and intonation per language.
Accessibility
Convert written content — articles, documentation, notifications — into spoken audio affordably, making your products accessible to visually impaired users or anyone who prefers listening over reading.
Getting Started on WaveSpeedAI
Using Inworld 1.5 Mini on WaveSpeedAI takes just a few lines of code:
```python
import wavespeed

output = wavespeed.run(
    "inworld/inworld-1.5-mini/text-to-speech",
    {
        "text": "Welcome to WaveSpeedAI. The fastest way to bring AI to production.",
        "voice_id": "Olivia",
        "speaking_rate": 1,
        "temperature": 1,
    },
)
print(output["outputs"][0])  # Audio URL
```
Step-by-Step
- Prepare your text — Type or paste the content you want converted to speech
- Choose a voice — Select from 65+ voice presets across 15 languages (e.g., Ashley for warm and natural, Carter for radio announcer energy, Asuka for friendly Japanese)
- Adjust delivery — Set `speaking_rate` for pacing and `temperature` for expressiveness
- Generate — Submit your request and receive a downloadable audio file
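Since the result is returned as an audio URL, the last step can be followed by a download to disk. The following is a minimal sketch using only the standard library. The helper name `save_audio` is hypothetical, and the output file extension is an assumption, because the audio format is not stated here.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(url: str, dest: str) -> Path:
    """Stream the audio at url into dest and return the destination path."""
    path = Path(dest)
    with urllib.request.urlopen(url) as response, path.open("wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return path


# Usage, where `output` is the result of wavespeed.run(...):
# save_audio(output["outputs"][0], "welcome_audio.bin")
```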
Pro Tips
- Keep `speaking_rate` around 1.0 for natural pacing — go lower for dramatic reads, higher for quick announcements
- Lower `temperature` produces more consistent, predictable output — ideal for automated systems
- Break long texts into logical paragraphs for better pacing and natural pauses
- Always match voice language to your text language for the best pronunciation
- Start with Mini for rapid prototyping, then upgrade to Inworld 1.5 Max for final production audio
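The tip about breaking long texts into paragraphs could be automated along these lines. This is a sketch, not a documented requirement: the helper name `split_paragraphs` is hypothetical, and the `max_chars` default simply mirrors the 1,000-character billing unit rather than any stated input limit.

```python
import re


def split_paragraphs(text, max_chars=1000):
    """Split text on blank lines, then pack whole paragraphs into chunks.

    Chunks stay at or under max_chars where possible. A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `text` of its own request, so pauses fall at paragraph boundaries instead of at arbitrary cut points.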
Why WaveSpeedAI?
Running Inworld 1.5 Mini through WaveSpeedAI gives you more than just model access:
- No Cold Starts — Requests are served immediately with zero initialization delay
- Best Performance — Optimized infrastructure delivers consistently fast response times
- Affordable Pricing — Transparent pay-per-use billing with no hidden fees
- Simple REST API — Integrate into any application with a straightforward inference endpoint
- Production-Ready — Built for reliability at scale with high availability
Conclusion
Inworld 1.5 Mini hits the sweet spot that developers have been searching for: a text-to-speech model that’s fast enough for real-time applications, affordable enough for high-volume production, and versatile enough to cover 15 languages with 65+ expressive voices. Backed by the #1 ranked TTS technology on the Artificial Analysis Leaderboard and delivered through WaveSpeedAI’s zero-cold-start infrastructure, it’s the most practical path to adding natural voice to your applications.
Whether you’re building voice agents, generating game dialogue, producing multilingual content, or making your products more accessible, Inworld 1.5 Mini on WaveSpeedAI delivers the speed, quality, and affordability to make it happen.
Try Inworld 1.5 Mini on WaveSpeedAI today and start building with production-grade voice synthesis at a fraction of the cost.


