
HeartMuLa Is Now Available on WaveSpeedAI: Turn Your Lyrics Into Complete Songs With AI

Creating original music has long been one of the most time-intensive creative pursuits—until now. WaveSpeedAI is thrilled to announce the availability of HeartMuLa, a state-of-the-art open-source music foundation model that generates high-quality, complete songs from your lyrics and style tags. Whether you’re a songwriter prototyping ideas, a content creator looking for a custom soundtrack, or a developer building the next music-powered application, HeartMuLa puts professional-grade music production into a simple API call.

What Is HeartMuLa?

HeartMuLa is a family of open-source music foundation models built on a sophisticated four-component architecture: HeartCLAP for audio-text alignment, HeartTranscriptor for lyric recognition, HeartCodec for high-fidelity music tokenization, and the HeartMuLa language model itself for song generation. Together, these components produce complete songs—vocals, melodies, harmonies, and full instrumental arrangements—from nothing more than structured lyrics and a handful of style tags.

What makes HeartMuLa remarkable is its lyric clarity. In benchmark tests, HeartMuLa achieves the lowest Phoneme Error Rate (PER) in every language tested, outperforming top commercial models including Suno v5 and MiniMax Music 2.0. In English, HeartMuLa reaches a PER of just 0.09, and in Chinese 0.12, so the words you write stay clearly intelligible in the final song. The model has been further refined with Direct Preference Optimization (DPO), a preference-alignment technique that fine-tunes a model directly on pairs of preferred and rejected outputs. DPO improves how closely generations follow the requested styles and tags, as well as overall musical quality.
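Under its standard definition, PER counts the phoneme substitutions (S), deletions (D), and insertions (I) needed to align a transcription of the sung vocals with the reference lyrics, then divides by the number of reference phonemes N: PER = (S + D + I) / N. A PER of 0.09 therefore means roughly 9 phoneme errors per 100 reference phonemes. The benchmark's exact phonemizer and transcription setup are not described here, so this Python sketch only illustrates the metric itself, on hypothetical ARPAbet-style phoneme lists:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    # prev[j] holds the distance between the first i-1 ref tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref phoneme missing from hyp
                curr[j - 1] + 1,         # insertion: extra phoneme in hyp
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref_phonemes, hyp_phonemes):
    """PER = (S + D + I) / N, where N is the number of reference phonemes."""
    if not ref_phonemes:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(ref_phonemes, hyp_phonemes) / len(ref_phonemes)


if __name__ == "__main__":
    # Hypothetical phonemes for the lyric "stay with me"
    reference = ["S", "T", "EY", "W", "IH", "DH", "M", "IY"]
    # Transcription with one deletion (T) and one substitution (DH -> TH)
    hypothesis = ["S", "EY", "W", "IH", "TH", "M", "IY"]
    print(phoneme_error_rate(reference, hypothesis))  # 2 / 8 = 0.25
```

On this toy example two errors over eight reference phonemes give a PER of 0.25, far worse than the 0.09 reported for English.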

For the first time, a commercial-grade music generation system has been reproduced at academic scale and released as open source under the Apache 2.0 license—and now it’s available on WaveSpeedAI with zero setup required.

Key Features

Complete Song Generation: Produces full songs with vocals, instrumentals, intros, bridges, and outros—not just loops or short clips
Multilingual Lyrics: Supports English, Chinese, Japanese, Korean, and Spanish, making it ideal for creators targeting global audiences
Structured Song Composition: Use section markers like [Verse], [Chorus], [Bridge], [intro-short], and [outro-medium] to precisely control your song’s arrangement and flow
Flexible Style Control: Define genre, mood, tempo, instruments, and vocal characteristics through simple comma-separated tags like "r&b, smooth, male vocals, soulful, 85bpm"
Instrumental Sections: Add intros, outros, and instrumental breaks with configurable duration markers—no lyrics required for these sections
Industry-Leading Lyric Clarity: Lowest phoneme error rate across all tested languages, ensuring your lyrics are sung exactly as written

Real-World Use Cases

Original Music Creation

Songwriters and musicians can bring their lyrics to life without booking a studio. Write your verses and choruses, choose a style, and get back a fully produced version of your song from a single request. It’s a fast path from idea to demo.

Content Soundtracks

Video creators, podcasters, and social media producers can generate custom background music that fits their content perfectly. Instead of sifting through generic royalty-free libraries, create something unique for every project.

Multilingual Content Production

Brands and creators serving international audiences can produce songs in five languages from a single model. Launch a marketing campaign with a Japanese pop track, a Spanish ballad, and an English anthem—all generated through the same API.
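To make the single-API idea concrete, here is a minimal Python sketch of a three-language campaign. It reuses the model ID and the `lyrics`/`tags` inputs from the API example under Getting Started; the `Track` class, the helper functions, the sample lyrics, and the style tags are illustrative assumptions, and the cost estimate assumes the flat $0.10-per-song price quoted in this post. Only `generate_all` touches the network, and it needs the `wavespeed` SDK installed and configured.

```python
from dataclasses import dataclass

MODEL_ID = "wavespeed-ai/heartmula/generate-music"  # from the API example
PRICE_PER_SONG_CENTS = 10  # assumes the flat $0.10 per song quoted in this post


@dataclass
class Track:
    language: str  # informational only; the language comes from the lyrics text
    lyrics: str
    tags: str


def build_payloads(tracks):
    """Turn each track into the {"lyrics", "tags"} input used by the endpoint."""
    return [{"lyrics": t.lyrics, "tags": t.tags} for t in tracks]


def estimate_cost_cents(tracks, songs_per_track=1):
    """Total cost in cents, assuming flat per-song pricing."""
    return len(tracks) * songs_per_track * PRICE_PER_SONG_CENTS


def generate_all(tracks):
    """Send one request per track (requires the wavespeed SDK and credentials)."""
    import wavespeed  # imported lazily so the rest of the sketch runs without it
    return [wavespeed.run(MODEL_ID, payload) for payload in build_payloads(tracks)]


# Hypothetical campaign: sample lyrics and tags are illustrative only
campaign = [
    Track("Japanese", "[Verse]\n朝の光が窓を照らす\n[Chorus]\n一緒に歌おう",
          "pop, bright, female vocals, 128bpm"),
    Track("Spanish", "[Verse]\nBajo la luna te espero\n[Chorus]\nQuédate conmigo esta noche",
          "ballad, romantic, male vocals, guitar, 75bpm"),
    Track("English", "[Verse]\nWe rise together, hand in hand\n[Chorus]\nThis is our moment, take a stand",
          "rock, anthemic, energetic, drums, 120bpm"),
]

if __name__ == "__main__":
    for track, payload in zip(campaign, build_payloads(campaign)):
        print(track.language, "->", payload["tags"])
    print(f"Estimated cost: ${estimate_cost_cents(campaign) / 100:.2f}")
```

Costs are kept in integer cents so that three songs come out as exactly 30 cents rather than a floating-point approximation of $0.30.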

Demo Production & Songwriting Assistance

Professional songwriters can use HeartMuLa as a rapid prototyping tool. Test how lyrics sound set to different genres and tempos before committing to expensive studio sessions. Experiment with arrangements by reordering section markers and regenerating quickly.

Game & App Development

Game developers can create original theme songs, menu music, and in-game soundtracks with vocals tailored to their game’s narrative. App developers can integrate music generation directly into their products through the WaveSpeedAI API.

Getting Started on WaveSpeedAI

Generating music with HeartMuLa on WaveSpeedAI is simple. All you need are lyrics—everything else is optional.

Using the API

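import wavespeed

# Submit a generation request: the model ID, then the input parameters
output = wavespeed.run(
    "wavespeed-ai/heartmula/generate-music",
    {
        # Lyrics structured with section markers (vocal and instrumental)
        "lyrics": """[intro-short]
[Verse]
Your voice like velvet, I'm never alone
The way you say my name, it pulls me in
A love like ours is more than skin
[Chorus]
Stay with me until the morning light
Hold me close and never let me go
[outro-short]""",
        # Comma-separated style tags: genre, mood, vocals, tempo
        "tags": "r&b, smooth, male vocals, soulful, slow jam, romantic, 85bpm"
    },
)

# Print the first generated output
print(output["outputs"][0])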

Crafting Your Lyrics

Structure your lyrics with section markers for the best results:

[Verse], [Chorus], [Bridge] — Vocal sections that require lyrics
[intro-short], [intro-medium] — Instrumental intros (0–10s or 10–20s)
[inst-short], [inst-medium] — Instrumental breaks between sections
[outro-short], [outro-medium] — Instrumental endings
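Assembling lyrics by hand gets error-prone as songs grow longer. The Python sketch below builds the lyrics string from (marker, lines) pairs, using only the markers listed above. Two rules here are conservative choices made for this sketch, not documented API behavior: instrumental markers may not carry lyric lines, and unknown markers are rejected. The model may well accept other markers.

```python
VOCAL_SECTIONS = {"Verse", "Chorus", "Bridge"}
INSTRUMENTAL_SECTIONS = {
    "intro-short", "intro-medium",
    "inst-short", "inst-medium",
    "outro-short", "outro-medium",
}


def build_lyrics(sections):
    """Join (marker, lines) pairs into the bracketed lyrics format.

    Vocal sections must carry at least one lyric line; instrumental
    sections (a rule of this sketch) must carry none.
    """
    out = []
    for marker, lines in sections:
        if marker in VOCAL_SECTIONS:
            if not lines:
                raise ValueError(f"[{marker}] is a vocal section and needs lyrics")
        elif marker in INSTRUMENTAL_SECTIONS:
            if lines:
                raise ValueError(f"[{marker}] is instrumental and takes no lyrics")
        else:
            raise ValueError(f"unknown section marker: [{marker}]")
        out.append(f"[{marker}]")
        out.extend(lines)
    return "\n".join(out)


song = build_lyrics([
    ("intro-short", []),
    ("Verse", ["Your voice like velvet, I'm never alone",
               "The way you say my name, it pulls me in"]),
    ("inst-short", []),  # instrumental break for breathing room
    ("Chorus", ["Stay with me until the morning light",
                "Hold me close and never let me go"]),
    ("outro-short", []),
])
print(song)
```

The resulting string can be passed as the `lyrics` value in the API call shown earlier.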

Defining Your Style

Combine tags to describe exactly the sound you want:

"female, bright, pop, happy, piano, 130bpm" — Upbeat pop anthem
"male, dark, rock, guitar, drums, energetic" — Driving rock track
"piano, happy, wedding, synthesizer, romantic" — Romantic wedding song
"jazz, smooth, saxophone, soft, 90bpm" — Late-night jazz feel
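Tags are plain comma-separated text, so a small helper can keep them tidy when you generate them programmatically. This hypothetical `build_tags` function groups descriptors by genre, mood, instruments, and vocal characteristics, lowercases them, drops duplicates, and appends an `Nbpm` tempo tag in the same form as the examples above. The post does not say whether tag order matters, so the ordering here is arbitrary.

```python
def build_tags(*, genre=(), mood=(), instruments=(), vocals=(), bpm=None):
    """Flatten style descriptors into a comma-separated tag string."""
    parts = [*genre, *mood, *instruments, *vocals]
    if bpm is not None:
        if bpm <= 0:
            raise ValueError("bpm must be positive")
        parts.append(f"{bpm}bpm")
    # Normalize: trim, lowercase, and drop duplicates while keeping order
    tags = []
    for part in parts:
        tag = part.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return ", ".join(tags)


print(build_tags(genre=["jazz"], mood=["smooth", "soft"],
                 instruments=["saxophone"], bpm=90))
# jazz, smooth, soft, saxophone, 90bpm
```

This yields the same tags as the late-night jazz example, in a different order.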

Pro Tips:

Keep lyrics well-structured with clear section markers for the best arrangement quality
Combine multiple style tags for more specific results—genre, mood, instruments, tempo, and vocal characteristics all work together
Use [inst-short] or [inst-medium] between vocal sections to give your song breathing room
Set a specific seed value so you can reproduce a generation you love: reusing the same seed with the same lyrics and tags should give you the same song
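The API example above does not show a seed field, so the parameter name `seed` and the seed range below are assumptions; check the model's parameter list before relying on them. The sketch shows the workflow the tip describes: draw a seed, log it with the lyrics and tags, and send the same three values again to replay the generation.

```python
import random


def make_input(lyrics, tags, seed=None):
    """Build the request input for the HeartMuLa endpoint."""
    payload = {"lyrics": lyrics, "tags": tags}
    if seed is not None:
        # ASSUMPTION: the parameter is called "seed"; verify on the model page
        payload["seed"] = seed
    return payload


def new_seed(rng=None):
    """Draw a seed to store alongside a generation so it can be replayed."""
    rng = rng or random.Random()
    # Hypothetical range; the accepted range is not documented in this post
    return rng.randrange(2**31)


lyrics = "[Verse]\nStay with me until the morning light\n[outro-short]"
tags = "r&b, smooth, male vocals, 85bpm"

seed = new_seed()
first_try = make_input(lyrics, tags, seed=seed)
replay = make_input(lyrics, tags, seed=seed)
print(first_try == replay)  # True: same lyrics, tags, and seed

# With the wavespeed SDK installed and configured, each input would be sent
# like the API example: wavespeed.run("wavespeed-ai/heartmula/generate-music", replay)
```

Logging the seed next to the exact lyrics and tags matters: changing any of the three produces a different request, so only the full triple identifies a generation.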

Why Choose WaveSpeedAI?

Running HeartMuLa on WaveSpeedAI gives you the best of both worlds—an open-source model with commercial-grade infrastructure:

No Cold Starts: Your music generation begins immediately, with no waiting for instances to spin up
Fast Inference: Optimized infrastructure delivers your complete songs quickly so you can iterate and experiment freely
Affordable Pricing: Generate complete songs for just $0.10 per song—a fraction of what subscription-based music generation platforms charge
Simple REST API: Integrate AI music generation into your applications with a straightforward API that requires no ML expertise or GPU management

Start Creating Your Music Today

HeartMuLa represents a milestone in AI music generation: an open-source model that rivals commercial offerings in quality, surpasses them in lyric clarity, and supports true multilingual song creation. Combined with WaveSpeedAI’s fast, reliable infrastructure, it’s the most accessible way to turn your words into music.

Whether you’re scoring a film, prototyping a hit, creating content soundtracks, or building a music-powered application, HeartMuLa on WaveSpeedAI delivers professional results at a price that opens up creative possibilities for everyone.

Ready to hear your lyrics come to life? Try HeartMuLa on WaveSpeedAI today and start generating complete songs from your words.