Introducing Song Generation on WaveSpeedAI

Introducing SongGeneration (LeVo): Transform Your Lyrics Into Professional Songs with AI

The world of AI-powered music creation has reached a new milestone. WaveSpeedAI is excited to announce the availability of SongGeneration (LeVo), a groundbreaking text-to-song model developed by Tencent AI Lab that generates complete, high-quality songs from your lyrics. This open-source model represents a significant leap forward in AI music generation, delivering results that rival commercial platforms like Suno 4.5.

What is SongGeneration (LeVo)?

SongGeneration is a language-model-based framework for music generation that combines a language model (LeLM) with an advanced music codec to produce full-length songs with vocals. Unlike simpler text-to-audio models that generate only instrumental music or short clips, SongGeneration creates complete songs (vocals, accompaniment, and professional-quality production), all from structured lyrics input.

The model can generate songs up to 4 minutes and 30 seconds in length, supporting multiple languages including English, Chinese, Spanish, and Japanese. What sets it apart is its flexibility: you can output combined vocals and accompaniment, pure instrumental music, isolated a cappella vocals, or fully separated tracks for professional mixing.

Key Features

Full-Length Song Generation: Create complete songs up to 4.5 minutes, not just 30-second clips
Structured Lyrics Support: Use intuitive section markers like [verse], [chorus], and [bridge] to control song structure
Flexible Style Control: Guide the output with text descriptions for gender, timbre, genre, emotion, instruments, and tempo
Audio Prompting: Upload a reference audio sample to influence the generated style; the model uses its first 10 seconds
Multiple Output Modes: Get combined mix, pure music, a cappella vocals, or separated tracks
Multilingual Capabilities: Generate songs in English, Chinese, Spanish, Japanese, and more
Professional Quality Metrics: Achieves a 5.1% phoneme error rate on benchmarks, with a musicality score of 3.94 out of 5

Use Cases

For Musicians and Producers

Quickly prototype song ideas by writing lyrics and hearing them performed. Test different genres, tempos, and arrangements before committing to full production. Use the separated track output to extract AI-generated melodies or harmonies for your own arrangements.

For Content Creators

Generate original background music and jingles for videos, podcasts, and social media content. Create custom theme songs for your brand or channel without expensive studio sessions or licensing fees.

For Game and App Developers

Produce dynamic, original soundtracks tailored to specific moods and scenes. Generate multiple variations quickly to find the perfect fit for your project’s atmosphere.

For Songwriters

Hear your lyrics come to life instantly to evaluate melody and rhythm. Experiment with different styles and arrangements to discover new creative directions.

For Educators and Researchers

Create custom educational songs or study the intersection of AI and music composition. Analyze how different lyric structures and style prompts affect generated output.

How to Format Your Input

Lyrics Structure

Your lyrics should follow this format:

[intro-short]

[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide

[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love

[outro-short]

Each section starts with a structure label in brackets. Labels like [intro-short], [inst-medium], and [outro-long] are instrumental only—no lyrics needed. Labels like [verse], [chorus], and [bridge] require lyric text.
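
If you assemble lyrics programmatically, a small helper can enforce the rules above before you submit anything. The following is a minimal sketch, not part of SongGeneration itself: the function name format_lyrics and the label sets are assumptions based only on the labels named in this post, and the model may accept others.

```python
# Labels taken from this article; the real model may accept more (assumption).
INSTRUMENTAL_PREFIXES = ("intro", "inst", "outro")
VOCAL_LABELS = {"verse", "chorus", "bridge"}


def format_lyrics(sections):
    """Render (label, lines) pairs into the bracketed layout shown above.

    Raises ValueError if a vocal section has no lyrics, if an
    instrumental section has any, or if the label is not recognised.
    """
    blocks = []
    for label, lines in sections:
        is_vocal = label in VOCAL_LABELS
        is_instrumental = label.split("-")[0] in INSTRUMENTAL_PREFIXES
        if not (is_vocal or is_instrumental):
            raise ValueError(f"unknown section label: {label!r}")
        if is_vocal and not lines:
            raise ValueError(f"[{label}] requires lyric text")
        if is_instrumental and lines:
            raise ValueError(f"[{label}] is instrumental and takes no lyrics")
        # One block per section: the label line followed by its lyric lines.
        blocks.append("\n".join([f"[{label}]", *lines]))
    # Sections are separated by a blank line, as in the example above.
    return "\n\n".join(blocks)


song = format_lyrics([
    ("intro-short", []),
    ("verse", ["Streetlights flicker in the night",
               "I wander through familiar corners"]),
    ("chorus", ["The warmth of memories still remains",
                "But you are gone"]),
    ("outro-short", []),
])
print(song)
```

Catching a missing verse or stray lyrics under an instrumental label locally is cheaper than discovering it after a generation request.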

Style Description

Control the musical output with a natural language description:

female, dark, pop, sad, piano and drums, the bpm is 125

You can specify any combination of gender, timbre, genre, emotion, instruments, and tempo. The model supports open vocabulary, though predefined tags deliver more consistent results.
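
When descriptions are built from user choices rather than typed by hand, it can help to assemble them in a fixed order. This is a sketch under assumptions: build_description and its parameter names are illustrative, and the output simply mirrors the comma-separated example above rather than any documented schema.

```python
def build_description(gender=None, timbre=None, genre=None,
                      emotion=None, instruments=None, bpm=None):
    """Join whichever style attributes are given into one description string."""
    parts = [gender, timbre, genre, emotion, instruments]
    tags = [p for p in parts if p]          # skip anything left unset
    if bpm is not None:
        tags.append(f"the bpm is {int(bpm)}")
    return ", ".join(tags)


description = build_description(
    gender="female", timbre="dark", genre="pop",
    emotion="sad", instruments="piano and drums", bpm=125,
)
print(description)  # female, dark, pop, sad, piano and drums, the bpm is 125
```

Every argument is optional, matching the note that any combination of attributes may be specified.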

Using Reference Audio

For even more precise style matching, upload a reference audio clip. The model uses the first 10 seconds to learn the genre, instrumentation, rhythm, and vocal style. Pro tip: using a song’s chorus as reference typically produces the best results.
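
Because only the first 10 seconds are used, one way to follow the chorus tip is to cut a 10-second clip starting where the chorus begins and upload that instead of the full track. Below is a minimal sketch using Python's standard wave module; it assumes an uncompressed PCM WAV file, and the function name extract_clip and the start offset are illustrative choices, not part of the platform.

```python
import wave


def extract_clip(src_path, dst_path, start=0.0, seconds=10.0):
    """Write up to `seconds` of audio, beginning at `start`, to dst_path.

    Works on PCM WAV files only. Returns the clip duration in seconds,
    which is shorter than requested if the source ends early.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        first = min(int(start * rate), total)          # clamp to end of file
        count = min(int(seconds * rate), total - first)
        src.setpos(first)
        frames = src.readframes(count)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # frame count in the header is fixed on close

    return count / rate
```

For example, extract_clip("song.wav", "chorus.wav", start=45.0) would save the 10 seconds beginning at 0:45, assuming that is where the chorus starts in your file.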

Getting Started on WaveSpeedAI

WaveSpeedAI makes it easy to start generating songs immediately:

Visit the SongGeneration model page
Enter your structured lyrics in the input field
Add an optional style description or upload reference audio
Click generate and receive your complete song

With WaveSpeedAI’s infrastructure, you benefit from:

Instant availability: With no cold starts, your generation begins immediately
Fast inference: Optimized infrastructure delivers results quickly
Affordable pricing: Pay only for what you generate with transparent per-request pricing
Simple REST API: Integrate song generation directly into your applications and workflows
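
To give a feel for what a REST integration might look like, here is a sketch that only builds a request and does not send it. Everything specific in it is a placeholder: the URL, the JSON field names (lyrics, description), and the Bearer-token header are assumptions for illustration, not WaveSpeedAI's documented API. Check the model page for the actual endpoint and parameters.

```python
import json
import urllib.request

# HYPOTHETICAL placeholder endpoint, not a real WaveSpeedAI URL.
API_URL = "https://api.example.com/v1/song-generation"


def build_request(api_key, lyrics, description=None):
    """Build (but do not send) a JSON POST request for a song generation.

    Field names and the auth scheme are illustrative assumptions.
    """
    payload = {"lyrics": lyrics}
    if description:
        payload["description"] = description
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",   # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    "[verse]\nStreetlights flicker in the night",
    "female, dark, pop, sad",
)
print(req.get_method(), req.full_url)
# Sending is left out on purpose. With a real endpoint it would look like:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before spending a paid generation on it.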

Technical Specifications

Feature	Specification
Maximum Song Length	4 minutes 30 seconds
Supported Languages	English, Chinese, Spanish, Japanese
Output Formats	Combined mix, instrumental, vocals, separated tracks
Input Methods	Structured lyrics + text description or reference audio

Conclusion

SongGeneration (LeVo) represents a significant advancement in AI music generation, bringing professional-quality song creation within reach of anyone with an idea and some lyrics. Whether you’re a musician exploring new sounds, a content creator needing original music, or a developer building the next generation of creative tools, this model opens new possibilities.

The combination of structured lyric input, flexible style control, and multi-track output capabilities makes SongGeneration one of the most versatile text-to-song models available today. And with WaveSpeedAI’s fast, reliable inference infrastructure, you can start creating in seconds.

Ready to hear your lyrics come to life? Try SongGeneration on WaveSpeedAI today and experience the future of AI-powered music creation.