text-to-audio
Your request will cost $0.05 per run.
For $1 you can run this model approximately 20 times.
SongGeneration (LeVo) is an open-source text-to-song model developed by Tencent AI Lab that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.
Lyrics need to be in the following format:
[structure tag]
lyrics
[structure tag]
lyrics
An example of the lyrics format is as follows:
[intro-short]
[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide
Your smile so vivid and bright
Etched in my heart, it won’t fade
All those moments once so sweet
Now I’m left with only memories
[verse]
My phone screen lights up
A message from you appears
Just a few simple words
Yet they bring me to tears
The warmth of your embrace
Now feels so far away
How I wish to turn back time
And have you by my side again
[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love
Now pierced by longing
The rhythm of music plays
But my heart is drifting
In days without you
How can I keep moving on
[outro-short]
The description can be used to specify the genre of the music, as well as the timbre, for example:
female, dark, pop, sad, piano and drums, the bpm is 125
A prompt audio can be used to guide the model toward the style of the provided reference audio.
Priority: prompt_audio > description > genre
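The priority rule above can be sketched as a small selection function. This is a hedged illustration of the documented precedence only; `select_condition` is our own name and not part of the SongGeneration API:

```python
def select_condition(prompt_audio=None, description=None, genre=None):
    """Pick the conditioning input the model would follow,
    using the documented priority: prompt_audio > description > genre."""
    if prompt_audio is not None:
        return ("prompt_audio", prompt_audio)
    if description is not None:
        return ("description", description)
    if genre is not None:
        return ("genre", genre)
    return (None, None)  # unconditioned generation
```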
The lyric field defines the lyrics and structure of the song. It consists of multiple musical sections, each starting with a structure label. The model uses these labels to guide the musical and lyrical progression of the generated song.
The following segments should not contain lyrics (they are purely instrumental): [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium].
- short indicates a segment of approximately 0–10 seconds.
- medium indicates a segment of approximately 10–20 seconds.
- We find that the [inst] label is less stable, so we recommend that you do not use it.
The following segments require lyrics: [verse], [chorus], [bridge].
Current supported segments are:
[verse]
[chorus]
[bridge]
[intro-short]
[intro-medium]
[intro-long]
[outro-short]
[outro-medium]
[outro-long]
[inst-short]
[inst-medium]
[inst-long]
[silence]
Each section is separated by an empty line.
Within lyrical segments ([verse], [chorus], [bridge]), lyrics must be written in complete sentences, with one sentence per line.
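The structure rules above can be checked before submitting a request. The sketch below encodes the supported tags and the lyrical/instrumental rules from this page; `check_lyrics` is our own helper, not part of the SongGeneration API:

```python
# Segments that must stay purely instrumental (no lyric lines after the tag).
INSTRUMENTAL = {
    "[intro-short]", "[intro-medium]", "[intro-long]",
    "[outro-short]", "[outro-medium]", "[outro-long]",
    "[inst-short]", "[inst-medium]", "[inst-long]",
    "[silence]",
}
# Segments that require lyric lines after the tag.
LYRICAL = {"[verse]", "[chorus]", "[bridge]"}

def check_lyrics(text):
    """Return a list of problems in a lyric string ([] means it looks valid).
    Sections are separated by an empty line; each section starts with a tag."""
    problems = []
    for section in text.strip().split("\n\n"):
        lines = [line.strip() for line in section.strip().splitlines()]
        tag, body = lines[0], lines[1:]
        if tag in INSTRUMENTAL:
            if body:
                problems.append(f"{tag} must not contain lyrics")
        elif tag in LYRICAL:
            if not body:
                problems.append(f"{tag} requires lyrics")
        else:
            problems.append(f"unknown tag: {tag}")
    return problems
```

For example, `check_lyrics("[intro-short]\n\n[verse]\nStreetlights flicker in the night")` returns an empty list, while a bare `[verse]` with no lines is flagged.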
The description field allows you to control various musical attributes of the generated song. It can describe up to six musical dimensions: Gender (e.g., male, female), Timbre (e.g., dark, bright, soft), Genre (e.g., pop, jazz, rock), Emotion (e.g., sad, energetic, romantic), Instrument (e.g., piano, drums, guitar), BPM (e.g., the bpm is 120).
All six dimensions are optional — you can specify any subset of them.
The order of dimensions is flexible.
Use commas (,) to separate different attributes.
Although the model supports open vocabulary, we recommend using predefined tags for more stable and reliable performance. A list of commonly supported tags for each dimension is available in sample descriptions.
Here are a few valid description inputs:
- female, dark, pop, sad, piano and drums, the bpm is 125.
- male, piano, jazz.
- male, dark, the bpm is 110.
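Since all six dimensions are optional and comma-separated, a description string can be assembled from whichever attributes you want to set. A minimal sketch (`build_description` is our own helper, not part of the SongGeneration API):

```python
def build_description(gender=None, timbre=None, genre=None,
                      emotion=None, instrument=None, bpm=None):
    """Join any subset of the six dimensions into a description string,
    rendering BPM in the documented 'the bpm is N' form."""
    parts = [gender, timbre, genre, emotion, instrument]
    if bpm is not None:
        parts.append(f"the bpm is {bpm}")
    return ", ".join(p for p in parts if p)
```

For example, `build_description(gender="male", genre="jazz", instrument="piano")` yields "male, jazz, piano".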