text-to-audio
Your request will cost $0.05 per run.
For $1 you can run this model approximately 20 times.
SongGeneration (LeVo) is an open-source text-to-song model developed by Tencent AI Lab that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.
Lyrics need to be in the following format:
[structure tag]
lyrics
[structure tag]
lyrics
An example of the lyrics format is as follows:
[intro-short]
[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide
Your smile so vivid and bright
Etched in my heart, it won’t fade
All those moments once so sweet
Now I’m left with only memories
[verse]
My phone screen lights up
A message from you appears
Just a few simple words
Yet they bring me to tears
The warmth of your embrace
Now feels so far away
How I wish to turn back time
And have you by my side again
[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love
Now pierced by longing
The rhythm of music plays
But my heart is drifting
In days without you
How can I keep moving on
[outro-short]
The description can be used to specify the genre of the music, as well as the timbre, for example:
female, dark, pop, sad, piano and drums, the bpm is 125
A prompt audio can be used to guide the model toward the style of the provided reference audio.
Priority: prompt_audio > description > genre
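The priority rule above can be sketched as a small selection function. This is a hedged illustration of the documented precedence only; `select_condition` is our own name and not part of the SongGeneration API:

```python
def select_condition(prompt_audio=None, description=None, genre=None):
    """Pick the conditioning input the model would follow,
    using the documented priority: prompt_audio > description > genre."""
    if prompt_audio is not None:
        return ("prompt_audio", prompt_audio)
    if description is not None:
        return ("description", description)
    if genre is not None:
        return ("genre", genre)
    return (None, None)  # unconditioned generation
```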
The lyric field defines the lyrics and structure of the song. It consists of multiple musical sections, each starting with a structure label. The model uses these labels to guide the musical and lyrical progression of the generated song.
The following segments should not contain lyrics (they are purely instrumental): [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium].
- short indicates a segment of approximately 0–10 seconds.
- medium indicates a segment of approximately 10–20 seconds.
- We find that the [inst] label is less stable, so we recommend that you do not use it.
The following segments require lyrics: [verse], [chorus], [bridge].
Current supported segments are:
[verse]
[chorus]
[bridge]
[intro-short]
[intro-medium]
[intro-long]
[outro-short]
[outro-medium]
[outro-long]
[inst-short]
[inst-medium]
[inst-long]
[silence]
Each section is separated by an empty line.
Within lyrical segments ([verse], [chorus], [bridge]), lyrics must be written in complete sentences, with one sentence per line.
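The structure rules above can be checked before submitting a request. The sketch below encodes the supported tags and the lyrical/instrumental rules from this page; `check_lyrics` is our own helper, not part of the SongGeneration API:

```python
# Segments that must stay purely instrumental (no lyric lines after the tag).
INSTRUMENTAL = {
    "[intro-short]", "[intro-medium]", "[intro-long]",
    "[outro-short]", "[outro-medium]", "[outro-long]",
    "[inst-short]", "[inst-medium]", "[inst-long]",
    "[silence]",
}
# Segments that require lyric lines after the tag.
LYRICAL = {"[verse]", "[chorus]", "[bridge]"}

def check_lyrics(text):
    """Return a list of problems in a lyric string ([] means it looks valid).
    Sections are separated by an empty line; each section starts with a tag."""
    problems = []
    for section in text.strip().split("\n\n"):
        lines = [line.strip() for line in section.strip().splitlines()]
        tag, body = lines[0], lines[1:]
        if tag in INSTRUMENTAL:
            if body:
                problems.append(f"{tag} must not contain lyrics")
        elif tag in LYRICAL:
            if not body:
                problems.append(f"{tag} requires lyrics")
        else:
            problems.append(f"unknown tag: {tag}")
    return problems
```

For example, `check_lyrics("[intro-short]\n\n[verse]\nStreetlights flicker in the night")` returns an empty list, while a bare `[verse]` with no lines is flagged.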
The description field allows you to control various musical attributes of the generated song. It can describe up to six musical dimensions: Gender (e.g., male, female), Timbre (e.g., dark, bright, soft), Genre (e.g., pop, jazz, rock), Emotion (e.g., sad, energetic, romantic), Instrument (e.g., piano, drums, guitar), BPM (e.g., the bpm is 120).
All six dimensions are optional — you can specify any subset of them.
The order of dimensions is flexible.
Use commas (,) to separate different attributes.
Although the model supports open vocabulary, we recommend using predefined tags for more stable and reliable performance. A list of commonly supported tags for each dimension is available in sample descriptions.
Here are a few valid description inputs:
- female, dark, pop, sad, piano and drums, the bpm is 125.
- male, piano, jazz.
- male, dark, the bpm is 110.
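Since all six dimensions are optional and comma-separated, a description string can be assembled from whichever attributes you want to set. A minimal sketch (`build_description` is our own helper, not part of the SongGeneration API):

```python
def build_description(gender=None, timbre=None, genre=None,
                      emotion=None, instrument=None, bpm=None):
    """Join any subset of the six dimensions into a description string,
    rendering BPM in the documented 'the bpm is N' form."""
    parts = [gender, timbre, genre, emotion, instrument]
    if bpm is not None:
        parts.append(f"the bpm is {bpm}")
    return ", ".join(p for p in parts if p)
```

For example, `build_description(gender="male", genre="jazz", instrument="piano")` yields "male, jazz, piano".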