Song Generation
Playground
Try it on WavespeedAI!
SongGeneration (LeVo) is an open-source text-to-song model that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.
Features
SongGeneration
SongGeneration (LeVo) is an open-source text-to-song model developed by Tencent AI Lab that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.
Usage
Provide lyrics, and optionally an audio or text prompt, to generate a custom song.
Lyrics format
Lyrics need to be in the following format:
[structure tag]
lyrics

[structure tag]
lyrics
- One paragraph represents one segment, starting with a structure tag and ending with a blank line
- One line represents one sentence; punctuation is not recommended inside the sentence
- The following segments should not contain lyrics: [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium]
- The following segments require lyrics: [verse], [chorus], [bridge]
An example of lyrics is as follows:
[intro-short]

[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide
Your smile so vivid and bright
Etched in my heart, it won’t fade
All those moments once so sweet
Now I’m left with only memories

[verse]
My phone screen lights up
A message from you appears
Just a few simple words
Yet they bring me to tears
The warmth of your embrace
Now feels so far away
How I wish to turn back time
And have you by my side again

[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love
Now pierced by longing
The rhythm of music plays
But my heart is drifting
In days without you
How can I keep moving on

[outro-short]
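The paragraph layout above can also be assembled programmatically. The following is a minimal sketch, not part of the WavespeedAI API: `build_lyric` is a hypothetical helper name, and it assumes segments are passed as `(tag, lines)` pairs with an empty list for instrumental segments.

```python
def build_lyric(segments):
    """Join (tag, lines) pairs into the lyric format described above.

    Hypothetical helper for illustration; it only formats the text and
    does not check whether a tag is allowed to carry lyrics.
    """
    paragraphs = []
    for tag, lines in segments:
        # A paragraph is the structure tag followed by one sentence per line.
        paragraphs.append("\n".join([f"[{tag}]", *lines]))
    # Paragraphs are separated by a blank line.
    return "\n\n".join(paragraphs)


lyric = build_lyric([
    ("intro-short", []),
    ("verse", ["Streetlights flicker in the night",
               "I wander through familiar corners"]),
    ("outro-short", []),
])
print(lyric)
```

The resulting string can be used as the value of the lyric field.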
Description
The description can be used to specify the genre of the music as well as the timbre. For example:
female, dark, pop, sad, piano and drums, the bpm is 125
Prompt Audio
Prompt audio can be used to guide the model toward the genre of the supplied audio clip.
Priority
When more than one of these inputs is provided, the priority is: prompt_audio > description > genre
Input Guide
🎵 Lyrics Input Format
The lyric field defines the lyrics and structure of the song. It consists of multiple musical sections, each starting with a structure label. The model uses these labels to guide the musical and lyrical progression of the generated song.
📌 Structure Labels
- The following segments should not contain lyrics (they are purely instrumental): [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium]
  - short indicates a segment of approximately 0–10 seconds
  - medium indicates a segment of approximately 10–20 seconds
  - We find that [inst] label is less stable, so we recommend that you do not use it.
- The following segments require lyrics: [verse], [chorus], [bridge]
Current supported segments are:
[verse]
[chorus]
[bridge]
[intro-short]
[intro-medium]
[intro-long]
[outro-short]
[outro-medium]
[outro-long]
[inst-short]
[inst-medium]
[inst-long]
[silence]
🧾 Lyrics Formatting Rules
- Each section is separated by an empty line
- Within lyrical segments ([verse], [chorus], [bridge]), lyrics must be written in complete sentences, and each sentence is one line.
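These rules are easy to check before submitting a task. The sketch below is one possible client-side check, not an official validator: `check_lyric` is a hypothetical name, the tag sets are copied from the supported-segment list above, and treating the `-long` variants and `[silence]` as lyric-free is an assumption rather than a documented rule.

```python
import re

# Tags from the supported-segment list. Treating "-long" variants and
# [silence] as lyric-free is an assumption, not a documented rule.
VOCAL = {"verse", "chorus", "bridge"}
NO_LYRICS = {f"{kind}-{size}"
             for kind in ("intro", "inst", "outro")
             for size in ("short", "medium", "long")} | {"silence"}
TAG_LINE = re.compile(r"^\[([a-z-]+)\]$")


def check_lyric(lyric):
    """Return a list of problems found; an empty list means none."""
    problems = []
    # Sections are separated by one or more empty lines.
    sections = re.split(r"\n\s*\n", lyric.strip())
    for index, block in enumerate(sections, start=1):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        match = TAG_LINE.match(lines[0]) if lines else None
        if match is None:
            problems.append(f"section {index}: missing structure label")
            continue
        tag, body = match.group(1), lines[1:]
        if tag in VOCAL and not body:
            problems.append(f"section {index}: [{tag}] requires lyrics")
        elif tag in NO_LYRICS and body:
            problems.append(f"section {index}: [{tag}] must not contain lyrics")
        elif tag not in VOCAL | NO_LYRICS:
            problems.append(f"section {index}: unknown label [{tag}]")
    return problems
```

An empty result only means the structure looks consistent with the rules listed here; the service may apply further checks of its own.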
📝 Description Input Format
The description field allows you to control various musical attributes of the generated song. It can describe up to six musical dimensions: Gender (e.g., male, female), Timbre (e.g., dark, bright, soft), Genre (e.g., pop, jazz, rock), Emotion (e.g., sad, energetic, romantic), Instrument (e.g., piano, drums, guitar), BPM (e.g., the bpm is 120).
- All six dimensions are optional; you can specify any subset of them.
- The order of dimensions is flexible.
- Use commas (,) to separate different attributes.
- Although the model supports open vocabulary, we recommend using predefined tags for more stable and reliable performance. A list of commonly supported tags for each dimension is available in sample descriptions.
- Here are a few valid description inputs:
  - female, dark, pop, sad, piano and drums, the bpm is 125.
  - male, piano, jazz.
  - male, dark, the bpm is 110.
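As an illustration of the comma-separated format, a description string can be built from whichever dimensions are known. This is a small sketch under stated assumptions: `build_description` and its keyword names are hypothetical, and the output order simply follows the order in which the dimensions are listed above.

```python
def build_description(gender=None, timbre=None, genre=None,
                      emotion=None, instrument=None, bpm=None):
    """Combine any subset of the six dimensions into one description string.

    Hypothetical helper; the keyword names are not API parameters.
    """
    # Keep only the dimensions that were actually provided.
    parts = [value for value in (gender, timbre, genre, emotion, instrument)
             if value]
    if bpm is not None:
        # BPM uses the phrasing shown in the examples above.
        parts.append(f"the bpm is {int(bpm)}")
    return ", ".join(parts)


print(build_description(gender="female", timbre="dark", genre="pop",
                        emotion="sad", instrument="piano and drums", bpm=125))
```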
🎧 Prompt Audio Usage Notes
- The input audio file can be longer than 10 seconds, but only the first 10 seconds will be used.
- For best musicality and structure, it is recommended to use the chorus section of a song as the prompt audio.
- You can use this field to influence genre, instrumentation, rhythm, and voice.
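Because only the first 10 seconds are used, it can be convenient to cut the clip yourself so that the part you care about (for example, the chorus) is what the model hears. The sketch below uses Python's standard `wave` module and assumes the source is an uncompressed WAV file on disk; `trim_wav` is a hypothetical helper, and hosting the trimmed file at a URL for the prompt_audio field is left to you.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=10):
    """Copy at most the first max_seconds of a WAV file to dst_path.

    Returns the duration of the written clip in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # readframes returns fewer frames if the file is shorter.
        frames = src.readframes(int(params.framerate * max_seconds))
    with wave.open(dst_path, "wb") as dst:
        # The frame count in the header is corrected when frames are written.
        dst.setparams(params)
        dst.writeframes(frames)
    frame_size = params.sampwidth * params.nchannels
    return len(frames) / frame_size / params.framerate
```

Trimming locally is optional; the service ignores anything past the first 10 seconds either way.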
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/song-generation" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"lyric": "",
"description": "",
"prompt_audio": "",
"genre": "Auto",
"guidance_scale": 1.5,
"temperature": 0.9,
"top_k": 50,
"seed": -1
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
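The same two calls can be made from Python. The following is a minimal sketch using only the standard library, with the endpoints and headers taken from the curl commands above. The helper names, the 2-second polling interval, and the 10-minute timeout are assumptions chosen for illustration, not values recommended by the service.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build the POST request that submits a song-generation task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/song-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build the GET request that queries a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate_song(api_key, payload, poll_seconds=2.0, timeout_seconds=600.0):
    """Submit a task, then poll until it completes, fails, or times out."""
    task = send(build_submit_request(api_key, payload))["data"]
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        data = send(build_result_request(api_key, task["id"]))["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "generation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("song generation did not finish in time")


# Example usage (performs real network calls):
#   import os
#   urls = generate_song(os.environ["WAVESPEED_API_KEY"],
#                        {"lyric": "[verse]\nStreetlights flicker in the night"})
```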
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
lyric | string | Yes | - | - | Song lyrics. Each paragraph represents a segment, starting with a structure tag and ending with a blank line. Each line is one sentence without punctuation. Segments [intro], [inst], and [outro] should not contain lyrics, while [verse], [chorus], and [bridge] require lyrics. |
description | string | No | - | - | Song Description (Optional). Describe the gender, timbre, genre, emotion, instrument and bpm of the song. Only English is supported currently. |
prompt_audio | string | No | - | - | Prompt Audio (Optional). Provide a URL to an audio file that serves as a genre prompt for the generated song. |
genre | string | No | Auto | - | Genre Select (Optional). Choose a genre for the song. |
guidance_scale | number | No | 1.5 | 0.1 ~ 3.0 | The guidance scale to use for the generation. |
temperature | number | No | 0.9 | 0.1 ~ 2.0 | The temperature to use for the generation. A higher value means more randomness in the output. |
top_k | integer | No | 50 | 1 ~ 100 | The top-k value to use for the generation. This controls how many of the highest-probability vocabulary tokens are kept for top-k filtering. |
seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
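The defaults and ranges in this table can be enforced on the client before a request is sent. This is a rough sketch: `make_payload` is a hypothetical helper, and the API may validate differently (for example, it may clamp values rather than reject them).

```python
# Defaults and ranges copied from the request parameter table above.
DEFAULTS = {"genre": "Auto", "guidance_scale": 1.5, "temperature": 0.9,
            "top_k": 50, "seed": -1}
RANGES = {"guidance_scale": (0.1, 3.0), "temperature": (0.1, 2.0),
          "top_k": (1, 100), "seed": (-1, 2147483647)}
OPTIONAL_TEXT = {"description", "prompt_audio"}


def make_payload(lyric, **options):
    """Build a request body, filling defaults and checking numeric ranges."""
    if not lyric.strip():
        raise ValueError("lyric is required")
    unknown = set(options) - set(DEFAULTS) - OPTIONAL_TEXT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Explicit options override the documented defaults.
    payload = {"lyric": lyric, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return payload
```

For example, `make_payload("[verse]\nStreetlights flicker in the night", temperature=1.2)` returns a body with the other parameters left at their defaults, while `top_k=0` raises a `ValueError`.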
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (the task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
Result Query Parameters
Result Request Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
id | string | Yes | - | Task ID |
Result Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data | object | The prediction data object containing all details |
data.id | string | Unique identifier of the prediction being retrieved |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
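To show how these fields fit together, the sketch below pairs each output URL with its NSFW flag and converts the inference time to seconds. `summarize_result` is a hypothetical helper, the sample response contains made-up values, and two assumptions are made: the flags line up index by index with the outputs, and a missing flag list means nothing was flagged.

```python
def summarize_result(response):
    """Extract the usable parts of a result response (illustrative only)."""
    data = response["data"]
    if data["status"] != "completed":
        # outputs is empty until the task has completed.
        return {"status": data["status"], "outputs": [], "seconds": None}
    outputs = data["outputs"]
    # Assumption: one flag per output, in the same order.
    flags = data.get("has_nsfw_contents") or [False] * len(outputs)
    unflagged = [url for url, flagged in zip(outputs, flags) if not flagged]
    millis = (data.get("timings") or {}).get("inference")
    seconds = millis / 1000 if millis is not None else None
    return {"status": "completed", "outputs": unflagged, "seconds": seconds}


# Made-up sample that follows the field names in the table above.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "status": "completed",
        "outputs": ["https://example.com/song.mp3"],
        "has_nsfw_contents": [False],
        "timings": {"inference": 42000},
    },
}
print(summarize_result(sample))
```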