
Song Generation


SongGeneration (LeVo) is an open-source text-to-song model that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.

Features

SongGeneration

SongGeneration (LeVo) is an open-source text-to-song model developed by Tencent AI Lab that generates high-quality songs with lyrics. It aligns with cutting-edge commercial music generation models like Suno 4.5. Provide lyrics, and optionally an audio or text prompt, to generate a custom song.

Usage

Provide lyrics, and optionally an audio or text prompt, to generate a custom song.

Lyrics format

Lyrics need to be in the following format:

[structure tag]
lyrics
[structure tag]
lyrics
  1. One paragraph represents one segment, starting with a structure tag and ending with a blank line.
  2. One line represents one sentence; punctuation is not recommended inside a sentence.
  3. The following segments should not contain lyrics: [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium]
  4. The following segments require lyrics: [verse], [chorus], [bridge]
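
The rules above can be checked locally before submitting a request. The following is a minimal sketch, not part of the WaveSpeedAI API: `validate_lyrics` and the two tag sets are hypothetical names introduced here, and the check covers only the tags named in the rules above.

```python
import re

# Tags copied from the rules above; extend these sets if you use other tags.
INSTRUMENTAL_TAGS = {
    "[intro-short]", "[intro-medium]", "[inst-short]",
    "[inst-medium]", "[outro-short]", "[outro-medium]",
}
VOCAL_TAGS = {"[verse]", "[chorus]", "[bridge]"}


def validate_lyrics(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the text passed."""
    problems = []
    # Segments are paragraphs separated by one or more blank lines.
    segments = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]
    for index, segment in enumerate(segments, start=1):
        lines = [line.strip() for line in segment.splitlines() if line.strip()]
        tag, body = lines[0], lines[1:]
        if tag in INSTRUMENTAL_TAGS:
            if body:
                problems.append(f"segment {index}: {tag} must not contain lyrics")
        elif tag in VOCAL_TAGS:
            if not body:
                problems.append(f"segment {index}: {tag} requires lyrics")
        else:
            problems.append(f"segment {index}: unknown structure tag {tag!r}")
    return problems
```

An empty return value only means the text follows the layout rules; it does not guarantee that the service accepts the request.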

An example of correctly formatted lyrics:

[intro-short]

[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide
Your smile so vivid and bright
Etched in my heart, it won’t fade
All those moments once so sweet
Now I’m left with only memories

[verse]
My phone screen lights up
A message from you appears
Just a few simple words
Yet they bring me to tears
The warmth of your embrace
Now feels so far away
How I wish to turn back time
And have you by my side again

[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love
Now pierced by longing
The rhythm of music plays
But my heart is drifting
In days without you
How can I keep moving on

[outro-short]

Description

The description can be used to specify the genre of the music as well as the timbre. For example:

female, dark, pop, sad, piano and drums, the bpm is 125

Prompt Audio

Prompt audio is a reference clip that guides the model toward the genre heard in that audio.

Priority

Priority: prompt_audio > description > genre. When more than one of these inputs is provided, the higher-priority input takes precedence.
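
As an illustration of this ordering, the sketch below picks which input would drive the style when several are supplied. It assumes the priority means a higher-priority input overrides the lower ones; the exact server-side behavior is not documented here, and `effective_style_source` is a hypothetical helper, not an API function.

```python
from typing import Optional


def effective_style_source(
    prompt_audio: Optional[str] = None,
    description: Optional[str] = None,
    genre: Optional[str] = None,
) -> str:
    """Name the input expected to win under prompt_audio > description > genre."""
    if prompt_audio:
        return "prompt_audio"
    if description:
        return "description"
    if genre:
        return "genre"
    return "none"
```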

Input Guide

🎵 Lyrics Input Format

The lyric field defines the lyrics and structure of the song. It consists of multiple musical sections, each starting with a structure label. The model uses these labels to guide the musical and lyrical progression of the generated song.

📌 Structure Labels

  • The following segments should not contain lyrics (they are purely instrumental):

    • [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium]
    • short indicates a segment of approximately 0–10 seconds
    • medium indicates a segment of approximately 10–20 seconds
    • We find that the [inst] label is less stable, so we recommend not using it.
  • The following segments require lyrics:

    • [verse], [chorus], [bridge]

Current supported segments are:

[verse]
[chorus]
[bridge]
[intro-short]
[intro-medium]
[intro-long]
[outro-short]
[outro-medium]
[outro-long]
[inst-short]
[inst-medium]
[inst-long]
[silence]

🧾 Lyrics Formatting Rules

  • Each section is separated by an empty line

  • Within lyrical segments ([verse], [chorus], [bridge]), lyrics must be written in complete sentences, and each sentence is one line.
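
One way to avoid formatting mistakes is to assemble the lyric string from structured data. The sketch below is illustrative only: `build_lyrics` is a hypothetical helper that places one structure label per section, one sentence per line, and an empty line between sections.

```python
def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (label, sentences) pairs into the blank-line-separated lyric format."""
    blocks = []
    for tag, lines in sections:
        # Drop empty sentences and surrounding whitespace.
        cleaned = [line.strip() for line in lines if line.strip()]
        blocks.append("\n".join([tag, *cleaned]))
    return "\n\n".join(blocks) + "\n"


lyric = build_lyrics([
    ("[intro-short]", []),
    ("[verse]", ["Streetlights flicker in the night", "I wander through familiar corners"]),
    ("[outro-short]", []),
])
```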

📝 Description Input Format

The description field allows you to control various musical attributes of the generated song. It can describe up to six musical dimensions: Gender (e.g., male, female), Timbre (e.g., dark, bright, soft), Genre (e.g., pop, jazz, rock), Emotion (e.g., sad, energetic, romantic), Instrument (e.g., piano, drums, guitar), BPM (e.g., the bpm is 120).

  • All six dimensions are optional — you can specify any subset of them.

  • The order of dimensions is flexible.

  • Use commas (,) to separate different attributes.

  • Although the model supports open vocabulary, we recommend using predefined tags for more stable and reliable performance. A list of commonly supported tags for each dimension is available in sample descriptions.

  • Here are a few valid description inputs:

    - female, dark, pop, sad, piano and drums, the bpm is 125.
    - male, piano, jazz.
    - male, dark, the bpm is 110.
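
The description string can also be assembled from the six dimensions programmatically. This is a minimal sketch; `build_description` and its parameter names are hypothetical, chosen to mirror the dimensions listed above. Omitted dimensions are skipped, and attributes are joined with commas.

```python
from typing import Optional


def build_description(
    gender: Optional[str] = None,
    timbre: Optional[str] = None,
    genre: Optional[str] = None,
    emotion: Optional[str] = None,
    instrument: Optional[str] = None,
    bpm: Optional[int] = None,
) -> str:
    """Join whichever dimensions were supplied into a comma-separated description."""
    parts = [p for p in (gender, timbre, genre, emotion, instrument) if p]
    if bpm is not None:
        parts.append(f"the bpm is {bpm}")
    return ", ".join(parts)
```

For example, `build_description("female", "dark", "pop", "sad", "piano and drums", 125)` yields the first sample description above, without the trailing period.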

🎧 Prompt Audio Usage Notes

  • The input audio file can be longer than 10 seconds, but only the first 10 seconds will be used.
  • For best musicality and structure, it is recommended to use the chorus section of a song as the prompt audio.
  • You can use this field to influence genre, instrumentation, rhythm, and voice.
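
Because only the first 10 seconds are used, a long reference clip can be trimmed before it is hosted. The sketch below uses Python's standard `wave` module and therefore assumes an uncompressed WAV file; the accepted audio formats are not stated here, and `trim_wav` is a hypothetical helper. The trimmed file still has to be made available at a URL for the prompt_audio field.

```python
import wave


def trim_wav(src_path: str, dst_path: str, seconds: float = 10.0) -> int:
    """Copy at most the first `seconds` of a WAV file; return the frame count kept."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_to_keep = min(src.getnframes(), int(src.getframerate() * seconds))
        frames = src.readframes(frames_to_keep)
    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width, and rate; the frame count in the
        # header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return frames_to_keep
```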

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/song-generation" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "lyric": "",
    "description": "",
    "prompt_audio": "",
    "genre": "Auto",
    "guidance_scale": 1.5,
    "temperature": 0.9,
    "top_k": 50,
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
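
The same submit-then-poll flow can be sketched in Python using only the standard library. The URLs and headers mirror the curl commands above. Everything else is an assumption for illustration: the helper names, the polling interval and attempt limit, and the use of `data.id` from the submission response as the request ID.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a song-generation task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/song-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that queries a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(api_key, request_id, fetch=send, interval=2.0, max_attempts=150):
    """Poll until the task status is completed or failed (limits are illustrative)."""
    for _ in range(max_attempts):
        data = fetch(build_result_request(api_key, request_id))["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError(f"task {request_id} did not finish in time")
```

A typical call sequence would be `task = send(build_submit_request(key, payload))` followed by `wait_for_result(key, task["data"]["id"])`. The `fetch` parameter exists so the polling loop can be exercised without network access.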

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| lyric | string | Yes | - | - | Each paragraph represents a segment starting with a structure tag and ending with a blank line, each line is a sentence without punctuation, segments [intro], [inst], [outro] should not contain lyrics, while [verse], [chorus], and [bridge] require lyrics. |
| description | string | No | - | - | Song Description (Optional). Describe the gender, timbre, genre, emotion, instrument and bpm of the song. Only English is supported currently. |
| prompt_audio | string | No | - | - | Prompt Audio (Optional). Provide a URL to an audio file that serves as a prompt for the genre of the song generation. |
| genre | string | No | Auto | - | Genre Select (Optional). Choose a genre for the song. |
| guidance_scale | number | No | 1.5 | 0.1 ~ 3.0 | The guidance scale to use for the generation. |
| temperature | number | No | 0.9 | 0.1 ~ 2.0 | The temperature to use for the generation. A higher value means more randomness in the output. |
| top_k | integer | No | 50 | 1 ~ 100 | The top-k value to use for the generation. This controls the number of highest probability vocabulary tokens to keep for top-k-filtering. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
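
The defaults and ranges in this table can be enforced on the client before a request is sent. The following is a minimal sketch under that assumption; `build_payload`, `DEFAULTS`, and `RANGES` are hypothetical names, and the service may apply additional validation of its own.

```python
# Defaults and inclusive ranges copied from the request parameter table.
DEFAULTS = {
    "genre": "Auto",
    "guidance_scale": 1.5,
    "temperature": 0.9,
    "top_k": 50,
    "seed": -1,
}
RANGES = {
    "guidance_scale": (0.1, 3.0),
    "temperature": (0.1, 2.0),
    "top_k": (1, 100),
    "seed": (-1, 2147483647),
}


def build_payload(lyric: str, **options) -> dict:
    """Merge options over the defaults and reject out-of-range values."""
    if not lyric.strip():
        raise ValueError("lyric is required")
    payload = {"lyric": lyric, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        value = payload[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} ~ {high}")
    return payload
```

Optional fields such as description and prompt_audio are passed through unchanged when supplied as keyword arguments.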

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction, Task Id |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Query Parameters

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction, the ID of the prediction to get |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
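
A result response can be reduced to its output URLs by branching on data.status. The sketch below relies only on the fields listed in the table; `extract_outputs` is a hypothetical helper, and raising an exception on failure is a design choice for illustration rather than documented client behavior.

```python
def extract_outputs(response: dict) -> list[str]:
    """Return output URLs for a completed task, or an empty list if still running."""
    data = response["data"]
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "generation failed")
    if status != "completed":
        # created or processing: outputs are not available yet.
        return []
    return list(data.get("outputs", []))
```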
© 2025 WaveSpeedAI. All rights reserved.