Song Generation | Realistic Voice & TTS API

SongGeneration

SongGeneration (LeVo) is an open-source text-to-song model developed by Tencent AI Lab that generates high-quality songs with lyrics. It is positioned as comparable to cutting-edge commercial music generation models such as Suno 4.5.

Usage

Provide lyrics (the `lyric` field) and, optionally, an audio prompt (`prompt_audio`) or a text prompt (`description`) to generate a custom song.

Lyrics format

Lyrics need to be in the following format:

[structure tag]
lyrics
[structure tag]
lyrics

Each paragraph represents one segment: it starts with a structure tag and ends with a blank line.
Each line represents one sentence; punctuation inside a sentence is not recommended.
The following segments should not contain lyrics: [intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium].
The following segments require lyrics: [verse], [chorus], [bridge].

An example of lyrics in this format:

[intro-short]

[verse]
Streetlights flicker in the night
I wander through familiar corners
Memories rush in like a tide
Your smile so vivid and bright
Etched in my heart, it won’t fade
All those moments once so sweet
Now I’m left with only memories

[verse]
My phone screen lights up
A message from you appears
Just a few simple words
Yet they bring me to tears
The warmth of your embrace
Now feels so far away
How I wish to turn back time
And have you by my side again

[chorus]
The warmth of memories still remains
But you are gone
My heart was filled with love
Now pierced by longing
The rhythm of music plays
But my heart is drifting
In days without you
How can I keep moving on

[outro-short]
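If you assemble lyrics programmatically, a small helper can produce this layout. The sketch below uses a hypothetical helper, `format_lyrics`, which is not part of any WaveSpeedAI SDK. It assumes each section is given as a tag name plus a list of lines and joins sections with a blank line, following the rules above.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into the tag-per-paragraph lyrics layout.

    Hypothetical helper. `sections` is a list of (tag, lines) tuples, e.g.
    ("verse", ["line 1", "line 2"]). Instrumental tags such as "intro-short"
    take an empty list of lines.
    """
    paragraphs = []
    for tag, lines in sections:
        # Each paragraph starts with its structure tag, then one sentence per line.
        paragraphs.append("\n".join([f"[{tag}]", *lines]))
    # A blank line separates consecutive paragraphs.
    return "\n\n".join(paragraphs)


lyrics = format_lyrics([
    ("intro-short", []),
    ("verse", ["Streetlights flicker in the night", "I wander through familiar corners"]),
    ("chorus", ["The warmth of memories still remains", "But you are gone"]),
    ("outro-short", []),
])
print(lyrics)
```

The resulting string can be sent as the `lyric` field.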

Description

The description can specify the genre and timbre of the music, for example:

female, dark, pop, sad, piano and drums, the bpm is 125

Prompt Audio

Prompt audio can guide the model toward the genre of a reference music clip.

Priority

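Priority: prompt_audio > description > genre. When more than one of these is set, prompt_audio takes precedence over description, and description takes precedence over genre.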
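The priority rule can be expressed as a lookup over an ordered list of field names. The sketch below is a hypothetical client-side helper, `dominant_style_input`, that only reports which style field outranks the others in a request body. It assumes a field counts as set when it is present and non-empty (so `"genre": "Auto"` counts), and it does not model how the service combines the lower-priority fields.

```python
# Highest to lowest priority, as listed in the Priority rule.
STYLE_PRIORITY = ("prompt_audio", "description", "genre")


def dominant_style_input(payload):
    """Return the highest-priority style field set in `payload`, or None.

    Hypothetical helper. Assumption: a field counts as set when it is
    present and non-empty.
    """
    for field in STYLE_PRIORITY:
        if payload.get(field):
            return field
    return None


print(dominant_style_input({"genre": "Auto", "description": "male, piano, jazz"}))
# → description
```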

Input Guide

🎵 Lyrics Input Format

The lyric field defines the lyrics and structure of the song. It consists of multiple musical sections, each starting with a structure label. The model uses these labels to guide the musical and lyrical progression of the generated song.

📌 Structure Labels

The following segments should not contain lyrics (they are purely instrumental):
[intro-short], [intro-medium], [inst-short], [inst-medium], [outro-short], [outro-medium]

short indicates a segment of approximately 0–10 seconds

medium indicates a segment of approximately 10–20 seconds

We find that the [inst] label is less stable, so we recommend not using it.

The following segments require lyrics:
[verse], [chorus], [bridge]

Currently supported segments:

[verse]
[chorus]
[bridge]
[intro-short]
[intro-medium]
[intro-long]
[outro-short]
[outro-medium]
[outro-long]
[inst-short]
[inst-medium]
[inst-long]
[silence]

🧾 Lyrics Formatting Rules

Separate sections with an empty line.
Within lyrical segments ([verse], [chorus], [bridge]), write lyrics as complete sentences, one sentence per line.
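Before submitting, a quick client-side check can catch common formatting mistakes. The sketch below is a hypothetical pre-flight helper, `check_lyrics`; it is not the service's own validation. It applies the rules above: sections are separated by blank lines, each starts with a tag from the supported list, [verse]/[chorus]/[bridge] need lyrics, and the other tags take none. Treating [silence] as lyric-free is an assumption, since the text only lists it as supported.

```python
import re

# Tags that require lyrics, per the rules above.
LYRICAL = {"verse", "chorus", "bridge"}
# Supported tags that take no lyrics. Treating [silence] as lyric-free is an
# assumption; the text only lists it as supported.
INSTRUMENTAL = {
    "intro-short", "intro-medium", "intro-long",
    "outro-short", "outro-medium", "outro-long",
    "inst-short", "inst-medium", "inst-long",
    "silence",
}
TAG_RE = re.compile(r"^\[([a-z-]+)\]$")


def check_lyrics(text):
    """Return a list of problems found in `text`; an empty list means none were found."""
    problems = []
    # Sections are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    for i, para in enumerate(paragraphs, start=1):
        first, *lines = [ln.strip() for ln in para.splitlines()]
        match = TAG_RE.match(first)
        if not match:
            problems.append(f"section {i}: must start with a [tag], got {first!r}")
            continue
        tag = match.group(1)
        # A tag line inside a section usually means a missing blank line.
        for ln in lines:
            if TAG_RE.match(ln):
                problems.append(f"section {i}: missing blank line before {ln}")
        if tag in LYRICAL and not lines:
            problems.append(f"section {i}: [{tag}] requires lyrics")
        elif tag in INSTRUMENTAL and lines:
            problems.append(f"section {i}: [{tag}] should not contain lyrics")
        elif tag not in LYRICAL and tag not in INSTRUMENTAL:
            problems.append(f"section {i}: unsupported tag [{tag}]")
    return problems
```

An empty result does not guarantee a good generation; it only means the text follows the layout described here.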

📝 Description Input Format

The description field allows you to control various musical attributes of the generated song. It can describe up to six musical dimensions:

- Gender (e.g., male, female)
- Timbre (e.g., dark, bright, soft)
- Genre (e.g., pop, jazz, rock)
- Emotion (e.g., sad, energetic, romantic)
- Instrument (e.g., piano, drums, guitar)
- BPM (e.g., the bpm is 120)

All six dimensions are optional; you can specify any subset of them.
The order of dimensions is flexible.
Use commas (,) to separate different attributes.
Although the model supports open vocabulary, we recommend using predefined tags for more stable and reliable performance. A list of commonly supported tags for each dimension is available in the sample descriptions.
Here are a few valid description inputs:

- female, dark, pop, sad, piano and drums, the bpm is 125.
- male, piano, jazz.
- male, dark, the bpm is 110.
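When the attributes come from a form or a config file, they can be joined into this comma-separated style. The sketch below is a hypothetical helper, `build_description`; the keyword names simply mirror the six dimensions above and are not API parameters. Unset dimensions are skipped, and BPM is phrased as "the bpm is N" as in the examples.

```python
def build_description(gender=None, timbre=None, genre=None,
                      emotion=None, instrument=None, bpm=None):
    """Join the description dimensions that are set, separated by commas.

    Hypothetical helper. Every dimension is optional; unset ones are skipped.
    """
    parts = [gender, timbre, genre, emotion, instrument]
    if bpm is not None:
        # BPM is phrased as a short clause, following the examples above.
        parts.append(f"the bpm is {bpm}")
    return ", ".join(p for p in parts if p)


print(build_description(gender="female", timbre="dark", genre="pop",
                        emotion="sad", instrument="piano and drums", bpm=125))
# → female, dark, pop, sad, piano and drums, the bpm is 125
```

Because the order of dimensions is flexible, the fixed order used here is only a convention.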

🎧 Prompt Audio Usage Notes

The input audio file can be longer than 10 seconds, but only the first 10 seconds will be used.
For best musicality and structure, it is recommended to use the chorus section of a song as the prompt audio.
You can use this field to influence genre, instrumentation, rhythm, and voice.
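Because only the first 10 seconds are used, you can cut the section you want (for example, the chorus) to the start of the prompt file yourself. The sketch below is a hypothetical helper, `cut_prompt_clip`, built on Python's standard-library `wave` module, so it handles uncompressed WAV only. How `prompt_audio` is supplied to the API (URL or upload) is not covered here.

```python
import io
import wave


def cut_prompt_clip(wav_bytes, start_seconds=0.0, duration_seconds=10.0):
    """Return a new WAV holding `duration_seconds` of audio from `start_seconds`.

    Hypothetical helper; uncompressed WAV only. If fewer frames remain after
    the start point, the clip is simply shorter.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        rate = src.getframerate()
        src.setpos(int(start_seconds * rate))  # raises wave.Error if past the end
        frames = src.readframes(int(duration_seconds * rate))
        channels, sampwidth = src.getnchannels(), src.getsampwidth()

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Keep the source format; only the frame range changes.
        dst.setnchannels(channels)
        dst.setsampwidth(sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return out.getvalue()
```

For example, if a song's chorus starts 45 seconds in, `cut_prompt_clip(data, start_seconds=45.0)` returns a 10-second WAV starting at the chorus, where `data` is the song file's bytes.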

Song Generation API — Quick start

Get a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/song-generation with your input as a JSON body. The endpoint returns a prediction ID; poll the prediction endpoint until the status is "completed", then read the output URL from data.outputs[0]. Examples for Song Generation follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/song-generation" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "genre": "Auto",
    "guidance_scale": 1.5,
    "temperature": 0.9,
    "top_k": 50,
    "seed": -1
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
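The same submit-then-poll flow can be written with only the Python standard library, which helps when the SDK is not available. This is a sketch under stated assumptions: the two URLs come from the curl example, but the response field holding the prediction ID (`data.id`), the failure status value (`"failed"`), and the error field (`data.error`) are assumptions, not documented here. `generate_song` and `wait_for_output` are hypothetical names; the polling loop takes an injectable fetch function so it can be exercised offline. The 600-second default timeout leaves headroom over the roughly 92-second average generation time.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def _call(method, url, api_key, body=None):
    """Send one JSON request with Bearer auth and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(fetch_result, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_result() until the prediction completes, then return outputs[0].

    ASSUMPTION: results look like {"data": {"status": ..., "outputs": [...]}}
    and a failed run reports status "failed".
    """
    deadline = clock() + timeout
    while True:
        data = fetch_result()["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        sleep(interval)


def generate_song(payload, api_key=None):
    """Submit a prediction and block until its output URL is available."""
    api_key = api_key or os.environ["WAVESPEED_API_KEY"]
    submitted = _call("POST", f"{API_BASE}/wavespeed-ai/song-generation", api_key, payload)
    request_id = submitted["data"]["id"]  # ASSUMPTION: location of the prediction id
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    return wait_for_output(lambda: _call("GET", result_url, api_key))
```

Calling `generate_song({"genre": "Auto", "seed": -1})` with `WAVESPEED_API_KEY` set would return the output URL, provided the assumed response fields match.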

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

async function main() {
  const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

  // Top-level await is not available in CommonJS modules, so await inside an async function.
  const result = await client.run("wavespeed-ai/song-generation", {
    genre: "Auto",
    guidance_scale: 1.5,
    temperature: 0.9,
    top_k: 50,
    seed: -1,
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/song-generation",
    {
        "genre": "Auto",
        "guidance_scale": 1.5,
        "temperature": 0.9,
        "top_k": 50,
        "seed": -1,
    },
)

print(output["outputs"][0])  # → URL of the generated output
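The examples above send only sampling parameters. The sketch below builds a fuller request body that also sets `lyric` and `description`, using field names from the input list; whether `lyric` is strictly required is not stated on this page. The lyric and description values are taken from the examples earlier in this document.

```python
import json

# Lyrics in the tag-per-paragraph format, sections separated by a blank line.
lyric = "\n\n".join([
    "[intro-short]",
    "[verse]\nStreetlights flicker in the night\nI wander through familiar corners",
    "[chorus]\nThe warmth of memories still remains\nBut you are gone",
    "[outro-short]",
])

payload = {
    "lyric": lyric,
    # Text prompt; per the priority rule it takes precedence over `genre`.
    "description": "female, dark, pop, sad, piano and drums, the bpm is 125",
    "genre": "Auto",
    "guidance_scale": 1.5,
    "temperature": 0.9,
    "top_k": 50,
    "seed": -1,
}

print(json.dumps(payload, indent=2))
```

To use it, pass `payload` as the second argument to `wavespeed.run("wavespeed-ai/song-generation", payload)` in place of the inline dictionary.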

Song Generation API — Frequently asked questions

What is the Song Generation API?

Song Generation is SongGeneration (LeVo), an open-source text-to-song model from Tencent AI Lab, served as a REST API on WaveSpeedAI. It turns lyrics and optional audio or text prompts into high-quality songs. The API is ready to use, with no cold starts and affordable pricing. You can call it programmatically or try it in the WaveSpeedAI playground.

How do I call the Song Generation API?

POST your input parameters to the model's REST endpoint (https://api.wavespeed.ai/api/v3/wavespeed-ai/song-generation) with your WaveSpeedAI API key as a Bearer token in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until the status is "completed", then read the output URL from the result. The WaveSpeedAI playground can also generate a ready-to-paste code sample in Python, JavaScript, or cURL for the inputs you've set. The full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/song-generation.

How much does Song Generation cost per run?

Song Generation starts at $0.050 per run. That is the base price: the final charge can scale with the parameters you set, so a larger output may cost more than a minimal one. The exact cost for your current input is shown in the playground before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Song Generation accept?

Key inputs: `lyric`, `description`, `prompt_audio`, `genre`, `guidance_scale`, and `seed`; the code examples also set `temperature` and `top_k`. The full JSON schema (types, defaults, allowed values) is shown in the playground and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/song-generation.

How long does Song Generation take to generate?

Average end-to-end generation time on WaveSpeedAI is around 92 seconds per request, measured across recent runs. Queue time scales with global demand, and the live status is visible in the prediction record.

Can I use Song Generation outputs commercially?

Commercial usage rights depend on the model's license. SongGeneration (LeVo) is developed by Tencent AI Lab, so check the license summary on the model card, along with WaveSpeedAI's Terms of Service for platform-level conditions.
