Ace Step


ACE-Step generates music with lyrics, up to 4 minutes long, from text prompts with high acoustic fidelity. It supports voice cloning, lyric edits, and remixing. Available as a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

ACE-Step — Text-to-Audio

ACE-Step Text-to-Audio is a next-generation AI music generation model that composes complete songs — including vocals, instrumentals, and lyrics — directly from text descriptions. Produce professional-quality music up to 4 minutes long from simple style tags and optional lyrics.

Why It Stands Out

Text-to-music generation: Transform style tags into coherent music tracks with melody, rhythm, and vocals.
Style tag control: Enter multiple tags to guide genre, tempo, and energy.
Vocal and lyric creation: Generates original vocals and synchronized lyrics that fit your prompt’s tone.
Fine-grained acoustic fidelity: Maintains dynamic balance, spatial quality, and instrument clarity.
Flexible duration: Adjustable from 5 seconds to 4 minutes (240 seconds).
Reproducibility: Use the seed parameter to recreate exact results.

Parameters

Parameter	Required	Description
tags	Yes	Comma-separated genre or style tags (e.g., lofi, hiphop, drum and bass, chill)
lyrics	No	Provide custom lyrics or leave blank for auto-generated ones.
duration	No	Music length in seconds, from 5 to 240 (default: 60).
seed	No	Set for reproducibility; -1 for random.

How to Use

Enter style tags — add genres and moods like “lofi, hiphop, chill, trap.”
Add lyrics (optional) — provide custom lyrics or leave blank for AI-generated ones.
Set duration — choose length from a few seconds up to 240 seconds (4 minutes).
Set a seed (optional) for reproducible results.
Click Run and wait for your music to generate.
Preview and download the result.
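The steps above map onto an API request body fairly directly. The following is a minimal sketch, assuming the tags are sent as one comma-separated string (as the request parameter table describes); `build_payload` is a hypothetical helper name, not part of any official SDK.

```python
import json
from typing import Sequence


def build_payload(
    tags: Sequence[str],
    lyrics: str = "",
    duration: int = 60,
    seed: int = -1,
) -> dict:
    """Assemble a request body from the values entered in the steps above."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        raise ValueError("at least one style tag is required")
    return {
        "tags": ", ".join(cleaned),  # assumption: ", " is an accepted separator
        "lyrics": lyrics,            # empty string leaves the lyrics to the model
        "duration": duration,        # seconds
        "seed": seed,                # -1 asks for a random seed
    }


payload = build_payload(["lofi", "hiphop", "chill", "trap"], duration=30, seed=42)
print(json.dumps(payload, indent=2))
```

Starting with a short `duration` keeps test runs cheap while you settle on a tag combination.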

Best Use Cases

Music Production & Songwriting — Generate complete demos or backing tracks instantly.
Film, Game & Media Scoring — Create mood-specific tracks with precise control.
Advertising & Content Creation — Design catchy audio for short-form content.
Education & Experimentation — Teach structure, genre, or lyric composition.
Soundtrack Prototyping — Preview musical direction before full studio production.

Pricing

Duration	Price
30 seconds	$0.006
60 seconds	$0.012
120 seconds	$0.024
240 seconds	$0.048

Billing Rules

Billed per second at $0.0002
Maximum duration: 240 seconds (4 minutes)
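As a worked example of the per-second rate, the sketch below reproduces the pricing table from the two billing rules. `estimate_cost` is a hypothetical helper; it uses `Decimal` so that the small per-second rate is not distorted by binary floating point.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.0002")  # rate stated in the billing rules
MAX_DURATION_SECONDS = 240            # 4 minutes


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated price in USD for a track of the given length."""
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    return PRICE_PER_SECOND * duration_seconds


for seconds in (30, 60, 120, 240):
    print(f"{seconds:>3} s: ${estimate_cost(seconds)}")
```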

Pro Tips for Best Quality

Use multiple style tags to define genre, mood, and energy level.
Combine contrasting tags (e.g., “chill, trap”) for unique blends.
Provide structured lyrics with line breaks for better vocal synchronization.
Start with shorter durations to test style combinations.
Fix the seed when iterating to compare different tag or lyric variations.
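The last two tips can be combined: keep the seed and a short duration constant, and change one tag at a time. The sketch below only builds the request bodies; `seeded_variations` is a hypothetical helper, and the tag names are illustrative.

```python
from typing import Dict, List, Sequence


def seeded_variations(
    base_tags: Sequence[str],
    extra_tags: Sequence[str],
    seed: int,
    duration: int = 30,
) -> List[Dict]:
    """Build one request body per extra tag, all sharing the same seed."""
    if seed < 0:
        raise ValueError("use a fixed non-negative seed; -1 requests a random one")
    payloads = []
    for extra in extra_tags:
        tags = list(base_tags) + [extra]
        payloads.append(
            {
                "tags": ", ".join(tags),
                "lyrics": "",
                "duration": duration,  # short runs are cheaper to compare
                "seed": seed,          # identical seed isolates the tag change
            }
        )
    return payloads


variations = seeded_variations(["lofi", "chill"], ["trap", "jazz", "ambient"], seed=1234)
for body in variations:
    print(body["tags"], "| seed", body["seed"])
```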

Notes

Processing time varies based on duration and current queue load.
Please ensure your content complies with usage guidelines.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ace-step" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "tags": "lofi, hiphop, chill",
    "lyrics": "",
    "duration": 60,
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
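The same two calls can be sketched in Python with only the standard library. The URLs and headers mirror the curl commands above. The helper names, the polling interval, and the retry limit are assumptions for illustration, and the response is assumed to carry `data.id` and `data.status` as described in the response tables.

```python
import json
import os
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """POST request that submits a generation task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/ace-step",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(request_id: str, api_key: str) -> urllib.request.Request:
    """GET request that queries the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_result(
    request_id: str,
    api_key: str,
    fetch: Callable[[urllib.request.Request], dict] = send,
    interval: float = 2.0,   # assumption: seconds between polls
    max_polls: int = 150,    # assumption: give up after this many polls
) -> dict:
    """Poll until the task is completed or failed, then return its data object."""
    for _ in range(max_polls):
        data = fetch(build_result_request(request_id, api_key))["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError(f"task {request_id} did not finish after {max_polls} polls")


def main() -> None:
    """Submit one task and print the output URLs; needs WAVESPEED_API_KEY set."""
    api_key = os.environ["WAVESPEED_API_KEY"]
    payload = {"tags": "lofi, hiphop, chill", "lyrics": "", "duration": 60, "seed": -1}
    task = send(build_submit_request(payload, api_key))["data"]
    result = wait_for_result(task["id"], api_key)
    print(result["status"], result.get("outputs"))
```

Calling `main()` performs real, billed requests. The `fetch` parameter exists so the polling loop can be exercised with a stub instead of the network.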

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
tags	string	Yes	-	-	Comma-separated list of genre tags to control the style of the generated audio.
lyrics	string	No	-	-	Vocal content for the track. Use [inst] or [instrumental] for no vocals.
duration	number	No	60	5 ~ 240	Audio length in seconds.
seed	integer	No	-1	-1 ~ 2147483647	The random seed for reproducibility.
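Checking a request body against this table before sending it avoids paying a round trip for an avoidable rejection. The following is a minimal sketch that encodes the types, defaults, and ranges listed above; `validate_request` is a hypothetical helper, and the server's actual validation may differ.

```python
from typing import List


def validate_request(params: dict) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    errors = []

    tags = params.get("tags")
    if not isinstance(tags, str) or not tags.strip():
        errors.append("tags: required, non-empty comma-separated string")

    lyrics = params.get("lyrics", "")
    if not isinstance(lyrics, str):
        errors.append("lyrics: must be a string")

    duration = params.get("duration", 60)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(duration, bool)
        or not isinstance(duration, (int, float))
        or not 5 <= duration <= 240
    ):
        errors.append("duration: must be a number from 5 to 240")

    seed = params.get("seed", -1)
    if (
        isinstance(seed, bool)
        or not isinstance(seed, int)
        or not -1 <= seed <= 2147483647
    ):
        errors.append("seed: must be an integer from -1 to 2147483647")

    return errors


print(validate_request({"tags": "lofi, chill", "lyrics": "[inst]", "duration": 45}))
print(validate_request({"lyrics": "", "duration": 300, "seed": -1}))
```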

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
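A submission response mainly provides the task ID and the URL to poll. The sketch below pulls those fields out of a decoded response. `parse_submission` and `SubmittedTask` are hypothetical names, the sample values are made up for illustration, and treating any `code` other than 200 as a failure is an assumption.

```python
from typing import NamedTuple


class SubmittedTask(NamedTuple):
    task_id: str
    status: str
    poll_url: str


def parse_submission(response: dict) -> SubmittedTask:
    """Extract the fields needed to follow up on a submitted task."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission rejected: {response.get('message')}")
    data = response["data"]
    return SubmittedTask(
        task_id=data["id"],
        status=data["status"],
        poll_url=data["urls"]["get"],
    )


# Made-up response with the shape described in the table above.
sample_response = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "wavespeed-ai/ace-step",
        "outputs": [],
        "urls": {
            "get": "https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"
        },
        "status": "created",
        "error": "",
    },
}

task = parse_submission(sample_response)
print(task.task_id, task.status, task.poll_url)
```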

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
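How a client reacts to a result response depends on `data.status`. The sketch below is one way to handle the four statuses listed above: return the output URLs when the task is completed, raise on failure, and return an empty list while the task is still pending. `extract_outputs` is a hypothetical helper, and the sample responses are made up.

```python
from typing import List


def extract_outputs(response: dict) -> List[str]:
    """Return output URLs for a completed task, or [] if it is still running."""
    data = response.get("data") or {}
    status = data.get("status")
    if status == "completed":
        return list(data.get("outputs") or [])
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed without an error message")
    if status in ("created", "processing"):
        return []  # not ready yet; poll again later
    raise ValueError(f"unexpected status: {status!r}")


# Made-up responses for illustration only.
pending = {"code": 200, "data": {"status": "processing", "outputs": []}}
finished = {
    "code": 200,
    "data": {"status": "completed", "outputs": ["https://example.com/track.mp3"]},
}

print(extract_outputs(pending))
print(extract_outputs(finished))
```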
