ACE-Step Audio Inpaint

ACE-Step Audio Inpaint edits a specific audio segment to change lyrics or style while preserving the surrounding audio. The model is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

ACE-Step Audio Inpaint is a powerful music and audio editing model developed by WaveSpeedAI. It enables precise, intelligent modification of selected time ranges within an existing audio clip — perfect for fixing, remixing, or creatively reimagining tracks without re-generating the whole piece.

Key Features

Precise Segment Editing: Modify only the section you want — define start and end times to edit exactly the range you need.
Seamless Audio Blending: New content merges naturally with surrounding audio for smooth, undetectable transitions.
Flexible Timing Control: Choose whether your start or end times are relative to the beginning or end of the track.
Style & Lyric Adaptability: Add new instrumentation, effects, or lyrics while preserving the overall flow and tone.
Controlled Variation: Adjust how much the regenerated section diverges from the original using seed and creative parameters.

Parameters

Parameter	Description
audio*	Upload or link to an existing audio file (MP3 / WAV).
tags*	Define the target style or mood (e.g., lofi, hiphop, trap, chill).
start_time / end_time	Select the time range (in seconds) to edit.
start_time_relative_to / end_time_relative_to	Choose whether the range is relative to the start or end of the audio.
lyrics	(Optional) Add or replace lyrics for the edited section.
seed	Set a fixed value for reproducible results, or use -1 for a random seed.
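
The way start_time / end_time combine with their relative_to settings can be sketched as follows. This is a minimal illustration, assuming that an offset relative to "end" is counted backward from the end of the track; the exact interpretation used by the API is not stated here. The helpers `resolve_offset` and `resolve_range` are hypothetical names, not part of the API.

```python
def resolve_offset(offset, relative_to, duration):
    """Convert an offset to an absolute position in seconds.

    Assumption: "end" means the offset is counted backward from the end
    of the track. This page does not spell that out.
    """
    if relative_to == "start":
        return offset
    if relative_to == "end":
        return duration - offset
    raise ValueError(f"relative_to must be 'start' or 'end', got {relative_to!r}")


def resolve_range(start_time, start_rel, end_time, end_rel, duration):
    """Return the (start, end) of the edited segment in absolute seconds."""
    start = resolve_offset(start_time, start_rel, duration)
    end = resolve_offset(end_time, end_rel, duration)
    if not 0 <= start < end <= duration:
        raise ValueError(f"invalid segment: {start}s to {end}s in a {duration}s clip")
    return start, end


# Edit from 10 s after the start up to 5 s before the end of a 60 s clip.
print(resolve_range(10, "start", 5, "end", 60))
```

Under that assumption, mixing the two reference points lets you target "everything except the last few seconds" without knowing the exact track length in advance.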

Use Cases

Repair or refine — Fix errors or off-beat moments in specific sections.
Rewrite lyrics — Try new vocal phrasing or emotional tone.
Remix segments — Replace or restyle a part of a song without altering the rest.
Audio storytelling — Modify voiceovers or sound effects within a fixed-length clip.

Pricing

Metric	Price
Per second of source audio	$0.0002 / s

Total cost = duration of uploaded audio (in seconds) × $0.0002

Examples

30s audio → 30 × $0.0002 = $0.006
60s audio → 60 × $0.0002 = $0.012
3 min (180s) audio → 180 × $0.0002 = $0.036
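
The same arithmetic can be expressed as a small helper. This is a sketch only: `inpaint_cost` is a hypothetical name, and whether partial seconds are rounded for billing is not stated on this page, so the function simply multiplies the duration by the listed rate.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.0002")  # USD per second, from the pricing table


def inpaint_cost(source_duration_s):
    """Estimated cost in USD, based on the full source audio duration.

    Assumption: no rounding of partial seconds is applied.
    """
    if source_duration_s < 0:
        raise ValueError("duration must be non-negative")
    # Decimal avoids binary floating-point noise in small currency amounts.
    return Decimal(str(source_duration_s)) * PRICE_PER_SECOND


for seconds in (30, 60, 180):
    print(f"{seconds}s -> ${inpaint_cost(seconds)}")
```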

Notes

Pricing is based on the total duration of the source audio file, not the edited segment length.
Ensure uploaded audio URLs are publicly accessible.
Please ensure your content complies with usage guidelines.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task ("audio" and "tags" are required; the values shown are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ace-step/audio-inpaint" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/your-audio.mp3",
    "tags": "lofi, chill",
    "start_time_relative_to": "start",
    "start_time": 0,
    "end_time_relative_to": "start",
    "end_time": 30,
    "lyrics": "",
    "seed": -1
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
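
The submit-then-query flow above can also be sketched in Python using only the standard library. The endpoints and the `data.id`, `data.status`, `data.outputs`, and `data.error` fields come from this page; the helper names, the polling interval, the poll limit, and the example audio URL are assumptions made for illustration.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/wavespeed-ai/ace-step/audio-inpaint"


def call_api(method, url, api_key, payload=None):
    """Send one JSON request and return the decoded response body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(request_id, api_key, call=call_api, interval_s=2.0, max_polls=150):
    """Poll the result endpoint until the task completes or fails.

    interval_s and max_polls are arbitrary illustrative defaults.
    """
    url = f"{API_BASE}/predictions/{request_id}/result"
    for _ in range(max_polls):
        data = call("GET", url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(interval_s)
    raise TimeoutError(f"task {request_id} did not finish after {max_polls} polls")


def main():
    api_key = os.environ["WAVESPEED_API_KEY"]
    payload = {
        "audio": "https://example.com/your-audio.mp3",  # placeholder URL
        "tags": "lofi, chill",
        "start_time": 0,
        "end_time": 30,
    }
    task = call_api("POST", SUBMIT_URL, api_key, payload)
    print(wait_for_result(task["data"]["id"], api_key))
```

Calling `main()` with `WAVESPEED_API_KEY` set in the environment submits one task and prints the output URLs. The `call` parameter exists so the polling loop can be exercised with a stub instead of the network.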

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
audio	string	Yes	-	-	Audio file to edit. Provide an HTTPS URL or upload a file (MP3, WAV, FLAC up to 60 minutes).
tags	string	Yes	-	-	Comma-separated list of genre tags to control the style.
start_time_relative_to	string	No	start	start, end	Reference point for start time.
start_time	number	No	-	0 ~ 240	Start time in seconds.
end_time_relative_to	string	No	start	start, end	Reference point for end time.
end_time	number	No	30	0 ~ 240	End time in seconds.
lyrics	string	No	-	-	Lyrics to be sung in the audio. Use [inst] or [instrumental] for no vocals.
seed	integer	No	-1	-1 ~ 2147483647	The random seed for reproducibility.
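
Checking a payload against the constraints in this table before sending it can catch mistakes early. The following is a minimal client-side sketch; `validate_payload` is a hypothetical helper, and the server may apply additional rules that are not listed here.

```python
SEED_MAX = 2147483647  # upper bound for seed, from the table above
TIME_MAX = 240         # upper bound for start_time / end_time, in seconds


def _is_number(value):
    """True for int or float, excluding bool (which is a subclass of int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload):
    """Return a list of problems found in a request payload (empty if none)."""
    errors = []
    for name in ("audio", "tags"):
        value = payload.get(name)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{name} is required and must be a non-empty string")
    for name in ("start_time_relative_to", "end_time_relative_to"):
        if payload.get(name, "start") not in ("start", "end"):
            errors.append(f"{name} must be 'start' or 'end'")
    for name in ("start_time", "end_time"):
        value = payload.get(name)
        if value is not None and not (_is_number(value) and 0 <= value <= TIME_MAX):
            errors.append(f"{name} must be a number between 0 and {TIME_MAX}")
    seed = payload.get("seed", -1)
    if isinstance(seed, bool) or not isinstance(seed, int) or not -1 <= seed <= SEED_MAX:
        errors.append(f"seed must be an integer between -1 and {SEED_MAX}")
    return errors


print(validate_payload({"tags": "lofi", "end_time": 300, "seed": -2}))
```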

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
