Stable Audio 3 Audio-to-Audio API

Stability AI Stable Audio 3 Audio-to-Audio

Stability AI Stable Audio 3 Audio-to-Audio transforms an existing audio clip into a new result guided by a natural-language prompt. It is designed for remixing, restyling, sound redesign, mood transfer, and other prompt-driven audio transformation workflows.

Why Choose This?

Audio transformation workflow: Start from an existing audio clip and transform it into a new result instead of generating from scratch.
Prompt-guided editing: Use a text prompt to describe the target sound, mood, texture, or musical direction.
Controllable transformation strength: Adjust init_noise_level to control how strongly the output departs from the source audio.
Negative prompt support: Use negative_prompt to avoid unwanted sounds, textures, or stylistic elements.
Flexible output duration: Choose a target output length of up to 120 seconds.
Production-ready API: Useful for music restyling, sound design, ambient transformation, and creative audio experimentation.

Parameters

Parameter	Required	Description
audio	Yes	Source audio to transform.
prompt	Yes	Text prompt describing how to transform the audio.
duration	No	Target audio duration in seconds. Range: `1–120`. Default: `30`.
init_noise_level	No	Controls how strongly the source audio is transformed. Range: `0–1`. Default: `0.9`.
negative_prompt	No	Optional terms to avoid in the generated audio.
num_inference_steps	No	Number of inference steps. Range: `1–100`. Default: `8`.
guidance_scale	No	Prompt guidance strength. Range: `0–25`. Default: `1`.
output_format	No	Output audio format. Supported values: `mp3`, `wav`, `flac`, `ogg`, `opus`, `m4a`, `aac`. Default: `mp3`.
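
The ranges and defaults in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any WaveSpeedAI SDK: the names DEFAULTS, RANGES, FORMATS, and build_payload are hypothetical, and how the API itself treats out-of-range values (rejecting or clamping them) is not stated on this page.

```python
# Hypothetical client-side helper; values are copied from the parameter table.
DEFAULTS = {
    "duration": 30,
    "init_noise_level": 0.9,
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3",
}
RANGES = {
    "duration": (1, 120),
    "init_noise_level": (0, 1),
    "num_inference_steps": (1, 100),
    "guidance_scale": (0, 25),
}
FORMATS = {"mp3", "wav", "flac", "ogg", "opus", "m4a", "aac"}


def build_payload(audio, prompt, **options):
    """Merge options over the documented defaults and validate them."""
    if not audio or not prompt:
        raise ValueError("audio and prompt are required")
    unknown = set(options) - set(DEFAULTS) - {"negative_prompt"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"audio": audio, "prompt": prompt, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if payload["output_format"] not in FORMATS:
        raise ValueError(f"unsupported output_format: {payload['output_format']}")
    return payload
```

Calling build_payload("https://example.com/your-audio.mp3", "darker, slower", duration=60) returns a dictionary with duration set to 60 and every other optional field at its documented default; passing duration=121 raises ValueError.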

How to Use

Upload your source audio — provide the clip you want to transform.
Write your prompt — describe the target sound, style, mood, or arrangement you want.
Set duration (optional) — choose the desired output length.
Adjust transformation strength (optional) — use init_noise_level to control how far the output moves away from the source.
Add a negative prompt (optional) — specify sounds or qualities to avoid.
Tune generation settings (optional) — adjust num_inference_steps and guidance_scale if needed.
Choose output format — select the format that best fits your workflow.
Submit — run the model and download the transformed audio.

Example Prompt

Transform this into a dark cinematic ambient track with deeper low-end texture, distant metallic resonance, slower pacing, and a more atmospheric, spacious mix.

Pricing

Just $0.024 per request.

Billing Rules

Each audio-to-audio generation request costs $0.024
Pricing is fixed per request
duration, init_noise_level, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing
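
Because the price is flat, estimating a batch is a single multiplication. A small sketch of that arithmetic follows; estimate_cost is a hypothetical helper, and it assumes every submitted request is billed (how failed requests are billed is not covered here).

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.024")  # USD, from the billing rules above


def estimate_cost(num_requests):
    """Total cost in USD; request parameters do not change the price."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return PRICE_PER_REQUEST * num_requests


print(estimate_cost(250))  # cost of 250 requests at the flat rate
```

Decimal is used instead of float so that totals do not pick up binary rounding error; 250 requests come to $6.00.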

Best Use Cases

Music restyling — Rework an existing clip into a different sonic direction.
Sound design transformation — Turn a source recording into a new textured or cinematic result.
Ambient and mood transfer — Shift the emotional tone of an existing audio clip.
Creative remix prototyping — Explore alternate versions of a source sound quickly.
Post-production experimentation — Generate stylized variants before committing to manual editing.

Pro Tips

Use a prompt that clearly describes the target direction instead of repeating details already present in the source.
Lower init_noise_level when you want the result to stay closer to the original audio.
Raise init_noise_level when you want a stronger transformation.
Use negative_prompt to suppress unwanted artifacts, vocals, harshness, or specific genres.
Choose wav or flac when you plan to do further editing after generation.
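
One way to apply the init_noise_level tips is to submit the same clip at a few strengths and compare the results by ear. The sketch below only builds the request payloads; noise_sweep is a hypothetical helper, and the levels 0.3, 0.6, and 0.9 are illustrative rather than recommended values. Each payload is billed as a separate request.

```python
def noise_sweep(base_payload, levels=(0.3, 0.6, 0.9)):
    """Return one payload per init_noise_level, leaving the base untouched."""
    payloads = []
    for level in levels:
        if not 0 <= level <= 1:
            raise ValueError("init_noise_level must be between 0 and 1")
        payloads.append({**base_payload, "init_noise_level": level})
    return payloads


base = {
    "audio": "https://example.com/your-audio.mp3",
    "prompt": "Dark cinematic ambient track with a spacious mix",
    "output_format": "wav",  # lossless, for further editing
}
for payload in noise_sweep(base):
    print(payload["init_noise_level"], payload["output_format"])
```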

Notes

audio and prompt are required.
duration supports 1–120 seconds.
init_noise_level controls how strongly the model transforms the input audio.
Pricing is fixed at $0.024 per request.
This workflow is intended for transformation of existing audio, not fresh generation from text alone.

Related Models

Stability AI Stable Audio 3 Music — Generate music directly from a text prompt.
Stability AI Stable Audio 3 Text-to-Audio — Generate general audio and sound scenes from text prompts.
Stability AI Stable Audio 3 Audio-Outpainting — Extend an existing audio clip before and/or after the source.
Stability AI Stable Audio 3 Audio-Inpainting — Replace a selected region inside an existing audio clip.

Stable Audio 3 Audio To Audio API — Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/stability-ai/stable-audio-3/audio-to-audio with your input as a JSON body. The endpoint returns a prediction id; poll the prediction result endpoint until status is completed, then read the output URL from data.outputs[0]. Examples follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/stability-ai/stable-audio-3/audio-to-audio" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "audio": "https://example.com/your-audio.mp3",
    "prompt": "Dark cinematic ambient track with deeper low-end texture and distant metallic resonance",
    "duration": 30,
    "init_noise_level": 0.9,
    "negative_prompt": "vocals, harsh distortion, low quality",
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3"
  }'

# The response includes a prediction id. Substitute it for {request_id} and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so the call is wrapped in an async function.
async function main() {
  const result = await client.run("stability-ai/stable-audio-3/audio-to-audio", {
    "audio": "https://example.com/your-audio.mp3",
    "prompt": "Dark cinematic ambient track with deeper low-end texture and distant metallic resonance",
    "duration": 30,
    "init_noise_level": 0.9,
    "negative_prompt": "vocals, harsh distortion, low quality",
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3"
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "stability-ai/stable-audio-3/audio-to-audio",
    {
        "audio": "https://example.com/your-audio.mp3",
        "prompt": "Dark cinematic ambient track with deeper low-end texture and distant metallic resonance",
        "duration": 30,
        "init_noise_level": 0.9,
        "negative_prompt": "vocals, harsh distortion, low quality",
        "num_inference_steps": 8,
        "guidance_scale": 1,
        "output_format": "mp3",
    },
)

print(output["outputs"][0])  # → URL of the generated output
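
If you call the REST endpoints directly instead of using the SDK, the polling step described in the quick start could look roughly like the sketch below. It uses only the Python standard library. The result URL, the completed status, and data.outputs[0] come from this page; the helper names, the "failed" status string, the location of status under data, and the interval and timeout values are assumptions to check against the API reference.

```python
import json
import os
import time
import urllib.request

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id):
    """GET the prediction result and return the parsed JSON body."""
    request = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_output(request_id, fetch=fetch_result, interval=1.0, timeout=300.0):
    """Poll until status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id)["data"]  # assumes status lives under "data"
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not completed in time")
        time.sleep(interval)
```

The fetch argument is injectable so the loop can be exercised with canned responses, without network access or an API key.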

Stable Audio 3 Audio To Audio API — Frequently asked questions

What is the Stable Audio 3 Audio To Audio API?

Stable Audio 3 Audio To Audio is a Stability AI model, exposed as a REST API on WaveSpeedAI, that transforms a source audio clip using a text prompt. It is suited to audio style transfer, sound effect transformation, music remixing, creative audio editing, game audio, and video sound design, and it is served with no cold starts. You can call it programmatically or try it from the playground above.

How do I call the Stable Audio 3 Audio To Audio API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/stability-ai/stability-ai-stable-audio-3-audio-to-audio.

How much does Stable Audio 3 Audio To Audio cost per run?

Stable Audio 3 Audio To Audio costs $0.024 per run. Pricing is fixed per request: duration, init_noise_level, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect the charge. The cost for your current input is shown next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Stable Audio 3 Audio To Audio accept?

Key inputs: `audio`, `prompt`, `duration`, `init_noise_level`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, `output_format`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/stability-ai/stability-ai-stable-audio-3-audio-to-audio.

How do I get started with the Stable Audio 3 Audio To Audio API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Stable Audio 3 Audio To Audio outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Stability AI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
