Stability AI Stable Audio 3 Audio-to-Audio
Stable Audio 3 Audio-to-Audio is a fast AI audio transformation model that reworks a source audio clip according to a text prompt. It is available on WavespeedAI through a ready-to-use REST inference API with simple integration, no cold starts, and affordable pricing. Typical uses include audio style transfer, sound effect transformation, music remixing, creative audio editing, game audio, video sound design, and professional audio-to-audio workflows.
Features
Stability AI Stable Audio 3 Audio-to-Audio
Stability AI Stable Audio 3 Audio-to-Audio transforms an existing audio clip into a new result guided by a natural-language prompt. It is designed for remixing, restyling, sound redesign, mood transfer, and other prompt-driven audio transformation workflows.
Why Choose This?
- Audio transformation workflow: Start from an existing audio clip and transform it into a new result instead of generating from scratch.
- Prompt-guided editing: Use a text prompt to describe the target sound, mood, texture, or musical direction.
- Controllable transformation strength: Adjust `init_noise_level` to control how strongly the output departs from the source audio.
- Negative prompt support: Use `negative_prompt` to avoid unwanted sounds, textures, or stylistic elements.
- Flexible output duration: Choose the target output length, up to 120 seconds.
- Production-ready API: Useful for music restyling, sound design, ambient transformation, and creative audio experimentation.
Parameters
| Parameter | Required | Description |
|---|---|---|
| audio | Yes | Source audio to transform. |
| prompt | Yes | Text prompt describing how to transform the audio. |
| duration | No | Target audio duration in seconds. Range: 1–120. Default: 30. |
| init_noise_level | No | Controls how strongly the source audio is transformed. Range: 0–1. Default: 0.9. |
| negative_prompt | No | Optional terms to avoid in the generated audio. |
| num_inference_steps | No | Number of inference steps. Range: 1–100. Default: 8. |
| guidance_scale | No | Prompt guidance strength. Range: 0–25. Default: 1. |
| output_format | No | Output audio format. Supported values: mp3, wav, flac, ogg, opus, m4a, aac. Default: mp3. |
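The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: `build_payload` is a hypothetical helper, and passing `audio` as a URL string is an assumption, since this section does not say how the source clip is encoded in the request body.

```python
from typing import Any

# Ranges and allowed formats copied from the parameter table above.
RANGES = {
    "duration": (1, 120),
    "init_noise_level": (0, 1),
    "num_inference_steps": (1, 100),
    "guidance_scale": (0, 25),
}
OUTPUT_FORMATS = {"mp3", "wav", "flac", "ogg", "opus", "m4a", "aac"}


def build_payload(audio: str, prompt: str, **options: Any) -> dict[str, Any]:
    """Return a request body, raising ValueError for invalid values.

    Hypothetical helper: `audio` is assumed to be a URL string.
    """
    if not audio or not prompt:
        raise ValueError("audio and prompt are required")
    payload: dict[str, Any] = {"audio": audio, "prompt": prompt}
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name == "output_format":
            if value not in OUTPUT_FORMATS:
                raise ValueError(f"unsupported output_format: {value}")
        elif name != "negative_prompt":
            raise ValueError(f"unknown parameter: {name}")
        payload[name] = value
    return payload
```

Omitted options are left out of the body, so the documented defaults (for example `duration` 30 and `init_noise_level` 0.9) would apply on the server side.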
How to Use
- Upload your source audio: provide the clip you want to transform.
- Write your prompt: describe the target sound, style, mood, or arrangement you want.
- Set duration (optional): choose the desired output length.
- Adjust transformation strength (optional): use `init_noise_level` to control how far the output moves away from the source.
- Add a negative prompt (optional): specify sounds or qualities to avoid.
- Tune generation settings (optional): adjust `num_inference_steps` and `guidance_scale` if needed.
- Choose output format: select the format that best fits your workflow.
- Submit: run the model and download the transformed audio.
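For API users, the steps above could map onto a single JSON POST. The sketch below only builds the request and does not send it. Several parts are assumptions: `SUBMIT_URL` is a placeholder because the real endpoint is not given in this section, Bearer-token authentication is assumed (the Authentication Guide is the authority), and the source clip is assumed to be referenced by URL.

```python
import json
import urllib.request

# Placeholder endpoint: the real submission URL is not stated in this section.
SUBMIT_URL = "https://api.example.com/stable-audio-3/audio-to-audio"


def make_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one transformation."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; check the Authentication Guide.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = make_submit_request(
    "YOUR_API_KEY",
    {
        "audio": "https://example.com/source.wav",  # assumed: clip passed by URL
        "prompt": "dark cinematic ambient track with a spacious mix",
        "duration": 30,
        "init_noise_level": 0.9,
        "output_format": "wav",
    },
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```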
Example Prompt
Transform this into a dark cinematic ambient track with deeper low-end texture, distant metallic resonance, slower pacing, and a more atmospheric, spacious mix.
Pricing
Just $0.024 per request.
Billing Rules
- Each audio-to-audio generation request costs $0.024
- Pricing is fixed per request
- `duration`, `init_noise_level`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, and `output_format` do not affect pricing
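Because the price is flat, a batch estimate is a single multiplication. A small sketch, with `estimate_cost` as a hypothetical helper name; `Decimal` is used so that the result is not affected by binary floating-point rounding.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.024")  # USD per request, from the billing rules


def estimate_cost(num_requests: int) -> Decimal:
    """Return the total cost in USD for a number of requests."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return PRICE_PER_REQUEST * num_requests


print(estimate_cost(250))  # prints 6.000
```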
Best Use Cases
- Music restyling — Rework an existing clip into a different sonic direction.
- Sound design transformation — Turn a source recording into a new textured or cinematic result.
- Ambient and mood transfer — Shift the emotional tone of an existing audio clip.
- Creative remix prototyping — Explore alternate versions of a source sound quickly.
- Post-production experimentation — Generate stylized variants before committing to manual editing.
Pro Tips
- Use a prompt that clearly describes the target direction instead of repeating details already present in the source.
- Lower `init_noise_level` when you want the result to stay closer to the original audio.
- Raise `init_noise_level` when you want a stronger transformation.
- Use `negative_prompt` to suppress unwanted artifacts, vocals, harshness, or specific genres.
- Choose `wav` or `flac` when you plan to do further editing after generation.
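One way to find a suitable strength is to prepare the same clip and prompt at several `init_noise_level` values and compare the results. The sketch below only builds the request bodies. `strength_sweep` is a hypothetical helper, the example levels are arbitrary, and each body would be billed as a separate request.

```python
def strength_sweep(audio: str, prompt: str, levels=(0.3, 0.6, 0.9)) -> list[dict]:
    """Return one request body per init_noise_level value (hypothetical helper)."""
    for level in levels:
        if not 0 <= level <= 1:
            raise ValueError("init_noise_level must be between 0 and 1")
    return [
        {
            "audio": audio,  # assumed: source clip referenced by URL
            "prompt": prompt,
            "init_noise_level": level,
            "output_format": "wav",  # lossless output for later editing
        }
        for level in levels
    ]


variants = strength_sweep(
    "https://example.com/source.wav",
    "warm lo-fi texture with soft tape saturation",
)
```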
Notes
- `audio` and `prompt` are required.
- `duration` supports 1–120 seconds.
- `init_noise_level` controls how strongly the model transforms the input audio.
- Pricing is fixed at $0.024 per request.
- This workflow is intended for transformation of existing audio, not fresh generation from text alone.
Related Models
- Stability AI Stable Audio 3 Music — Generate music directly from a text prompt.
- Stability AI Stable Audio 3 Text-to-Audio — Generate general audio and sound scenes from text prompts.
- Stability AI Stable Audio 3 Audio-Outpainting — Extend an existing audio clip before and/or after the source.
- Stability AI Stable Audio 3 Audio-Inpainting — Replace a selected region inside an existing audio clip.
<ApiPage model={model}>
## Authentication
For authentication details, please refer to the [Authentication Guide](/docs-authentication).
## API Endpoints
### Submit Task & Query Result
## Parameters
### Task Submission Parameters
#### Request Parameters
#### Response Parameters
<SubmitResponse />
#### Result Request Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| id | string | Yes | - | Task ID |
#### Result Response Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID used to query the result) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of generated audio URLs. |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
</ApiPage>
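The result fields documented above suggest a simple polling loop: query the task until `data.status` is `completed` or `failed`, then read `data.outputs` or `data.error`. This is a minimal sketch under stated assumptions. `wait_for_result` and its `fetch` argument are hypothetical; `fetch` stands in for whatever HTTP call retrieves the result JSON for a task ID, and the polling interval and attempt limit are arbitrary choices.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    max_attempts: int = 60,
) -> list:
    """Poll until the task finishes and return data.outputs.

    `fetch` is a caller-supplied function that returns the parsed result
    response (the structure described in the table above) for a task ID.
    """
    for _ in range(max_attempts):
        data = fetch(task_id)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Status is `created` or `processing`: wait and try again.
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Injecting `fetch` keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.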