Stability AI Stable Audio 3 Audio-to-Audio

Stable Audio 3 Audio-to-Audio is a fast AI audio transformation model that reworks a source audio clip according to a text prompt. It is offered as a ready-to-use REST inference API for audio style transfer, sound effect transformation, music remixing, creative audio editing, game audio, video sound design, and professional audio-to-audio workflows, with simple integration, no cold starts, and affordable pricing.

Features

Stability AI Stable Audio 3 Audio-to-Audio

Stability AI Stable Audio 3 Audio-to-Audio transforms an existing audio clip into a new result guided by a natural-language prompt. It is designed for remixing, restyling, sound redesign, mood transfer, and other prompt-driven audio transformation workflows.

Why Choose This?

Audio transformation workflow: Start from an existing audio clip and transform it into a new result instead of generating from scratch.
Prompt-guided editing: Use a text prompt to describe the target sound, mood, texture, or musical direction.
Controllable transformation strength: Adjust init_noise_level to control how strongly the output departs from the source audio.
Negative prompt support: Use negative_prompt to avoid unwanted sounds, textures, or stylistic elements.
Flexible output duration: Choose the target output length, up to 120 seconds.
Production-ready API: Useful for music restyling, sound design, ambient transformation, and creative audio experimentation.

Parameters

Parameter	Required	Description
audio	Yes	Source audio to transform.
prompt	Yes	Text prompt describing how to transform the audio.
duration	No	Target audio duration in seconds. Range: `1–120`. Default: `30`.
init_noise_level	No	Controls how strongly the source audio is transformed. Range: `0–1`. Default: `0.9`.
negative_prompt	No	Optional terms to avoid in the generated audio.
num_inference_steps	No	Number of inference steps. Range: `1–100`. Default: `8`.
guidance_scale	No	Prompt guidance strength. Range: `0–25`. Default: `1`.
output_format	No	Output audio format. Supported values: `mp3`, `wav`, `flac`, `ogg`, `opus`, `m4a`, `aac`. Default: `mp3`.
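
The ranges and defaults in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and it assumes `audio` is passed as a URL string, which this page does not confirm.

```python
from typing import Any, Dict

# Ranges, defaults, and formats copied from the parameter table above.
RANGES = {
    "duration": (1, 120),
    "init_noise_level": (0, 1),
    "num_inference_steps": (1, 100),
    "guidance_scale": (0, 25),
}
DEFAULTS = {
    "duration": 30,
    "init_noise_level": 0.9,
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3",
}
OUTPUT_FORMATS = {"mp3", "wav", "flac", "ogg", "opus", "m4a", "aac"}


def build_payload(audio: str, prompt: str, **options: Any) -> Dict[str, Any]:
    """Return a request body with defaults filled in, or raise ValueError."""
    if not audio or not prompt:
        raise ValueError("audio and prompt are required")
    unknown = set(options) - set(DEFAULTS) - {"negative_prompt"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"audio": audio, "prompt": prompt, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if payload["output_format"] not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {payload['output_format']}")
    return payload
```

Calling `build_payload("https://example.com/in.wav", "darker mix", duration=121)` raises `ValueError`, because `duration` is limited to 120 seconds.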

How to Use

Upload your source audio — provide the clip you want to transform.
Write your prompt — describe the target sound, style, mood, or arrangement you want.
Set duration (optional) — choose the desired output length.
Adjust transformation strength (optional) — use init_noise_level to control how far the output moves away from the source.
Add a negative prompt (optional) — specify sounds or qualities to avoid.
Tune generation settings (optional) — adjust num_inference_steps and guidance_scale if needed.
Choose output format — select the format that best fits your workflow.
Submit — run the model and download the transformed audio.
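
For API users, the same steps map onto the fields of a single request body. The sketch below only builds the HTTP request and does not send it. The endpoint URL, the Bearer authorization scheme, and passing `audio` as a URL are all placeholders or assumptions, since none of them are specified on this page; the Authentication Guide is the place to check the real values.

```python
import json
import urllib.request

# Hypothetical placeholder: the real submit endpoint is not given on this page.
SUBMIT_URL = "https://api.example.com/your-submit-endpoint"


def make_submit_request(api_key: str, body: dict, url: str = SUBMIT_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one transformation task."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


body = {
    "audio": "https://example.com/source.wav",     # source audio (assumed to be a URL)
    "prompt": "dark cinematic ambient track",      # target sound, style, or mood
    "duration": 30,                                # optional output length in seconds
    "init_noise_level": 0.7,                       # optional transformation strength
    "negative_prompt": "vocals, harsh distortion", # optional qualities to avoid
    "num_inference_steps": 8,                      # optional generation settings
    "guidance_scale": 1,
    "output_format": "wav",                        # output format
}
request = make_submit_request("YOUR_API_KEY", body)
# urllib.request.urlopen(request) would submit it once the URL and key are real.
```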

Example Prompt

Transform this into a dark cinematic ambient track with deeper low-end texture, distant metallic resonance, slower pacing, and a more atmospheric, spacious mix.

Pricing

Just $0.024 per request.

Billing Rules

Each audio-to-audio generation request costs $0.024
Pricing is fixed per request
duration, init_noise_level, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing
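
Because pricing is flat, a batch estimate is a single multiplication. A small sketch (the helper name `estimate_cost` is illustrative) that uses `Decimal` to avoid floating-point rounding in currency amounts:

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.024")  # USD per request, from the pricing section above


def estimate_cost(num_requests: int) -> Decimal:
    """Total cost in USD; generation settings are ignored because pricing is flat."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return PRICE_PER_REQUEST * num_requests


print(estimate_cost(1))    # 0.024
print(estimate_cost(250))  # 6.000
```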

Best Use Cases

Music restyling — Rework an existing clip into a different sonic direction.
Sound design transformation — Turn a source recording into a new textured or cinematic result.
Ambient and mood transfer — Shift the emotional tone of an existing audio clip.
Creative remix prototyping — Explore alternate versions of a source sound quickly.
Post-production experimentation — Generate stylized variants before committing to manual editing.

Pro Tips

Use a prompt that clearly describes the target direction instead of repeating details already present in the source.
Lower init_noise_level when you want the result to stay closer to the original audio.
Raise init_noise_level when you want a stronger transformation.
Use negative_prompt to suppress unwanted artifacts, vocals, harshness, or specific genres.
Choose wav or flac when you plan to do further editing after generation.
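
One way to apply these tips is to prepare several request bodies that differ only in `init_noise_level`, then compare the results by ear. This is a sketch under assumptions: `noise_sweep` is a hypothetical helper, and `audio` is assumed to be a URL string.

```python
from typing import Dict, List


def noise_sweep(base: Dict, levels: List[float]) -> List[Dict]:
    """Return one request body per init_noise_level, leaving `base` unchanged."""
    for level in levels:
        if not 0 <= level <= 1:
            raise ValueError(f"init_noise_level out of range: {level}")
    return [{**base, "init_noise_level": level} for level in levels]


base = {
    "audio": "https://example.com/source.wav",  # assumed: audio passed as a URL
    "prompt": "warm lo-fi tape texture with soft vinyl crackle",
    "negative_prompt": "vocals, harshness",
    "output_format": "flac",  # lossless, convenient for further editing
}
# Low values stay close to the source; high values transform it more strongly.
variants = noise_sweep(base, [0.3, 0.6, 0.9])
```

Each variant is a separate request, so a three-level sweep is billed as three requests.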

Notes

audio and prompt are required.
duration supports 1–120 seconds.
init_noise_level controls how strongly the model transforms the input audio.
Pricing is fixed at $0.024 per request.
This workflow is intended for transformation of existing audio, not fresh generation from text alone.

Related Models

Stability AI Stable Audio 3 Music — Generate music directly from a text prompt.
Stability AI Stable Audio 3 Text-to-Audio — Generate general audio and sound scenes from text prompts.
Stability AI Stable Audio 3 Audio-Outpainting — Extend an existing audio clip before and/or after the source.
Stability AI Stable Audio 3 Audio-Inpainting — Replace a selected region inside an existing audio clip.



<ApiPage model={model}>
  ## Authentication

  For authentication details, please refer to the [Authentication Guide](/docs-authentication).

  ## API Endpoints

  ### Submit Task & Query Result

  ## Parameters

  ### Task Submission Parameters

  #### Request Parameters

  #### Response Parameters

  <SubmitResponse />

  #### Result Request Parameters

  | Parameter | Type | Required | Default | Description |
  |-----------|------|----------|---------|-------------|
  | id | string | Yes | - | Task ID |

  #### Result Response Parameters

  | Parameter | Type | Description |
  |-----------|------|-------------|
  | code | integer | HTTP status code (e.g., 200 for success) |
  | message | string | Status message (e.g., "success") |
  | data | object | The prediction data object containing all details |
  | data.id | string | Unique identifier of the prediction (the task ID used to query the result) |
  | data.model | string | Model ID used for the prediction |
  | data.outputs | array | Array of generated audio URLs |
  | data.urls | object | Object containing related API endpoints |
  | data.urls.get | string | URL to retrieve the prediction result |
  | data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
  | data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
  | data.error | string | Error message (empty if no error occurred) |
  | data.timings | object | Object containing timing details |
  | data.timings.inference | integer | Inference time in milliseconds |

</ApiPage>
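
The result fields above suggest a simple polling pattern: query until `data.status` is `completed` or `failed`. The sketch below works on already-parsed JSON and takes the HTTP call as an injected `fetch` function, so it makes no assumption about the endpoint or authentication. The helper names, polling interval, and poll limit are illustrative, not part of the API.

```python
import time
from typing import Callable, List, Optional


def read_result(response: dict) -> Optional[List[str]]:
    """Return output URLs if completed, None if still running; raise if failed."""
    data = response["data"]
    status = data["status"]
    if status == "completed":
        return list(data["outputs"])
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    return None  # status is "created" or "processing"


def wait_for_outputs(
    fetch: Callable[[], dict], interval: float = 1.0, max_polls: int = 60
) -> List[str]:
    """Call `fetch` (a GET on data.urls.get) until the task finishes."""
    for _ in range(max_polls):
        outputs = read_result(fetch())
        if outputs is not None:
            return outputs
        time.sleep(interval)
    raise TimeoutError("task did not complete in time")
```

In practice `fetch` would wrap an authenticated GET request to the URL returned in `data.urls.get` and return the decoded JSON body.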
