Stability AI Stable Audio 3 Audio-to-Audio

Stable Audio 3 Audio-to-Audio is a fast AI model that transforms a source audio clip according to a text prompt. It is available as a ready-to-use REST inference API for audio style transfer, sound effect transformation, music remixing, creative audio editing, game audio, video sound design, and professional audio-to-audio workflows, with simple integration, no cold starts, and affordable pricing.

Features

Stability AI Stable Audio 3 Audio-to-Audio transforms an existing audio clip into a new result guided by a natural-language prompt. It is designed for remixing, restyling, sound redesign, mood transfer, and other prompt-driven audio transformation workflows.


Why Choose This?

  • Audio transformation workflow
    Start from an existing audio clip and transform it into a new result instead of generating from scratch.

  • Prompt-guided editing
    Use a text prompt to describe the target sound, mood, texture, or musical direction.

  • Controllable transformation strength
    Adjust init_noise_level to control how strongly the output departs from the source audio.

  • Negative prompt support
    Use negative_prompt to avoid unwanted sounds, textures, or stylistic elements.

  • Flexible output duration
    Choose the target output length up to 120 seconds.

  • Production-ready API
    Useful for music restyling, sound design, ambient transformation, and creative audio experimentation.


Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| audio | Yes | Source audio to transform. |
| prompt | Yes | Text prompt describing how to transform the audio. |
| duration | No | Target audio duration in seconds. Range: 1–120. Default: 30. |
| init_noise_level | No | Controls how strongly the source audio is transformed. Range: 0–1. Default: 0.9. |
| negative_prompt | No | Optional terms to avoid in the generated audio. |
| num_inference_steps | No | Number of inference steps. Range: 1–100. Default: 8. |
| guidance_scale | No | Prompt guidance strength. Range: 0–25. Default: 1. |
| output_format | No | Output audio format. Supported values: mp3, wav, flac, ogg, opus, m4a, aac. Default: mp3. |
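
The ranges and defaults in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: `build_payload` and the constant names are hypothetical helpers introduced here, and only the field names, ranges, defaults, and format list come from the table above.

```python
from typing import Any

# Defaults, ranges, and formats copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "duration": 30,
    "init_noise_level": 0.9,
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3",
}
RANGES = {
    "duration": (1, 120),
    "init_noise_level": (0, 1),
    "num_inference_steps": (1, 100),
    "guidance_scale": (0, 25),
}
FORMATS = {"mp3", "wav", "flac", "ogg", "opus", "m4a", "aac"}


def build_payload(audio: str, prompt: str, **options: Any) -> dict[str, Any]:
    """Merge caller options over the documented defaults and range-check them.

    Hypothetical helper: the name and error messages are illustrative only.
    """
    if not audio or not prompt:
        raise ValueError("audio and prompt are required")
    unknown = set(options) - set(DEFAULTS) - {"negative_prompt"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"audio": audio, "prompt": prompt, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if payload["output_format"] not in FORMATS:
        raise ValueError(f"unsupported output_format: {payload['output_format']}")
    return payload
```

Calling `build_payload("https://example.com/in.wav", "darker mix", init_noise_level=0.5)` keeps the remaining optional fields at their documented defaults, while an out-of-range value such as `duration=121` raises `ValueError` before any request is made.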

How to Use

  1. Upload your source audio — provide the clip you want to transform.
  2. Write your prompt — describe the target sound, style, mood, or arrangement you want.
  3. Set duration (optional) — choose the desired output length.
  4. Adjust transformation strength (optional) — use init_noise_level to control how far the output moves away from the source.
  5. Add a negative prompt (optional) — specify sounds or qualities to avoid.
  6. Tune generation settings (optional) — adjust num_inference_steps and guidance_scale if needed.
  7. Choose output format — select the format that best fits your workflow.
  8. Submit — run the model and download the transformed audio.
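
For API users, steps 1 through 7 amount to assembling a JSON body, and step 8 to sending it. The sketch below only constructs the request object and makes several assumptions that this page does not confirm: `SUBMIT_URL` is a placeholder rather than the real endpoint, the `Authorization: Bearer` header scheme is assumed (see the Authentication Guide for the actual scheme), and `audio` is assumed to accept a URL to an already uploaded file. `build_submit_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder only: replace with the real submit endpoint from the API reference.
SUBMIT_URL = "https://api.example.invalid/stable-audio-3/audio-to-audio"


def build_submit_request(
    api_key: str, audio_url: str, prompt: str, **options
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one transformation task."""
    # Steps 1-2: required inputs. Steps 3-7: optional settings passed as keywords.
    body = {"audio": audio_url, "prompt": prompt, **options}
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the Authentication Guide.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_submit_request(
    "YOUR_API_KEY",
    "https://example.com/source.wav",
    "Dark cinematic ambient track with deeper low-end texture",
    duration=60,
    init_noise_level=0.7,
    output_format="wav",
)
```

Passing `request` to `urllib.request.urlopen` would perform step 8 once the placeholder URL and key are replaced with real values.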

Example Prompt

Transform this into a dark cinematic ambient track with deeper low-end texture, distant metallic resonance, slower pacing, and a more atmospheric, spacious mix.


Pricing

Just $0.024 per request.

Billing Rules

  • Each audio-to-audio generation request costs $0.024
  • Pricing is fixed per request
  • duration, init_noise_level, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing
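
Because pricing is flat per request, a batch estimate is a single multiplication. A minimal sketch follows, using `Decimal` to avoid floating-point rounding in currency values; `estimate_cost` is a hypothetical helper, and only the $0.024 figure comes from the billing rules above.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.024")  # USD per request, from the billing rules above


def estimate_cost(num_requests: int) -> Decimal:
    """Return the total cost in USD for a number of requests."""
    if num_requests < 0:
        raise ValueError("num_requests must not be negative")
    # Request settings such as duration or output_format do not change the price.
    return PRICE_PER_REQUEST * num_requests
```

For example, `estimate_cost(250)` evaluates to 6 USD regardless of the duration or format chosen for each request.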

Best Use Cases

  • Music restyling — Rework an existing clip into a different sonic direction.
  • Sound design transformation — Turn a source recording into a new textured or cinematic result.
  • Ambient and mood transfer — Shift the emotional tone of an existing audio clip.
  • Creative remix prototyping — Explore alternate versions of a source sound quickly.
  • Post-production experimentation — Generate stylized variants before committing to manual editing.

Pro Tips

  • Use a prompt that clearly describes the target direction instead of repeating details already present in the source.
  • Lower init_noise_level when you want the result to stay closer to the original audio.
  • Raise init_noise_level when you want a stronger transformation.
  • Use negative_prompt to suppress unwanted artifacts, vocals, harshness, or specific genres.
  • Choose wav or flac when you plan to do further editing after generation.

Notes

  • audio and prompt are required.
  • duration supports 1–120 seconds.
  • init_noise_level controls how strongly the model transforms the input audio.
  • Pricing is fixed at $0.024 per request.
  • This workflow is intended for transformation of existing audio, not fresh generation from text alone.

Related Models

  • Stability AI Stable Audio 3 Music — Generate music directly from a text prompt.
  • Stability AI Stable Audio 3 Text-to-Audio — Generate general audio and sound scenes from text prompts.
  • Stability AI Stable Audio 3 Audio-Outpainting — Extend an existing audio clip before and/or after the source.
  • Stability AI Stable Audio 3 Audio-Inpainting — Replace a selected region inside an existing audio clip.


<ApiPage model={model}>
  ## Authentication

  For authentication details, please refer to the [Authentication Guide](/docs-authentication).

  ## API Endpoints

  ### Submit Task & Query Result

  ## Parameters

  ### Task Submission Parameters

  #### Request Parameters

  #### Response Parameters

  <SubmitResponse />

  #### Result Request Parameters

  | Parameter | Type | Required | Default | Description |
  |-----------|------|----------|---------|-------------|
  | id | string | Yes | - | Task ID |

  #### Result Response Parameters

  | Parameter | Type | Description |
  |-----------|------|-------------|
  | code | integer | HTTP status code (e.g., 200 for success) |
  | message | string | Status message (e.g., "success") |
  | data | object | The prediction data object containing all details |
  | data.id | string | Unique identifier of the prediction (the task ID used to retrieve the result) |
  | data.model | string | Model ID used for the prediction |
  | data.outputs | array | Array of generated audio URLs. |
  | data.urls | object | Object containing related API endpoints |
  | data.urls.get | string | URL to retrieve the prediction result |
  | data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
  | data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
  | data.error | string | Error message (empty if no error occurred) |
  | data.timings | object | Object containing timing details |
  | data.timings.inference | integer | Inference time in milliseconds |

</ApiPage>
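
The status values in the result response table (`created`, `processing`, `completed`, `failed`) suggest a simple polling loop on the client. The sketch below is one way to write it, under stated assumptions: `wait_for_result` is a hypothetical helper, and `fetch` is a caller-supplied function that requests the URL in `data.urls.get` and returns the parsed JSON response. Only the field names and status values are taken from the table.

```python
import time
from typing import Any, Callable


def wait_for_result(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 1.0,
    timeout: float = 120.0,
) -> list[str]:
    """Poll until the task completes and return the generated audio URLs.

    `fetch` is assumed to return the documented result body:
    {"code": ..., "message": ..., "data": {"status": ..., "outputs": [...], ...}}
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            # data.error carries the error message when the task fails.
            raise RuntimeError(data.get("error") or "task failed")
        # Remaining documented statuses: created, processing.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        time.sleep(interval)
```

Taking `fetch` as a parameter keeps the loop independent of any particular HTTP client, so the same function can be exercised against canned responses in tests.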

  
© 2025 WaveSpeedAI. All rights reserved.