
Stability AI Stable Audio 3 Text-to-Audio

Stable Audio 3 Text-to-Audio is a fast AI audio generation model that creates sound effects from text prompts with controllable duration and output format. It is available through a ready-to-use REST inference API for sound effect generation, ambient audio, game audio, video production, cinematic sound design, creator content, and professional text-to-audio workflows, with simple integration, no cold starts, and affordable pricing.

Features

Stability AI Stable Audio 3 Text-to-Audio

Stability AI Stable Audio 3 Text-to-Audio generates audio directly from a natural-language prompt, with controls for duration, negative prompting, inference steps, guidance scale, and output format. It is suitable for sound design, ambient scenes, cinematic textures, audio prototyping, and other prompt-driven audio generation workflows.


Why Choose This?

  • Prompt-based audio generation
    Generate audio from a text description of mood, environment, texture, or sound event.

  • Duration control
    Choose how long the generated audio should be.

  • Negative prompt support
    Use negative_prompt to steer the model away from unwanted elements.

  • Generation controls
    Adjust num_inference_steps and guidance_scale to trade off output refinement, runtime, and prompt adherence.

  • Flexible output format
    Export the generated audio in a supported format such as mp3.

  • Production-ready API
    Suitable for sound effects, ambience, cinematic scenes, creative prototyping, and audio ideation workflows.


Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| prompt | Yes | Text prompt describing the audio you want to generate. |
| duration | No | Output audio duration in seconds. |
| negative_prompt | No | Text description of sounds or qualities you want to avoid. |
| num_inference_steps | No | Number of inference steps used during generation. |
| guidance_scale | No | Controls how strongly the model follows the prompt. |
| output_format | No | Output audio format, such as mp3. |
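
The parameter list above maps naturally onto a request body. The following is a minimal sketch, assuming the JSON field names match the parameter names exactly; `build_payload` is a hypothetical helper, not part of any SDK. Because this page does not state defaults or accepted ranges, the sketch only enforces that `prompt` is present and leaves every unset optional field out of the body.

```python
from typing import Any, Optional


def build_payload(
    prompt: str,
    duration: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    output_format: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body, omitting optional fields that were not set."""
    # prompt is the only required parameter.
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")

    optional = {
        "duration": duration,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "output_format": output_format,
    }
    payload: dict[str, Any] = {"prompt": prompt.strip()}
    # Assumption: unset fields fall back to server-side defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "rain on a tin roof, distant thunder",
    duration=10,
    negative_prompt="music, vocals",
    output_format="mp3",
)
```

Leaving unset fields out of the body, rather than sending nulls, avoids guessing at defaults that are not documented here.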

How to Use

  1. Write your prompt — describe the sound scene, mood, texture, and pacing you want.
  2. Set duration (optional) — choose the target audio length.
  3. Add a negative prompt (optional) — specify anything you want the model to avoid.
  4. Adjust generation settings (optional) — tune num_inference_steps and guidance_scale if needed.
  5. Choose output format (optional) — select the format that best fits your workflow.
  6. Submit — run the model and download the generated audio.
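
The steps above amount to a submit-then-poll loop. This sketch keeps the HTTP layer out of the picture because the endpoint URLs are not shown on this page: `submit` and `fetch_result` are hypothetical callables that you would implement with your HTTP client and API key. It assumes the submit response carries the task ID at `data.id`, and it relies on the status values and `data.outputs` field listed in the result response table.

```python
import time
from typing import Any, Callable

Json = dict[str, Any]


def run_generation(
    submit: Callable[[Json], Json],
    fetch_result: Callable[[str], Json],
    payload: Json,
    poll_interval: float = 1.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Submit a task, then poll until it completes or fails."""
    # Assumption: the submit response exposes the task ID at data.id.
    task_id = submit(payload)["data"]["id"]

    for _ in range(max_polls):
        data = fetch_result(task_id)["data"]
        if data["status"] == "completed":
            return list(data["outputs"])  # generated audio URLs
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "generation failed")
        sleep(poll_interval)  # status is still created or processing

    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Passing the transport functions in as arguments also makes the loop easy to exercise with canned responses before wiring it to the real API.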

Example Prompt

30-second cinematic sound-design scene: an isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, a loose metal sign creaking, insects around the lamps, and a faraway truck passing slowly. Keep it detailed, spatial, realistic, and non-musical.


Pricing

Just $0.0206 per generation.

Billing Rules

  • Each generation costs $0.0206
  • Pricing is fixed per request
  • duration, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing
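
Since pricing is flat per request, a budget estimate is a single multiplication. The sketch below uses `Decimal` to avoid floating-point rounding on currency values; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Fixed price per generation, as stated in the billing rules.
PRICE_PER_GENERATION = Decimal("0.0206")


def estimate_cost(num_generations: int) -> Decimal:
    """Return the total cost in USD for a number of generations."""
    if num_generations < 0:
        raise ValueError("num_generations must be non-negative")
    # Parameters such as duration or output_format do not change the price.
    return PRICE_PER_GENERATION * num_generations


print(estimate_cost(50))  # 1.0300
```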

Best Use Cases

  • Sound design — Generate cinematic and environmental sound scenes from prompts.
  • Ambient audio — Create non-musical background textures and atmosphere beds.
  • Creative prototyping — Explore multiple sound directions quickly from text.
  • Game and film audio ideation — Produce draft effects and scene ambience for early-stage workflows.
  • Content production — Generate custom audio for videos, trailers, podcasts, or installations.

Pro Tips

  • Be specific in your prompt about environment, motion, texture, and realism.
  • Use negative_prompt when you want to suppress music, vocals, distortion, or unwanted artifacts.
  • Increase num_inference_steps for potentially more refined output, at the cost of longer runtime.
  • Raise guidance_scale when you want tighter prompt adherence.
  • Start with a clear, focused prompt before adding extra details.
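
One way to apply these tips is to hold the prompt fixed and compare a small grid of settings side by side. This is only a sketch: `sweep_payloads` is a hypothetical helper, and the step counts and guidance values below are arbitrary placeholders, since this page does not document the accepted ranges. Each variant is a separate request and is billed as one generation.

```python
from itertools import product
from typing import Any


def sweep_payloads(
    prompt: str,
    negative_prompt: str,
    steps_options: list[int],
    guidance_options: list[float],
) -> list[dict[str, Any]]:
    """Build one request body per (steps, guidance) combination."""
    return [
        {
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "num_inference_steps": steps,
            "guidance_scale": scale,
        }
        for steps, scale in product(steps_options, guidance_options)
    ]


variants = sweep_payloads(
    prompt="isolated desert gas station at midnight, buzzing fluorescent lights",
    negative_prompt="music, vocals, distortion",
    steps_options=[20, 40],       # placeholder values, not documented defaults
    guidance_options=[3.0, 7.0],  # placeholder values, not documented defaults
)
```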

Notes

  • prompt is required.
  • Pricing is fixed at $0.0206 per generation.
  • Better prompts usually improve both realism and controllability.
  • This workflow is especially useful for non-musical and cinematic audio generation.

Related Models

  • Other Stability AI audio generation workflows — Useful when you need different quality, speed, or control trade-offs.
  • Prompt-based sound generation models — Useful when you want alternate audio styles or generation behavior.


<ApiPage model={model}>
  ## Authentication

  For authentication details, please refer to the [Authentication Guide](/docs-authentication).

  ## API Endpoints

  ### Submit Task & Query Result

  ## Parameters

  ### Task Submission Parameters

  #### Request Parameters

  #### Response Parameters

  <SubmitResponse />

  #### Result Request Parameters

  | Parameter | Type | Required | Default | Description |
  |-----------|------|----------|---------|-------------|
  | id | string | Yes | - | Task ID |

  #### Result Response Parameters

  | Parameter | Type | Description |
  |-----------|------|-------------|
  | code | integer | HTTP status code (e.g., 200 for success) |
  | message | string | Status message (e.g., "success") |
  | data | object | The prediction data object containing all details |
  | data.id | string | Unique identifier of the prediction (the task ID used to query the result) |
  | data.model | string | Model ID used for the prediction |
  | data.outputs | array | Array of generated audio URLs. |
  | data.urls | object | Object containing related API endpoints |
  | data.urls.get | string | URL to retrieve the prediction result |
  | data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
  | data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
  | data.error | string | Error message (empty if no error occurred) |
  | data.timings | object | Object containing timing details |
  | data.timings.inference | integer | Inference time in milliseconds |
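
As a rough illustration of how the fields in this table fit together, the sketch below flattens a result response into a small dataclass. `GenerationResult` and `parse_result` are hypothetical names, and the sample response contains made-up values that only follow the documented shape.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GenerationResult:
    task_id: str
    status: str  # created, processing, completed, or failed
    audio_urls: list[str]
    error: Optional[str]
    inference_ms: Optional[int]


def parse_result(response: dict[str, Any]) -> GenerationResult:
    """Extract the commonly used fields from a result response."""
    data = response["data"]
    timings = data.get("timings") or {}
    return GenerationResult(
        task_id=data["id"],
        status=data["status"],
        audio_urls=list(data.get("outputs") or []),
        error=data.get("error") or None,  # empty string means no error
        inference_ms=timings.get("inference"),
    )


# Illustrative response only; IDs and URLs are placeholders.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "task-123",
        "model": "example-model-id",
        "outputs": ["https://example.com/audio.mp3"],
        "urls": {"get": "https://example.com/result"},
        "status": "completed",
        "created_at": "2023-04-01T12:34:56.789Z",
        "error": "",
        "timings": {"inference": 1500},
    },
}
result = parse_result(sample)
```

Normalizing the empty `data.error` string to `None` lets callers test for an error with a plain truthiness check.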

</ApiPage>

  