Stability AI Stable Audio 3 Text-to-Audio

Stable Audio 3 Text-to-Audio is a fast AI audio generation model that creates sound effects from text prompts with controllable duration and output format. It is available through a ready-to-use REST inference API with simple integration, no cold starts, and fixed per-generation pricing, and it suits sound effect generation, ambient audio, game audio, video production, cinematic sound design, creator content, and professional text-to-audio workflows.

Features

Stability AI Stable Audio 3 Text-to-Audio

Stability AI Stable Audio 3 Text-to-Audio generates audio directly from a natural-language prompt, with controls for duration, negative prompting, inference steps, guidance scale, and output format. It is suitable for sound design, ambient scenes, cinematic textures, audio prototyping, and other prompt-driven audio generation workflows.

Why Choose This?

Prompt-based audio generation
Generate audio from a text description of mood, environment, texture, or sound event.
Duration control
Choose how long the generated audio should be.
Negative prompt support
Use negative_prompt to steer the model away from unwanted elements.
Generation controls
Adjust num_inference_steps and guidance_scale to trade off output refinement, prompt adherence, and runtime.
Flexible output format
Export the generated audio in a supported format such as mp3.
Production-ready API
Suitable for sound effects, ambience, cinematic scenes, creative prototyping, and audio ideation workflows.

Parameters

Parameter	Required	Description
prompt	Yes	Text prompt describing the audio you want to generate.
duration	No	Output audio duration in seconds.
negative_prompt	No	Text description of sounds or qualities you want to avoid.
num_inference_steps	No	Number of inference steps used during generation.
guidance_scale	No	Controls how strongly the model follows the prompt.
output_format	No	Output audio format, such as `mp3`.
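
As a rough illustration of how the parameters in the table might be assembled into a JSON request body, the sketch below always includes `prompt` and adds only the optional fields that were explicitly set. The helper name `build_payload` and its validation rules (non-empty prompt, positive duration) are assumptions made for this example; this page does not state allowed ranges or default values.

```python
import json
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    duration: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented parameters (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and duration <= 0:
        # Assumption: a non-positive duration is never meaningful.
        raise ValueError("duration must be positive")

    optional = {
        "duration": duration,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "output_format": output_format,
    }
    payload: Dict[str, Any] = {"prompt": prompt}
    # Leave unset options out so the service can apply its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    body = build_payload(
        "distant highway wind at a desert gas station, non-musical",
        duration=30,
        negative_prompt="music, vocals",
        output_format="mp3",
    )
    print(json.dumps(body, indent=2))
```

Omitting unset fields, rather than sending nulls, keeps the request minimal and avoids guessing at defaults that are not documented here.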

How to Use

Write your prompt — describe the sound scene, mood, texture, and pacing you want.
Set duration (optional) — choose the target audio length.
Add a negative prompt (optional) — specify anything you want the model to avoid.
Adjust generation settings (optional) — tune num_inference_steps and guidance_scale if needed.
Choose output format (optional) — select the format that best fits your workflow.
Submit — run the model and download the generated audio.
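
For API users, the steps above could translate into a single HTTP POST. The sketch below only constructs the request object and does not send it. The URL is a placeholder, and the bearer-token header is an assumption; the actual endpoint and authentication scheme should be taken from the API reference and the Authentication Guide.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder only: the real submit URL is not given on this page.
SUBMIT_URL = "https://api.example.com/stable-audio-3/text-to-audio"


def build_submit_request(
    url: str, api_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a generation task."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token auth; confirm in the Authentication Guide.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    payload = {
        "prompt": "buzzing fluorescent lights, insects, a faraway truck passing",
        "duration": 30,
        "negative_prompt": "music, vocals",
        "output_format": "mp3",
    }
    request = build_submit_request(SUBMIT_URL, "YOUR_API_KEY", payload)
    print(request.get_method(), request.full_url)
    # To actually send it (requires a real URL and key):
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```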

Example Prompt

30-second cinematic sound-design scene: an isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, a loose metal sign creaking, insects around the lamps, and a faraway truck passing slowly. Keep it detailed, spatial, realistic, and non-musical.

Pricing

Just $0.0206 per generation.

Billing Rules

Each generation costs $0.0206
Pricing is fixed per request
duration, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing
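
Because the price is fixed per request, batch cost is simple multiplication. A small sketch, using `Decimal` to avoid floating-point rounding on currency (the function names are illustrative):

```python
from decimal import Decimal

# Fixed price per generation in USD, as stated in the pricing section.
PRICE_PER_GENERATION = Decimal("0.0206")


def estimate_cost(generations: int) -> Decimal:
    """Total cost in USD for a number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return PRICE_PER_GENERATION * generations


def generations_within_budget(budget_usd: str) -> int:
    """Largest whole number of generations that fits in the budget."""
    return int(Decimal(budget_usd) // PRICE_PER_GENERATION)


if __name__ == "__main__":
    print(estimate_cost(100))
    print(generations_within_budget("5.00"))
```

For example, 100 generations cost $2.06, and a $5.00 budget covers 242 generations.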

Best Use Cases

Sound design — Generate cinematic and environmental sound scenes from prompts.
Ambient audio — Create non-musical background textures and atmosphere beds.
Creative prototyping — Explore multiple sound directions quickly from text.
Game and film audio ideation — Produce draft effects and scene ambience for early-stage workflows.
Content production — Generate custom audio for videos, trailers, podcasts, or installations.

Pro Tips

Be specific in your prompt about environment, motion, texture, and realism.
Use negative_prompt when you want to suppress music, vocals, distortion, or unwanted artifacts.
Increase num_inference_steps if you want potentially more refined output and can tolerate more runtime.
Raise guidance_scale when you want tighter prompt adherence.
Start with a clear, focused prompt before adding extra details.
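
One way to apply these tips is to hold the prompt fixed and vary only the generation settings, then compare the results by ear. The sketch below builds one request body per combination of `num_inference_steps` and `guidance_scale`. The numeric values are arbitrary examples, not documented ranges or recommendations, and `sweep_variants` is a hypothetical helper.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def sweep_variants(
    base: Dict[str, Any],
    steps_values: Iterable[int],
    guidance_values: Iterable[float],
) -> List[Dict[str, Any]]:
    """Return one request body per (steps, guidance) combination."""
    variants = []
    for steps, guidance in product(steps_values, guidance_values):
        variant = dict(base)  # copy so the base payload is left untouched
        variant["num_inference_steps"] = steps
        variant["guidance_scale"] = guidance
        variants.append(variant)
    return variants


if __name__ == "__main__":
    base = {
        "prompt": "a loose metal sign creaking in desert wind, realistic, spatial",
        "negative_prompt": "music, vocals, distortion",
    }
    # Illustrative values only; valid ranges are not stated on this page.
    for body in sweep_variants(base, steps_values=[20, 40], guidance_values=[3.0, 7.0]):
        print(body["num_inference_steps"], body["guidance_scale"])
```

Each variant is a separate generation, so a sweep of four combinations is billed as four requests at the fixed per-generation price.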

Notes

prompt is required.
Pricing is fixed at $0.0206 per generation.
Better prompts usually improve both realism and controllability.
This workflow is especially useful for non-musical and cinematic audio generation.

Related Models

Other Stability AI audio generation workflows — Useful when you need different quality, speed, or control trade-offs.
Prompt-based sound generation models — Useful when you want alternate audio styles or generation behavior.



<ApiPage model={model}>
  ## Authentication

  For authentication details, please refer to the [Authentication Guide](/docs-authentication).

  ## API Endpoints

  ### Submit Task & Query Result

  ## Parameters

  ### Task Submission Parameters

  #### Request Parameters

  #### Response Parameters

  <SubmitResponse />

  #### Result Request Parameters

  | Parameter | Type | Required | Default | Description |
  |-----------|------|----------|---------|-------------|
  | id | string | Yes | - | Task ID |

  #### Result Response Parameters

  | Parameter | Type | Description |
  |-----------|------|-------------|
  | code | integer | HTTP status code (e.g., 200 for success) |
  | message | string | Status message (e.g., "success") |
  | data | object | The prediction data object containing all details |
  | data.id | string | Unique identifier of the prediction; pass it as `id` when querying the result |
  | data.model | string | Model ID used for the prediction |
  | data.outputs | array | Array of generated audio URLs |
  | data.urls | object | Object containing related API endpoints |
  | data.urls.get | string | URL to retrieve the prediction result |
  | data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
  | data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
  | data.error | string | Error message (empty if no error occurred) |
  | data.timings | object | Object containing timing details |
  | data.timings.inference | integer | Inference time in milliseconds |

</ApiPage>
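
The result response fields listed above suggest a simple polling loop: query the task until `data.status` is `completed` or `failed`, then read `data.outputs` or `data.error`. The sketch below is one possible shape for that loop, not the service's official client. The HTTP call is deliberately left to a caller-supplied `fetch_result` function (a hypothetical name) that takes a task ID and returns the parsed JSON response, since the result URL is not given on this page; the interval and timeout defaults are also assumptions.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_outputs(
    fetch_result: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll a task until it finishes and return its audio URLs."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result(task_id)["data"]
        status = data["status"]
        if status == "completed":
            return list(data.get("outputs") or [])
        if status == "failed":
            raise RuntimeError(data.get("error") or "generation failed")
        # Remaining documented states are `created` and `processing`: keep waiting.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Canned responses standing in for real API calls.
    canned = iter([
        {"code": 200, "message": "success",
         "data": {"id": "task-1", "status": "processing", "outputs": [], "error": ""}},
        {"code": 200, "message": "success",
         "data": {"id": "task-1", "status": "completed",
                  "outputs": ["https://example.com/audio.mp3"], "error": ""}},
    ])
    print(wait_for_outputs(lambda _id: next(canned), "task-1", interval_s=0.0))
```

Injecting `fetch_result` and `sleep` keeps the loop testable without network access and lets the real HTTP details live in one place.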
