Stability AI Stable Audio 3 Text-to-Audio
Stable Audio 3 Text-to-Audio is a fast AI audio generation model that creates sound effects from text prompts, with controllable duration and output format. It is offered as a ready-to-use REST inference API for sound effects, ambient audio, game audio, video production, cinematic sound design, creator content, and professional text-to-audio workflows, with simple integration, no cold starts, and affordable pricing.
Features
Stability AI Stable Audio 3 Text-to-Audio
Stability AI Stable Audio 3 Text-to-Audio generates audio directly from a natural-language prompt, with controls for duration, negative prompting, inference steps, guidance scale, and output format. It is suitable for sound design, ambient scenes, cinematic textures, audio prototyping, and other prompt-driven audio generation workflows.
Why Choose This?
- Prompt-based audio generation: Generate audio from a text description of mood, environment, texture, or sound event.
- Duration control: Choose how long the generated audio should be.
- Negative prompt support: Use `negative_prompt` to steer the model away from unwanted elements.
- Generation controls: Adjust `num_inference_steps` and `guidance_scale` to trade off output refinement, prompt adherence, and runtime.
- Flexible output format: Export the generated audio in a supported format such as `mp3`.
- Production-ready API: Suitable for sound effects, ambience, cinematic scenes, creative prototyping, and audio ideation workflows.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text prompt describing the audio you want to generate. |
| duration | No | Output audio duration in seconds. |
| negative_prompt | No | Text description of sounds or qualities you want to avoid. |
| num_inference_steps | No | Number of inference steps used during generation. |
| guidance_scale | No | Controls how strongly the model follows the prompt. |
| output_format | No | Output audio format, such as mp3. |
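As a minimal sketch, the request body can be assembled from the parameters above as shown here. It assumes the body is a flat JSON object keyed by the parameter names in the table and that omitted optional fields fall back to server-side defaults; the helper name `build_payload` and its simple checks are hypothetical, not part of the documented API.

```python
import json
from typing import Any, Optional


def build_payload(
    prompt: str,
    duration: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    output_format: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a request body with only the fields that are set."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if duration is not None and duration <= 0:
        raise ValueError("duration must be a positive number of seconds")

    payload: dict[str, Any] = {"prompt": prompt}
    optional = {
        "duration": duration,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "output_format": output_format,
    }
    # Omit unset optional parameters instead of sending nulls (assumed to use defaults).
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    body = build_payload("rain on a tin roof", duration=10, output_format="mp3")
    print(json.dumps(body))
```

Leaving unset fields out keeps the request minimal and avoids guessing default values for `num_inference_steps` or `guidance_scale`, which this page does not list.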
How to Use
- Write your prompt: describe the sound scene, mood, texture, and pacing you want.
- Set duration (optional): choose the target audio length.
- Add a negative prompt (optional): specify anything you want the model to avoid.
- Adjust generation settings (optional): tune `num_inference_steps` and `guidance_scale` if needed.
- Choose output format (optional): select the format that best fits your workflow.
- Submit: run the model and download the generated audio.
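The submit-and-download steps can be sketched with the Python standard library as a submit request followed by polling. This is a sketch under several assumptions: `SUBMIT_URL` and `API_KEY` are placeholders (the real endpoint and auth scheme come from the API reference and the Authentication Guide), the bearer-token header is assumed, and the submit response is assumed to use the same envelope as the result response, including `data.urls.get`. The polling interval and timeout are arbitrary choices, and `fetch` is injectable so the logic can be tested without network access.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Placeholders (assumptions): replace with the real endpoint and your key.
SUBMIT_URL = "https://example.invalid/stable-audio-3/text-to-audio"  # hypothetical
API_KEY = "YOUR_API_KEY"  # hypothetical

Fetch = Callable[[str, str, Optional[dict]], dict]


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response (stdlib only)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {API_KEY}")  # assumed bearer scheme
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(
    get_url: str,
    fetch: Fetch = http_json,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll the result URL until data.status is 'completed' or 'failed'."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch("GET", get_url, None)["data"]
        status = data["status"]
        if status == "completed":
            return list(data["outputs"])
        if status == "failed":
            raise RuntimeError(data.get("error") or "generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} s")
        sleep(interval)


def generate(
    payload: dict,
    fetch: Fetch = http_json,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Submit a task, then poll data.urls.get for the generated audio URLs."""
    submitted = fetch("POST", SUBMIT_URL, payload)
    # Assumes the submit response carries data.urls.get like the result response.
    get_url = submitted["data"]["urls"]["get"]
    return wait_for_result(get_url, fetch=fetch, sleep=sleep)
```

With real values filled in, `generate({"prompt": "wind through pine trees", "duration": 10})` would return the output URLs, and `urllib.request.urlretrieve(urls[0], "out.mp3")` is one way to save the first file locally.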
Example Prompt
30-second cinematic sound-design scene: an isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, a loose metal sign creaking, insects around the lamps, and a faraway truck passing slowly. Keep it detailed, spatial, realistic, and non-musical.
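One way to send the example prompt is sketched below as a JSON request body. Setting `duration` to 30 keeps the parameter consistent with the length the prompt asks for; the `negative_prompt` value follows the Pro Tips suggestion to suppress music and vocals and is an illustrative choice, not a required setting.

```python
import json

EXAMPLE_PROMPT = (
    "30-second cinematic sound-design scene: an isolated desert gas station at "
    "midnight with buzzing fluorescent lights, distant highway wind, a loose metal "
    "sign creaking, insects around the lamps, and a faraway truck passing slowly. "
    "Keep it detailed, spatial, realistic, and non-musical."
)

body = {
    "prompt": EXAMPLE_PROMPT,
    "duration": 30,  # match the length stated in the prompt
    "negative_prompt": "music, vocals, distortion",  # illustrative values
    "output_format": "mp3",
}

print(json.dumps(body, indent=2))
```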
Pricing
Just $0.0206 per generation.
Billing Rules
- Each generation costs $0.0206
- Pricing is fixed per request
- `duration`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, and `output_format` do not affect pricing
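Because the price is fixed per request, the cost of a batch is simply the number of generations times $0.0206: for example, 100 generations cost $2.06 and 1,000 cost $20.60. A minimal sketch using `Decimal` to avoid floating-point rounding (the function name is illustrative):

```python
from decimal import Decimal

PRICE_PER_GENERATION = Decimal("0.0206")  # USD, fixed per request


def estimate_cost(num_generations: int) -> Decimal:
    """Total USD cost; settings such as duration do not change the per-request price."""
    if num_generations < 0:
        raise ValueError("num_generations must be non-negative")
    return PRICE_PER_GENERATION * num_generations


print(estimate_cost(100))
print(estimate_cost(1000))
```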
Best Use Cases
- Sound design — Generate cinematic and environmental sound scenes from prompts.
- Ambient audio — Create non-musical background textures and atmosphere beds.
- Creative prototyping — Explore multiple sound directions quickly from text.
- Game and film audio ideation — Produce draft effects and scene ambience for early-stage workflows.
- Content production — Generate custom audio for videos, trailers, podcasts, or installations.
Pro Tips
- Be specific in your prompt about environment, motion, texture, and realism.
- Use `negative_prompt` to suppress music, vocals, distortion, or other unwanted artifacts.
- Increase `num_inference_steps` for potentially more refined output, at the cost of longer runtime.
- Adjust `guidance_scale` when you want tighter prompt adherence.
- Start with a clear, focused prompt before adding extra details.
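To see how `num_inference_steps` and `guidance_scale` affect a given sound, one approach is to hold the prompt and negative prompt fixed and vary only those two settings. The sketch below builds one request body per combination; the specific step and guidance values are hypothetical placeholders, since this page does not list defaults or valid ranges.

```python
from itertools import product
from typing import Any

PROMPT = "heavy rain on a tin roof at night, close and realistic, non-musical"
NEGATIVE_PROMPT = "music, vocals, distortion"

# Hypothetical settings to compare; check the API reference for valid ranges.
STEP_OPTIONS = [30, 60]
GUIDANCE_OPTIONS = [3.0, 7.0]


def setting_variants(prompt: str, negative_prompt: str) -> list[dict[str, Any]]:
    """One request body per (steps, guidance) pair, with the prompt held fixed."""
    return [
        {
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "num_inference_steps": steps,
            "guidance_scale": guidance,
        }
        for steps, guidance in product(STEP_OPTIONS, GUIDANCE_OPTIONS)
    ]


for body in setting_variants(PROMPT, NEGATIVE_PROMPT):
    print(body["num_inference_steps"], body["guidance_scale"])
```

Since pricing is fixed per request, this four-variant sweep costs 4 × $0.0206 = $0.0824 regardless of the settings chosen.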
Notes
- `prompt` is required.
- Pricing is fixed at $0.0206 per generation.
- Better prompts usually improve both realism and controllability.
- This workflow is especially useful for non-musical and cinematic audio generation.
Related Models
- Other Stability AI audio generation workflows — Useful when you need different quality, speed, or control trade-offs.
- Prompt-based sound generation models — Useful when you want alternate audio styles or generation behavior.
<ApiPage model={model}>
## Authentication
For authentication details, please refer to the [Authentication Guide](/docs-authentication).
## API Endpoints
### Submit Task & Query Result
## Parameters
### Task Submission Parameters
#### Request Parameters
#### Response Parameters
<SubmitResponse />
#### Result Request Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| id | string | Yes | - | Task ID |
#### Result Response Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction; pass it as `id` when querying the result |
| data.model | string | Model ID used for the prediction |
| data.outputs | array of strings | Array of generated audio URLs |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
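The result fields above map naturally onto a small typed record. This is a minimal sketch that assumes the JSON shape in the table; the class name `PredictionResult`, the `finished` property, and the choice to raise when `code` is not 200 are illustrative decisions, not part of the API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PredictionResult:
    id: str
    model: str
    status: str  # created | processing | completed | failed
    outputs: list[str] = field(default_factory=list)
    error: str = ""
    inference_ms: Optional[int] = None  # data.timings.inference, in milliseconds

    @classmethod
    def from_response(cls, response: dict[str, Any]) -> "PredictionResult":
        """Parse a result response; raises if the envelope code is not 200."""
        if response.get("code") != 200:
            raise RuntimeError(f"API error {response.get('code')}: {response.get('message')}")
        data = response["data"]
        timings = data.get("timings") or {}
        return cls(
            id=data["id"],
            model=data["model"],
            status=data["status"],
            outputs=list(data.get("outputs") or []),
            error=data.get("error") or "",
            inference_ms=timings.get("inference"),
        )

    @property
    def finished(self) -> bool:
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")
```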
</ApiPage>