Stable Audio 3 Text-to-Audio is a fast AI audio generation model that creates sound effects from text prompts with controllable duration and output format. It is available as a ready-to-use REST inference API for sound effect generation, ambient audio, game audio, video production, cinematic sound design, creator content, and professional text-to-audio workflows, with simple integration, no cold starts, and affordable pricing.
Ready
$0.0206 per run · ~48 runs per $1
30-second cinematic sound-design scene: an isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, a loose metal sign creaking, insects around the lamps, and a faraway truck passing slowly. Keep it detailed, spatial, realistic, and non-musical.
Stability AI Stable Audio 3 Text-to-Audio generates audio directly from a natural-language prompt, with controls for duration, negative prompting, inference steps, guidance scale, and output format. It is suitable for sound design, ambient scenes, cinematic textures, audio prototyping, and other prompt-driven audio generation workflows.
Prompt-based audio generation
Generate audio from a text description of mood, environment, texture, or sound event.
Duration control
Choose how long the generated audio should be.
Negative prompt support
Use negative_prompt to steer the model away from unwanted elements.
Generation controls
Adjust num_inference_steps and guidance_scale to balance fidelity, control, and generation behavior.
Flexible output format
Export the generated audio in a supported format such as mp3.
Production-ready API
Suitable for sound effects, ambience, cinematic scenes, creative prototyping, and audio ideation workflows.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text prompt describing the audio you want to generate. |
| duration | No | Output audio duration in seconds. |
| negative_prompt | No | Text description of sounds or qualities you want to avoid. |
| num_inference_steps | No | Number of inference steps used during generation. |
| guidance_scale | No | Controls how strongly the model follows the prompt. |
| output_format | No | Output audio format, such as mp3. |
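As a minimal sketch of how the parameter table maps onto a request body, the helper below builds the JSON payload and includes only the optional fields that were explicitly set. The function name `build_payload` is hypothetical, and the sketch assumes that omitting an optional key lets the API fall back to its own default; the table does not list types or allowed ranges, so none are enforced here.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    duration: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body, sending only the parameters that were set."""
    # prompt is the only required parameter in the table above.
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")

    optional = {
        "duration": duration,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "output_format": output_format,
    }
    payload: Dict[str, Any] = {"prompt": prompt.strip()}
    # Assumption: keys left out of the body use the API's defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    print(build_payload("rain on a tin roof", duration=30, output_format="mp3"))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the POST request.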
Adjust num_inference_steps and guidance_scale if needed.
Just $0.0206 per generation.
duration, negative_prompt, num_inference_steps, guidance_scale, and output_format do not affect pricing.
Use negative_prompt when you want to suppress music, vocals, distortion, or unwanted artifacts.
Increase num_inference_steps if you want potentially more refined output and can tolerate more runtime.
Increase guidance_scale when you want tighter prompt adherence.
prompt is required.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/stability-ai/stable-audio-3/text-to-audio with your input as JSON. The endpoint returns a prediction ID; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Stable Audio 3 Text To Audio are below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/stability-ai/stable-audio-3/text-to-audio" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "An isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, and a loose metal sign creaking",
    "duration": 30,
    "negative_prompt": "music, vocals, distortion",
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3"
  }'
# Response includes a prediction id. Replace {request_id} with it and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].
// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
// CommonJS has no top-level await, so run the call inside an async function
async function main() {
  const result = await client.run("stability-ai/stable-audio-3/text-to-audio", {
    "prompt": "An isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, and a loose metal sign creaking",
    "duration": 30,
    "negative_prompt": "music, vocals, distortion",
    "num_inference_steps": 8,
    "guidance_scale": 1,
    "output_format": "mp3"
  });
  console.log(result.outputs[0]); // → URL of the generated output
}
main().catch(console.error);
# pip install wavespeed
import wavespeed
output = wavespeed.run(
    "stability-ai/stable-audio-3/text-to-audio",
    {
        "prompt": "An isolated desert gas station at midnight with buzzing fluorescent lights, distant highway wind, and a loose metal sign creaking",
        "duration": 30,
        "negative_prompt": "music, vocals, distortion",
        "num_inference_steps": 8,
        "guidance_scale": 1,
        "output_format": "mp3",
    },
)
print(output["outputs"][0])  # → URL of the generated output
Stable Audio 3 Text To Audio is a Stability AI model for audio generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/stability-ai/stability-ai-stable-audio-3-text-to-audio.
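The polling step described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the documented client: the result URL is the one used in the cURL example on this page, but the location of the status field (`data.status`), the `"failed"` status value, and the `error` field are assumptions, and `fetch_result` and `wait_for_output` are hypothetical helper names. Check the API reference for the actual response shape.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id: str) -> Dict[str, Any]:
    """GET the prediction result once, using the key in WAVESPEED_API_KEY."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(
    request_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_result,
    interval: float = 1.0,
    timeout: float = 120.0,
) -> str:
    """Poll until status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id).get("data", {})
        status = data.get("status")  # assumed location of the status field
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not completed after {timeout}s")
        time.sleep(interval)
```

Passing the fetch function as a parameter keeps the loop testable without network access; in normal use, `wait_for_output(prediction_id)` with the ID returned by the submission call is enough.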
Stable Audio 3 Text To Audio costs about $0.021 per run ($0.0206 per generation). For this model, `duration`, `negative_prompt`, `num_inference_steps`, `guidance_scale`, and `output_format` do not affect pricing. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
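For budgeting, the arithmetic behind the "~48 runs per $1" figure can be checked with a short calculation. The sketch assumes the flat $0.0206 per-generation price quoted on this page; the function names are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

# Per-generation price quoted on this page, in USD.
PRICE_PER_RUN = Decimal("0.0206")


def runs_for_budget(budget_usd: str) -> int:
    """Number of whole generations a budget covers at the flat price."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


def cost_of_runs(runs: int) -> Decimal:
    """Total cost in USD for a given number of generations."""
    return PRICE_PER_RUN * runs


if __name__ == "__main__":
    print(runs_for_budget("1"))   # 48
    print(cost_of_runs(100))      # 2.0600
```

The live price shown next to the Generate button remains the authoritative figure for any single call.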
Key inputs: `prompt`, `duration`, `guidance_scale`, `num_inference_steps`, `negative_prompt`, `output_format`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/stability-ai/stability-ai-stable-audio-3-text-to-audio.
Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.
Commercial usage rights depend on the model's license, set by its provider (Stability AI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.