MMAudio V2
MMAudio v2 produces synchronized audio from video or text inputs, ideal for adding soundtracks to videos when paired with video models. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
MMAudio v2 — wavespeed-ai/mmaudio-v2
MMAudio v2 generates high-quality sound effects and ambience for a video using the visual content plus a text prompt. Upload a clip, describe the audio you want (environment, materials, impacts, whooshes, texture), and the model synthesizes a synced audio track that matches motion and timing. It’s ideal for adding cinematic SFX, atmospheric layers, and “sound design” style audio to silent footage.
Key capabilities
- Video-to-audio generation (adds sound to an existing video)
- Prompt-driven sound design: ambience, impacts, textures, mechanical sounds, nature
- Timing-aware audio that follows visual motion beats
- Optional negative_prompt to avoid unwanted audio characteristics
- Duration control to match the generated audio to clips of different lengths
- mask_away_clip option for generating audio without directly using the original clip audio
Use cases
- Add cinematic ambience to silent clips (city night, wind, rain, room tone)
- Create synced sound effects (footsteps, fabric rustle, metal clanks, sparks)
- Product and food sound design (sizzles, pours, crackles, knife cuts)
- Trailer-style audio layers for short edits and social videos
- Rapid sound prototyping before final mix and mastering
Pricing
| Unit | Price |
|---|---|
| Per second of audio | $0.001 |
Examples:
| Duration | Price |
|---|---|
| 5s | $0.005 |
| 8s | $0.008 |
| 10s | $0.010 |
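The example prices follow from multiplying the duration by the per-second rate. The sketch below reproduces that arithmetic; `estimate_cost` is a hypothetical helper name, and it assumes billing is strictly linear with no minimum charge or rounding rule, which the pricing table does not state.

```python
from decimal import Decimal

# Per-second rate copied from the pricing table.
PRICE_PER_SECOND = Decimal("0.001")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated price in USD for audio of the given length.

    Assumption: cost scales linearly with duration (no minimum fee).
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_SECOND * duration_seconds


# Reproduce the rows of the examples table.
for seconds in (5, 8, 10):
    print(f"{seconds}s -> ${estimate_cost(seconds):.3f}")
```

`Decimal` is used instead of `float` so that small currency amounts are not affected by binary rounding.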
Inputs
- video (required): the source video to generate audio for
- prompt (required): describe the desired sound
Parameters
- duration: length of the generated audio in seconds (1 to 30, default 8)
- num_inference_steps: number of sampling steps (4 to 50, default 25)
- guidance_scale: prompt adherence strength (0 to 20, default 4.5)
- negative_prompt: what to avoid (e.g., “muffled, noisy, distorted, music”)
- mask_away_clip: whether to mask away the clip (useful when you want fully generated audio)
Prompting guide (video → audio)
Write prompts like a sound designer:
- Environment: location + ambience (rainy alley, factory hall, forest dawn)
- Materials: metal, glass, lava, fabric, wood, water
- Actions: slice, pour, crackle, hiss, whoosh, impact
- Texture: crisp, gritty, low rumble, sparkling high-end, subtle room tone
- Timing beats: “as the blade presses in…”, “when the cube hits the ground…”
Example prompts
- A glowing lava cube crackles and pops with ember flickers. A tungsten blade presses into the semi-liquid core with a soft sizzling hiss, tiny molten droplets splatter, low rumble underneath, cinematic close-mic detail.
- Rainy city night ambience with distant traffic, soft wind, occasional footsteps, subtle neon buzz, realistic stereo space.
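One way to apply the prompting guide consistently is to assemble prompts from the five ingredients it lists. The helper below is only an illustrative sketch: `compose_prompt` and its argument names are hypothetical, and joining fragments with commas is one stylistic choice, not a format the model requires.

```python
from typing import Sequence


def compose_prompt(
    *,
    environment: str = "",
    materials: Sequence[str] = (),
    actions: Sequence[str] = (),
    texture: str = "",
    timing: str = "",
) -> str:
    """Join the sound-design ingredients into one comma-separated prompt."""
    fragments = [
        environment,
        " and ".join(materials),  # e.g. "wood and water"
        ", ".join(actions),       # e.g. "crackle, pour"
        texture,
        timing,
    ]
    # Skip any ingredient that was left empty.
    return ", ".join(f.strip() for f in fragments if f.strip())


print(
    compose_prompt(
        environment="forest dawn ambience",
        materials=["wood", "water"],
        actions=["crackle", "pour"],
        texture="subtle room tone",
    )
)
```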
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
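# Submit the task (video and prompt are required; replace the placeholder URL with your own video)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/mmaudio-v2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "video": "https://example.com/input.mp4",
  "prompt": "Rainy city night ambience with distant traffic, soft wind, occasional footsteps, subtle neon buzz, realistic stereo space.",
  "num_inference_steps": 25,
  "duration": 8,
  "guidance_scale": 4.5,
  "mask_away_clip": false
}'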
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
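The same two calls can be made from Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the video URL is a placeholder, and it assumes the submit response carries the task ID in `data.id` as described in the response tables.

```python
import json
import os
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "wavespeed-ai/mmaudio-v2"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a generation task."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that fetches the result of a task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def main() -> None:
    """Submit one task and fetch its current result (performs network calls)."""
    api_key = os.environ["WAVESPEED_API_KEY"]
    payload = {
        "video": "https://example.com/input.mp4",  # placeholder URL
        "prompt": "Rainy city night ambience with distant traffic",
        "duration": 8,
    }
    submitted = send(build_submit_request(api_key, payload))
    task_id = submitted["data"]["id"]  # assumed location of the task ID
    print(send(build_result_request(api_key, task_id)))
```

Call `main()` with `WAVESPEED_API_KEY` set to run the example end to end; the request builders can be inspected without touching the network.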
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| video | string | Yes | - | - | The URL of the video to generate the audio for. |
| prompt | string | Yes | - | - | The positive prompt for the generation. |
| negative_prompt | string | No | - | - | The negative prompt for the generation. |
| num_inference_steps | integer | No | 25 | 4 ~ 50 | The number of inference steps to perform. |
| duration | integer | No | 8 | 1 ~ 30 | The duration of the generated media in seconds. |
| guidance_scale | number | No | 4.5 | 0 ~ 20 | The guidance scale to use for the generation. |
| mask_away_clip | boolean | No | false | - | Whether to mask away the clip. |
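Checking a request body against the table before sending it can catch out-of-range values early. The sketch below encodes the defaults and ranges listed above; `build_payload` is a hypothetical helper, and it assumes the documented ranges are inclusive at both ends, which the table does not say explicitly.

```python
from typing import Optional

# Defaults and ranges copied from the request parameters table.
DEFAULTS = {
    "num_inference_steps": 25,
    "duration": 8,
    "guidance_scale": 4.5,
    "mask_away_clip": False,
}
RANGES = {
    "num_inference_steps": (4, 50),
    "duration": (1, 30),
    "guidance_scale": (0, 20),
}


def build_payload(
    video: str,
    prompt: str,
    negative_prompt: Optional[str] = None,
    **overrides,
) -> dict:
    """Return a request body, raising if a value falls outside its range."""
    if not video or not prompt:
        raise ValueError("video and prompt are required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")

    payload = {"video": video, "prompt": prompt, **DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        # Assumption: both bounds are inclusive.
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    return payload
```

For example, `build_payload("https://example.com/input.mp4", "rain on metal", duration=10)` returns a body with the remaining parameters filled in from the defaults, while `duration=31` raises `ValueError`.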
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
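Because `data.status` moves through created, processing, and then completed or failed, a client typically polls the result endpoint until a terminal status appears. The loop below is a sketch under assumptions: `wait_for_outputs` is a hypothetical name, the one-second interval and 60-attempt limit are arbitrary choices, and `fetch_result` stands for any callable that returns the decoded result response.

```python
import time
from typing import Callable, List


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the task completes, then return the output URLs."""
    for _ in range(max_attempts):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            # data.error holds the error message when the task fails.
            raise RuntimeError(data.get("error") or "task failed")
        # Status is still "created" or "processing": wait and retry.
        sleep(interval_seconds)
    raise TimeoutError(f"task did not finish after {max_attempts} attempts")
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.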