MMAudio v2

MMAudio v2 produces synchronized audio from a video and a text prompt, which makes it a good fit for adding soundtracks to clips generated by video models. It is served through a ready-to-use REST inference API with no cold starts, priced at $0.001 per second of audio.

Features

MMAudio v2 — wavespeed-ai/mmaudio-v2

MMAudio v2 generates high-quality sound effects and ambience for a video using the visual content plus a text prompt. Upload a clip, describe the audio you want (environment, materials, impacts, whooshes, texture), and the model synthesizes a synced audio track that matches motion and timing. It’s ideal for adding cinematic SFX, atmospheric layers, and “sound design” style audio to silent footage.

Key capabilities

Video-to-audio generation (adds sound to an existing video)
Prompt-driven sound design: ambience, impacts, textures, mechanical sounds, nature
Timing-aware audio that follows visual motion beats
Optional negative_prompt to avoid unwanted audio characteristics
Duration control (1 to 30 seconds) to match different clip lengths
mask_away_clip option for generating audio without directly using the original clip audio

Use cases

Add cinematic ambience to silent clips (city night, wind, rain, room tone)
Create synced sound effects (footsteps, fabric rustle, metal clanks, sparks)
Product and food sound design (sizzles, pours, crackles, knife cuts)
Trailer-style audio layers for short edits and social videos
Rapid sound prototyping before final mix and mastering

Pricing

Unit	Price
Per second of audio	$0.001

Examples:

Duration	Price
5s	$0.005
8s	$0.008
10s	$0.010
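
The examples above are the per-second rate multiplied by the duration. A minimal sketch in Python, assuming billing is strictly linear at the listed rate with no minimum charge or rounding (the function name is illustrative and not part of the API):

```python
from decimal import Decimal

# Rate taken from the pricing table; assumed to apply linearly.
PRICE_PER_SECOND = Decimal("0.001")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated price in USD for audio of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return PRICE_PER_SECOND * duration_seconds


for seconds in (5, 8, 10):
    print(f"{seconds}s -> ${estimate_cost(seconds)}")
```

Decimal is used instead of float so that the small per-second amounts add up exactly.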

Inputs

video (required): the source video to generate audio for
prompt (required): describe the desired sound

Parameters

duration: audio length in seconds
num_inference_steps: sampling steps
guidance_scale: prompt adherence strength
negative_prompt: what to avoid (e.g., “muffled, noisy, distorted, music”)
mask_away_clip: whether to mask away the clip (useful when you want fully generated audio)

Prompting guide (video → audio)

Write prompts like a sound designer:

Environment: location + ambience (rainy alley, factory hall, forest dawn)
Materials: metal, glass, lava, fabric, wood, water
Actions: slice, pour, crackle, hiss, whoosh, impact
Texture: crisp, gritty, low rumble, sparkling high-end, subtle room tone
Timing beats: “as the blade presses in…”, “when the cube hits the ground…”
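
One way to keep prompts consistent is to assemble them from the five elements above. The helper below is a hypothetical convenience, not part of the API; the model only ever receives the final string.

```python
def build_prompt(environment="", materials=(), actions=(), texture="", timing=""):
    """Join the non-empty sound-design elements into one prompt string."""
    parts = [
        environment,
        ", ".join(materials),
        ", ".join(actions),
        texture,
        timing,
    ]
    # Skip empty elements and end each remaining one with a single period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


print(build_prompt(
    environment="Rainy alley at night",
    materials=["metal", "water"],
    actions=["drip", "clank"],
    texture="subtle room tone, low rumble",
    timing="A sharp impact when the cube hits the ground",
))
```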

Example prompts

A glowing lava cube crackles and pops with ember flickers. A tungsten blade presses into the semi-liquid core with a soft sizzling hiss, tiny molten droplets splatter, low rumble underneath, cinematic close-mic detail.
Rainy city night ambience with distant traffic, soft wind, occasional footsteps, subtle neon buzz, realistic stereo space.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


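# Submit the task (prompt and video are required; replace the placeholder video URL with your own)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/mmaudio-v2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "Rainy city night ambience with distant traffic, soft wind, occasional footsteps",
    "video": "https://example.com/input.mp4",
    "num_inference_steps": 25,
    "duration": 8,
    "guidance_scale": 4.5,
    "mask_away_clip": false
}'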

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
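
The same submit-then-poll flow can be sketched in Python with only the standard library. The endpoints and headers mirror the curl commands; the function names, the polling interval, and the attempt limit are assumptions made for illustration, since the documentation does not prescribe a polling strategy.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build (but do not send) the POST request that submits a task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/mmaudio-v2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build (but do not send) the GET request that fetches a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval=1.0, max_attempts=60):
    """Call fetch() until data.status is completed or failed (assumed policy)."""
    for _ in range(max_attempts):
        data = fetch()["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError("task did not finish within the allowed attempts")


def run(api_key, payload):
    """Submit a task, then poll its result endpoint using data.id."""
    task = send(build_submit_request(api_key, payload))["data"]
    return wait_for_result(lambda: send(build_result_request(api_key, task["id"])))
```

Calling run with the value of WAVESPEED_API_KEY and a payload containing at least prompt and video returns the final data object, whose outputs field holds the generated URLs once status is completed.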

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	The positive prompt for the generation.
video	string	Yes		-	The URL of the video to generate the audio for.
negative_prompt	string	No		-	The negative prompt for the generation.
num_inference_steps	integer	No	25	4 ~ 50	The number of inference steps to perform.
duration	integer	No	8	1 ~ 30	The duration of the generated media in seconds.
guidance_scale	number	No	4.5	0 ~ 20	The guidance scale to use for the generation.
mask_away_clip	boolean	No	false	-	Whether to mask away the clip.
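
Checking a request body against the table before sending it avoids a round trip for an obviously invalid value. The sketch below copies the defaults and ranges from the table and assumes the ranges are inclusive; build_payload is a hypothetical helper, and the server remains the authority on validation.

```python
# Defaults and ranges copied from the request parameter table.
DEFAULTS = {
    "num_inference_steps": 25,
    "duration": 8,
    "guidance_scale": 4.5,
    "mask_away_clip": False,
}
RANGES = {
    "num_inference_steps": (4, 50),
    "duration": (1, 30),
    "guidance_scale": (0, 20),
}


def build_payload(prompt, video, negative_prompt=None, **options):
    """Return a request body with defaults applied and ranges checked."""
    if not prompt or not video:
        raise ValueError("prompt and video are required")
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"prompt": prompt, "video": video, **DEFAULTS, **options}
    for name, (low, high) in RANGES.items():
        # Ranges are assumed to be inclusive at both ends.
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```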

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was requested)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
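
Reading a result response comes down to checking data.status before using data.outputs. A minimal sketch, assuming the response has already been decoded into a dictionary with the fields listed above; extract_outputs is a hypothetical helper and the sample values are made up for illustration.

```python
def extract_outputs(response):
    """Return the output URLs of a result response, raising if the task failed."""
    data = response["data"]
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if status != "completed":
        # created or processing: outputs are documented as empty
        return []
    return list(data["outputs"])


# Illustrative response with made-up values, shaped like the table above.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "status": "completed",
        "outputs": ["https://example.com/output.mp4"],
        "error": "",
    },
}
print(extract_outputs(sample))
```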