MMAudio V2

MMAudio v2 produces synchronized audio from a video and a text prompt, which makes it well suited to adding soundtracks to silent clips, including output from video generation models. It is available through a ready-to-use REST inference API with no cold starts and per-second pricing.

Features

MMAudio v2 — wavespeed-ai/mmaudio-v2

MMAudio v2 generates high-quality sound effects and ambience for a video using the visual content plus a text prompt. Upload a clip, describe the audio you want (environment, materials, impacts, whooshes, texture), and the model synthesizes a synced audio track that matches motion and timing. It’s ideal for adding cinematic SFX, atmospheric layers, and “sound design” style audio to silent footage.

Key capabilities

  • Video-to-audio generation (adds sound to an existing video)
  • Prompt-driven sound design: ambience, impacts, textures, mechanical sounds, nature
  • Timing-aware audio that follows visual motion beats
  • Optional negative_prompt to avoid unwanted audio characteristics
  • Duration control for generating audio for different clip lengths
  • mask_away_clip option for generating audio without directly using the original clip audio

Use cases

  • Add cinematic ambience to silent clips (city night, wind, rain, room tone)
  • Create synced sound effects (footsteps, fabric rustle, metal clanks, sparks)
  • Product and food sound design (sizzles, pours, crackles, knife cuts)
  • Trailer-style audio layers for short edits and social videos
  • Rapid sound prototyping before final mix and mastering

Pricing

| Unit | Price |
| --- | --- |
| Per second of audio | $0.001 |

Examples:

| Duration | Price |
| --- | --- |
| 5s | $0.005 |
| 8s | $0.008 |
| 10s | $0.010 |
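
The per-second rate turns cost estimation into a single multiplication. The following is a minimal sketch, assuming billing is linear in the requested duration with no rounding or minimum charge (this page does not say either way). `estimate_cost` is a hypothetical helper name, not part of the API.

```python
from decimal import Decimal

# Rate taken from the pricing table above: $0.001 per second of audio.
PRICE_PER_SECOND = Decimal("0.001")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Return the estimated price in USD for the given audio length.

    Assumes linear billing; Decimal avoids binary floating-point drift.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return PRICE_PER_SECOND * duration_seconds


# Reproduce the example rows from the pricing table.
for seconds in (5, 8, 10):
    print(f"{seconds}s -> ${estimate_cost(seconds):.3f}")
```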

Inputs

  • video (required): the source video to generate audio for
  • prompt (required): describe the desired sound

Parameters

  • duration: audio length in seconds
  • num_inference_steps: sampling steps
  • guidance_scale: prompt adherence strength
  • negative_prompt: what to avoid (e.g., “muffled, noisy, distorted, music”)
  • mask_away_clip: whether to mask away the clip (useful when you want fully generated audio)

Prompting guide (video → audio)

Write prompts like a sound designer:

  • Environment: location + ambience (rainy alley, factory hall, forest dawn)
  • Materials: metal, glass, lava, fabric, wood, water
  • Actions: slice, pour, crackle, hiss, whoosh, impact
  • Texture: crisp, gritty, low rumble, sparkling high-end, subtle room tone
  • Timing beats: “as the blade presses in…”, “when the cube hits the ground…”
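
If prompts are generated in bulk, the five components above can be assembled by a small helper. This is only an illustrative sketch: `build_sound_prompt` and its labelled sentence layout are assumptions, and nothing on this page suggests the model requires a particular prompt structure.

```python
def build_sound_prompt(environment="", materials=(), actions=(), texture=(), timing=""):
    """Join sound-design components into one prompt string.

    Empty components are skipped. The "Label: items" layout is an
    illustrative choice, not a format required by the model.
    """
    parts = []
    if environment:
        parts.append(environment)
    for label, items in (("Materials", materials), ("Actions", actions), ("Texture", texture)):
        if items:
            parts.append(f"{label}: {', '.join(items)}")
    if timing:
        parts.append(f"Timing: {timing}")
    return ". ".join(parts) + "." if parts else ""


prompt = build_sound_prompt(
    environment="Rainy alley at night",
    materials=["metal", "water"],
    actions=["impact", "hiss"],
    texture=["low rumble"],
    timing="when the cube hits the ground",
)
print(prompt)
```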

Example prompts

  • A glowing lava cube crackles and pops with ember flickers. A tungsten blade presses into the semi-liquid core with a soft sizzling hiss, tiny molten droplets splatter, low rumble underneath, cinematic close-mic detail.
  • Rainy city night ambience with distant traffic, soft wind, occasional footsteps, subtle neon buzz, realistic stereo space.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


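# Submit the task
# "video" and "prompt" are required; the video URL below is a placeholder, replace it with your own.
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/mmaudio-v2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "https://example.com/input.mp4",
    "prompt": "Rainy city night ambience with distant traffic, soft wind, occasional footsteps",
    "num_inference_steps": 25,
    "duration": 8,
    "guidance_scale": 4.5,
    "mask_away_clip": false
}'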

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
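
The same two calls can be prepared from Python with only the standard library. This sketch mirrors the endpoints and headers of the curl commands above. The function names are hypothetical, and the 30-second socket timeout in `send` is an assumption rather than a documented value.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build the POST request that submits an MMAudio v2 task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/mmaudio-v2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build the GET request that fetches the result for a task ID."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    # The timeout value is an assumption, not taken from the documentation.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

To use it, read the key from the `WAVESPEED_API_KEY` environment variable, pass a payload containing at least `video` and `prompt` to `build_submit_request`, and hand the resulting request to `send`.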

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| video | string | Yes | - | - | The URL of the video to generate the audio for. |
| prompt | string | Yes | - | - | The positive prompt for the generation. |
| negative_prompt | string | No | - | - | The negative prompt for the generation. |
| num_inference_steps | integer | No | 25 | 4 ~ 50 | The number of inference steps to perform. |
| duration | integer | No | 8 | 1 ~ 30 | The duration of the generated media in seconds. |
| guidance_scale | number | No | 4.5 | 0 ~ 20 | The guidance scale to use for the generation. |
| mask_away_clip | boolean | No | false | - | Whether to mask away the clip. |
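
Checking a request body against these defaults and ranges before sending it avoids a wasted round trip. The sketch below assumes the listed ranges are inclusive at both ends, which the table does not state. `build_payload` is a hypothetical helper, not part of any WaveSpeedAI SDK.

```python
# Ranges copied from the request parameter table; treated as inclusive (assumption).
RANGES = {
    "num_inference_steps": (4, 50),
    "duration": (1, 30),
    "guidance_scale": (0, 20),
}


def build_payload(video, prompt, negative_prompt=None, num_inference_steps=25,
                  duration=8, guidance_scale=4.5, mask_away_clip=False):
    """Return a request body using the documented defaults, or raise ValueError."""
    if not video or not prompt:
        raise ValueError("video and prompt are required")
    payload = {
        "video": video,
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "duration": duration,
        "guidance_scale": guidance_scale,
        "mask_away_clip": mask_away_clip,
    }
    for name, (low, high) in RANGES.items():
        value = payload[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} ~ {high}")
    # negative_prompt is optional, so it is only sent when provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```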

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
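
After submission, the two fields a client usually needs are `data.id` and `data.urls.get`. The sketch below pulls them out of a decoded response. It assumes 200 is the only success code (the table gives it only as an example), and `extract_task_handle` is a hypothetical name. The sample values are made up for illustration.

```python
def extract_task_handle(response):
    """Return (task_id, poll_url) from a decoded submission response."""
    # Treating any code other than 200 as a failure is an assumption.
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response["data"]
    return data["id"], data["urls"]["get"]


# Illustrative response; the ID and URL are placeholders, not real API output.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "task-123",
        "status": "created",
        "outputs": [],
        "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/task-123/result"},
    },
}
task_id, poll_url = extract_task_handle(sample)
print(task_id, poll_url)
```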

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
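
Because `data.outputs` stays empty until `data.status` reaches `completed`, a client has to poll the result endpoint. The loop below is a sketch of one way to do that. The one-second interval and 120-second timeout are assumptions, not documented recommendations, and `wait_for_outputs` is a hypothetical helper. `fetch_result` can be any zero-argument callable that returns the decoded result response.

```python
import time


def wait_for_outputs(fetch_result, poll_interval=1.0, timeout=120.0, sleep=time.sleep):
    """Poll until the task completes, fails, or the timeout elapses.

    Returns the list of output URLs on success.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # "created" and "processing" both mean the task is still pending.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout}s")
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.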
© 2025 WaveSpeedAI. All rights reserved.