MMAudio v2

wavespeed-ai/mmaudio-v2

MMAudio v2 produces synchronized audio from video or text inputs, which makes it well suited to adding soundtracks to clips produced by video models. The model is served through a ready-to-use REST inference API with no cold starts and affordable pricing.


Your request will cost $0.001 per run.

For $1 you can run this model approximately 1000 times.


README

MMAudio v2 — wavespeed-ai/mmaudio-v2

MMAudio v2 generates high-quality sound effects and ambience for a video using the visual content plus a text prompt. Upload a clip, describe the audio you want (environment, materials, impacts, whooshes, texture), and the model synthesizes a synced audio track that matches motion and timing. It’s ideal for adding cinematic SFX, atmospheric layers, and “sound design” style audio to silent footage.

Key capabilities

  • Video-to-audio generation (adds sound to an existing video)
  • Prompt-driven sound design: ambience, impacts, textures, mechanical sounds, nature
  • Timing-aware audio that follows visual motion beats
  • Optional negative_prompt to avoid unwanted audio characteristics
  • Duration control for generating audio for different clip lengths
  • mask_away_clip option for generating audio without directly using the original clip audio

Use cases

  • Add cinematic ambience to silent clips (city night, wind, rain, room tone)
  • Create synced sound effects (footsteps, fabric rustle, metal clanks, sparks)
  • Product and food sound design (sizzles, pours, crackles, knife cuts)
  • Trailer-style audio layers for short edits and social videos
  • Rapid sound prototyping before final mix and mastering

Pricing

Unit                   Price
Per second of audio    $0.001

Examples:

Duration    Price
5s          $0.005
8s          $0.008
10s         $0.010
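
The per-second rate makes cost estimation a single multiplication. The sketch below reproduces the example prices, assuming billing is strictly linear in duration with no minimum charge or rounding (the page does not state either way). The function name `estimate_cost` is illustrative and not part of any WaveSpeed SDK.

```python
from decimal import Decimal

# USD per second of generated audio, taken from the pricing table.
PRICE_PER_SECOND = Decimal("0.001")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimate the USD cost of generating `duration_seconds` of audio.

    Assumes linear per-second billing with no minimum charge.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_SECOND * duration_seconds


if __name__ == "__main__":
    for seconds in (5, 8, 10):
        print(f"{seconds}s -> ${estimate_cost(seconds):.3f}")
```

Decimal arithmetic is used instead of floats so that small per-second prices add up without binary rounding error.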

Inputs

  • video (required): the source video to generate audio for
  • prompt (required): describe the desired sound

Parameters

  • duration: audio length in seconds
  • num_inference_steps: sampling steps
  • guidance_scale: prompt adherence strength
  • negative_prompt: what to avoid (e.g., “muffled, noisy, distorted, music”)
  • mask_away_clip: whether to mask away the clip (useful when you want fully generated audio)
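
As a rough illustration of how these inputs and parameters might be assembled into a request body, the sketch below builds a JSON payload from the field names listed above. It rests on several assumptions: that the API accepts a JSON object keyed by these names, that `video` is passed as a URL string, and that omitted parameters fall back to server-side defaults. The helper `build_payload` is hypothetical, and the endpoint and authentication are deliberately left out because this page does not specify them.

```python
import json
from typing import Any, Optional


def build_payload(
    video: str,
    prompt: str,
    *,
    duration: Optional[int] = None,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    negative_prompt: Optional[str] = None,
    mask_away_clip: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a request body from the documented inputs and parameters.

    Hypothetical helper: the JSON shape is an assumption, not a confirmed schema.
    """
    if not video or not prompt:
        raise ValueError("video and prompt are both required")

    payload: dict[str, Any] = {"video": video, "prompt": prompt}
    optional = {
        "duration": duration,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "negative_prompt": negative_prompt,
        "mask_away_clip": mask_away_clip,
    }
    # Only send parameters the caller set explicitly, so unset ones
    # are left to whatever defaults the service applies.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    body = build_payload(
        "https://example.com/clip.mp4",  # placeholder URL
        "Rainy city night ambience with distant traffic",
        duration=8,
        negative_prompt="muffled, noisy, distorted, music",
    )
    print(json.dumps(body, indent=2))
```

Leaving unset parameters out of the payload avoids guessing at default values that the page does not document.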

Prompting guide (video → audio)

Write prompts like a sound designer:

  • Environment: location + ambience (rainy alley, factory hall, forest dawn)
  • Materials: metal, glass, lava, fabric, wood, water
  • Actions: slice, pour, crackle, hiss, whoosh, impact
  • Texture: crisp, gritty, low rumble, sparkling high-end, subtle room tone
  • Timing beats: “as the blade presses in…”, “when the cube hits the ground…”
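
The checklist above can be turned into a small prompt builder. This is a minimal sketch, assuming a plain sentence-per-category layout; the `compose_prompt` helper and the "Materials:", "Sounds:", and "Texture:" labels are illustrative choices, and the page does not say whether the model responds better to labelled or free-form prompts.

```python
from typing import Sequence


def compose_prompt(
    environment: str,
    materials: Sequence[str] = (),
    actions: Sequence[str] = (),
    texture: Sequence[str] = (),
    timing_beats: Sequence[str] = (),
) -> str:
    """Join the sound-design categories into one prompt string."""
    if not environment.strip():
        raise ValueError("environment must not be empty")

    segments = [environment.strip()]
    if materials:
        segments.append("Materials: " + ", ".join(materials))
    if actions:
        segments.append("Sounds: " + ", ".join(actions))
    if texture:
        segments.append("Texture: " + ", ".join(texture))
    # Timing beats are already phrased as short clauses, so append them as-is.
    segments.extend(beat.strip() for beat in timing_beats)

    # Strip trailing periods so each segment ends with exactly one.
    return ". ".join(s.rstrip(".") for s in segments if s) + "."


if __name__ == "__main__":
    print(
        compose_prompt(
            "Factory hall with distant machinery",
            materials=["metal", "glass"],
            actions=["whoosh", "impact"],
            texture=["gritty", "low rumble"],
            timing_beats=["When the cube hits the ground, a heavy thud"],
        )
    )
```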

Example prompts

  • A glowing lava cube crackles and pops with ember flickers. A tungsten blade presses into the semi-liquid core with a soft sizzling hiss, tiny molten droplets splatter, low rumble underneath, cinematic close-mic detail.
  • Rainy city night ambience with distant traffic, soft wind, occasional footsteps, subtle neon buzz, realistic stereo space.