MMAudio V2
Playground
Try it on WavespeedAI!
MMAudio generates synchronized audio from video and/or text inputs. It can be combined with video models to produce videos with audio.
Features
MMAudio Video-to-Audio Synthesis Model 🎵
A powerful video-to-audio synthesis model (based on MMAudio V2) that transforms visual content into rich, contextually appropriate audio. This model specializes in generating high-quality audio that matches the visual elements, actions, and environments in source videos while maintaining temporal consistency.
Implementation ✨
This deployment uses the MMAudio V2 model for video-to-audio synthesis, focusing on:
- High-fidelity audio generation matching visual content
- Real-time synchronization with video events
- Environmental sound synthesis
- Action-to-sound mapping
Model Description 🎧
The model uses the MMAudio V2 deep learning architecture, which is designed specifically for video-to-audio synthesis. It analyzes the visual content over time and generates audio that fits what is happening on screen.
Key features:
- 🎵 High-quality audio synthesis from video
- 🎭 Context-aware sound generation
- ⏱️ Precise temporal synchronization
- 🌍 Rich environmental audio synthesis
- 🎯 Accurate action-sound mapping
- 🔄 Works with diverse video sources
Prediction Examples 🌟
The model excels at transformations like:
- Converting silent films to audio-enhanced versions
- Adding environmental sounds to nature videos
- Generating appropriate sound effects for action sequences
- Creating ambient audio for different settings
- Synthesizing speech-like sounds for speaking figures
Limitations ⚠️
- Processing time increases with video length
- Complex acoustic environments may require additional processing
- Output quality depends on input video clarity
- Some unique sound effects may need specialized handling
- Resource requirements scale with video complexity
- Performance varies with rapid scene changes
Applications 🎯
MMAudio provides valuable solutions for:
- Film and video post-production
- Silent film restoration
- Educational content enhancement
- Gaming and VR sound design
- Accessibility improvements
- Content creation and editing
Ethical Considerations 📝
Important points to consider:
- Respect original content rights
- Maintain transparency about AI-generated audio
- Consider potential misuse implications
- Provide appropriate attribution
- Follow content creation guidelines
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/mmaudio-v2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"video": "https://d3gnftk2yhz9lr.wavespeed.ai/media/ec44bbf6abac4c25998dd2c4af1a46a7/videos/1744961424459636159_srROLJGD.mp4",
"prompt": "Indian holy music",
"negative_prompt": "",
"num_inference_steps": 25,
"duration": 8,
"guidance_scale": 4.5,
"mask_away_clip": false
}'
# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
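The two curl calls above can be combined into a submit-then-poll loop. The following is a minimal Python sketch, not an official client: the helper names `http_json` and `submit_and_wait`, the 2-second poll interval, and the cap of 150 polls are assumptions made for illustration. It relies only on the endpoints shown above and on the `data.id`, `data.status`, `data.outputs`, and `data.error` fields listed in the response tables.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send one request with a Bearer token and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_and_wait(payload, api_key, request=http_json, sleep=time.sleep,
                    poll_interval=2.0, max_polls=150):
    """Submit a task, then poll the result endpoint until it finishes.

    `request` and `sleep` are injectable so the loop can be exercised
    without network access. The interval and poll cap are illustrative.
    """
    submitted = request("POST", f"{API_BASE}/wavespeed-ai/mmaudio-v2",
                        api_key, payload)
    request_id = submitted["data"]["id"]
    result_url = f"{API_BASE}/predictions/{request_id}/result"

    for _ in range(max_polls):
        data = request("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Status is still "created" or "processing": wait and try again.
        sleep(poll_interval)

    raise TimeoutError(f"task {request_id} did not finish after {max_polls} polls")
```

With a real key, a call would look like `submit_and_wait({"video": video_url, "prompt": "Indian holy music"}, os.environ["WAVESPEED_API_KEY"])`, which returns the list of output URLs once the status becomes completed.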
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
video | string | Yes | - | - | The URL of the video to generate audio for. |
prompt | string | Yes | - | - | The text prompt describing the audio to generate. |
negative_prompt | string | No | - | - | The negative prompt describing audio to avoid. |
num_inference_steps | integer | No | 25 | 4 ~ 50 | The number of inference steps to perform. |
duration | integer | No | 8 | 1 ~ 30 | The duration of the generated media in seconds. |
guidance_scale | number | No | 4.5 | 0 ~ 20 | The guidance scale to use for the generation. |
mask_away_clip | boolean | No | false | - | Whether to mask away the clip. |
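The defaults and ranges in the table can be checked on the client before a request is sent. This is a small sketch under stated assumptions: `build_payload` is a hypothetical helper and not part of any WavespeedAI SDK, and the empty-string default for `negative_prompt` follows the example request rather than the table, which lists no default. The server remains the authority on what it accepts.

```python
def _require_int_in_range(name, value, low, high):
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an integer")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_payload(video, prompt, negative_prompt="", num_inference_steps=25,
                  duration=8, guidance_scale=4.5, mask_away_clip=False):
    """Build a request body using the defaults and ranges from the table."""
    if not video:
        raise ValueError("video is required")
    if not prompt:
        raise ValueError("prompt is required")

    _require_int_in_range("num_inference_steps", num_inference_steps, 4, 50)
    _require_int_in_range("duration", duration, 1, 30)

    # guidance_scale is documented as a number, so ints and floats both pass.
    if isinstance(guidance_scale, bool) or not isinstance(guidance_scale, (int, float)):
        raise TypeError("guidance_scale must be a number")
    if not 0 <= guidance_scale <= 20:
        raise ValueError(f"guidance_scale must be between 0 and 20, got {guidance_scale}")

    return {
        "video": video,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "duration": duration,
        "guidance_scale": guidance_scale,
        "mask_away_clip": bool(mask_away_clip),
    }
```

Validating locally turns an out-of-range value such as `duration=31` into an immediate `ValueError` instead of a round trip to the API.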
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (the task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
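To illustrate how the submission response fields fit together, here is a hedged sketch that pulls out the values needed for the follow-up query. The `SubmittedTask` type, the `parse_submit_response` helper, and the sample body in the usage note are hypothetical. Treating any `code` other than 200 as a rejection is also an assumption, based on the table's "200 for success" example.

```python
from dataclasses import dataclass


@dataclass
class SubmittedTask:
    task_id: str      # data.id, used later as the result query's id
    status: str       # data.status at submission time
    result_url: str   # data.urls.get, the URL to retrieve the result


def parse_submit_response(body):
    """Extract the task ID, status, and result URL from a submit response."""
    if body.get("code") != 200:
        raise RuntimeError(
            f"submission rejected: {body.get('code')} {body.get('message')}"
        )
    data = body["data"]
    return SubmittedTask(
        task_id=data["id"],
        status=data["status"],
        result_url=data["urls"]["get"],
    )
```

For example, a body of `{"code": 200, "message": "success", "data": {"id": "abc", "status": "created", "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/abc/result"}}}` would yield a `SubmittedTask` whose `task_id` is `"abc"`.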
Result Query Parameters
Result Request Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
id | string | Yes | - | Task ID |
Result Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data | object | The prediction data object containing all details |
data.id | string | Unique identifier for the prediction (the task ID that was queried) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
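The result fields above can be condensed into a small summary for callers. The sketch below is illustrative only: `summarize_result` is a hypothetical helper, and it assumes that `has_nsfw_contents` lines up index by index with `outputs`, as the "for each output" wording suggests. It also converts `data.timings.inference` from milliseconds to seconds.

```python
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_result(body):
    """Reduce a result response to the fields a caller usually needs."""
    data = body["data"]
    status = data["status"]
    summary = {
        "id": data["id"],
        "status": status,
        # created and processing mean the caller should keep polling.
        "done": status in TERMINAL_STATUSES,
    }

    if status == "failed":
        summary["error"] = data.get("error", "")
    elif status == "completed":
        outputs = data.get("outputs") or []
        flags = data.get("has_nsfw_contents") or []
        # Pair each output URL with its NSFW flag; None if no flag is present.
        summary["outputs"] = [
            {"url": url, "nsfw": flags[i] if i < len(flags) else None}
            for i, url in enumerate(outputs)
        ]
        inference_ms = (data.get("timings") or {}).get("inference")
        if inference_ms is not None:
            summary["inference_seconds"] = inference_ms / 1000

    return summary
```

A response that is still processing produces a summary with `done` set to `False` and no `outputs` key, which makes the keep-polling case easy to detect.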