Mirelo SFX V1 Video-to-Audio
Mirelo SFX V1 Video-to-Audio generates synchronized sound effects directly from video input, with optional text prompt guidance to steer the audio content. Upload a video, describe the sound you want, and get realistic, scene-matched audio output — multiple samples per run for easy A/B selection.
Why Choose This?
- Video-synchronized sound effects: Analyzes video content and generates audio that matches the on-screen action, environment, and pacing.
- Text prompt guidance: Optionally describe the desired sound to steer the output, which is useful for ambiguous scenes or specific audio requirements.
- Multiple sample generation: Generate several audio variations in one run and pick the best result without re-submitting.
- Adjustable duration: Control how many seconds of audio to generate, up to 10 seconds per run.
- Reproducible results: Use the seed parameter to lock in a specific output for exact reproduction.
Parameters
| Parameter | Required | Description |
|---|---|---|
| video | Yes | Input video to generate sound effects for (URL or file upload). |
| prompt | No | Text description to guide the style or content of the generated audio. |
| num_samples | No | Number of audio variations to generate per run. Default: 2. |
| duration | No | Length of audio to generate in seconds. Range: 2–10. Default: 5. |
| seed | No | Random seed for reproducible results. Default: -1 (a random seed is used). |
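The constraints in the table above can be checked on the client before a request is sent. The sketch below is a minimal Python example, assuming the request fields use the same names as the parameters in the table; the `SfxRequest` class and its methods are hypothetical helpers, not part of any official SDK.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Limits and defaults taken from the Parameters table.
MIN_DURATION = 2
MAX_DURATION = 10
DEFAULT_DURATION = 5
DEFAULT_NUM_SAMPLES = 2
RANDOM_SEED = -1


@dataclass
class SfxRequest:
    """Hypothetical client-side container for the Mirelo SFX V1 inputs."""
    video: str                      # URL of the input video (required)
    prompt: Optional[str] = None    # optional text guidance
    num_samples: int = DEFAULT_NUM_SAMPLES
    duration: float = DEFAULT_DURATION
    seed: int = RANDOM_SEED         # -1 means "use a random seed"

    def validate(self) -> None:
        """Raise ValueError if a field violates the documented limits."""
        if not self.video:
            raise ValueError("video is required")
        if self.num_samples < 1:
            raise ValueError("num_samples must be at least 1")
        if not MIN_DURATION <= self.duration <= MAX_DURATION:
            raise ValueError(
                f"duration must be between {MIN_DURATION} and {MAX_DURATION} seconds"
            )

    def to_payload(self) -> dict:
        """Validate, then return the fields as a dict, dropping an unset prompt."""
        self.validate()
        return {k: v for k, v in asdict(self).items() if v is not None}
```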
How to Use
- Upload your video — provide the clip you want to generate sound for via URL or drag-and-drop.
- Write a prompt (optional) — describe the type of sound or atmosphere you want. Use the Prompt Enhancer for better results.
- Set num_samples — choose how many audio variations to generate in one run.
- Set duration — choose how many seconds of audio to generate (2–10 seconds).
- Set seed (optional) — fix the seed to reproduce a specific result, or leave at -1 for random.
- Submit — generate and download your synchronized audio.
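When calling the model over the REST API instead of the web form, these steps map onto a single JSON request. The endpoint URL, the `Authorization: Bearer` header, and the JSON field layout below are placeholders assumed for illustration; this document does not specify them, so check your provider's API reference before use. The request object is built here but not sent.

```python
import json
import urllib.request

# Placeholder values: the real endpoint and auth scheme are NOT given in this
# document. Substitute the ones from your provider's API reference.
API_URL = "https://api.example.com/mirelo-sfx-v1/video-to-audio"  # hypothetical
API_KEY = "YOUR_API_KEY"                                          # hypothetical


def build_request(video_url, prompt=None, num_samples=2, duration=5, seed=-1):
    """Mirror the How to Use steps as a JSON POST request (not sent)."""
    body = {
        "video": video_url,
        "num_samples": num_samples,
        "duration": duration,
        "seed": seed,
    }
    if prompt:  # the prompt step is optional, so omit the field when empty
        body["prompt"] = prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request("https://example.com/clip.mp4", prompt="rain on window glass")
# urllib.request.urlopen(req) would submit it; not called here because the
# endpoint above is a placeholder.
```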
Pricing
Billed at $0.007 per second per sample, with a minimum billable duration of 2 seconds and a maximum of 10 seconds.
| Duration | 1 Sample | 2 Samples | 4 Samples |
|---|---|---|---|
| 2s | $0.014 | $0.028 | $0.056 |
| 5s | $0.035 | $0.070 | $0.140 |
| 10s | $0.070 | $0.140 | $0.280 |
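For a configuration that is not listed in the table, multiply the duration by the number of samples and the per-second rate. For example, 7 seconds with 3 samples:

$$
\text{cost} = 7\ \text{s} \times 3\ \text{samples} \times \$0.007 = 21 \times \$0.007 = \$0.147
$$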
Billing Rules
- Rate: $0.007 per second per sample
- Minimum billable duration: 2 seconds
- Maximum billable duration: 10 seconds
- Total cost = billed duration × num_samples × $0.007
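The billing rules translate directly into a cost estimator. This is a minimal sketch that uses `Decimal` to avoid floating-point rounding in currency values; it assumes the requested `duration` is billed as given (no rounding to whole seconds), which the rules above do not state explicitly. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.007")  # USD per second per sample
MIN_BILLABLE = 2                    # seconds
MAX_BILLABLE = 10                   # seconds


def estimate_cost(duration: float, num_samples: int) -> Decimal:
    """Estimate the charge for one run: clamp duration, then apply the rate."""
    billed = min(max(Decimal(str(duration)), MIN_BILLABLE), MAX_BILLABLE)
    return billed * num_samples * RATE_PER_SECOND


# Reproduce the pricing table rows (2 s, 5 s, 10 s) for 1, 2 and 4 samples.
for d in (2, 5, 10):
    print(f"{d}s", [str(estimate_cost(d, n)) for n in (1, 2, 4)])
```

Clamping before multiplying mirrors the minimum and maximum billable durations, so a 1-second request is billed as 2 seconds and anything above 10 seconds is billed as 10.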
Best Use Cases
- Film & Video Post-Production — Add realistic ambient sound, foley, and environmental audio to silent or poorly recorded footage.
- Social Media Content — Enhance short-form videos with matching sound effects for higher engagement.
- Game & Interactive Media — Generate contextual sound effects from in-game or recorded footage.
- Advertising — Quickly produce polished audio for product videos and promotional clips.
- Content Automation — Batch-generate synchronized audio for high-volume video workflows via the REST API.
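For high-volume workflows, jobs can be submitted concurrently from a list of video URLs. The sketch below is a minimal Python example that keeps the HTTP call abstract: `submit` is a hypothetical function you supply (for instance, one that posts a payload and returns a job ID), and `max_workers` is an assumed concurrency cap that you would tune to your provider's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_batch(
    video_urls: List[str],
    submit: Callable[[Dict], object],
    duration: int = 5,
    num_samples: int = 2,
    max_workers: int = 4,
) -> Dict[str, object]:
    """Submit one job per video and map each URL to what `submit` returns.

    `submit` is caller-supplied (hypothetical); it receives one payload dict.
    Duplicate URLs collapse to a single entry in the returned dict.
    """
    payloads = [
        {"video": url, "duration": duration, "num_samples": num_samples}
        for url in video_urls
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        results = list(pool.map(submit, payloads))
    return dict(zip(video_urls, results))
```

Usage with a stand-in `submit` that just echoes part of the payload: `run_batch(urls, lambda p: (p["video"], p["duration"]))`.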
Pro Tips
- Leave the prompt empty to let the model infer audio purely from the video content.
- Use the prompt to override or refine the default interpretation — for example, specifying "rain on window glass" for an ambiguous indoor scene.
- Generate 3–4 samples per run to get more variations to choose from and a better chance of a close match; cost scales linearly with num_samples.
- Fix the seed once you find a result you like to reproduce it exactly in future runs.
- Match duration to the key action window in your video for the most focused and accurate audio.
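Two of these tips can be automated. Choosing the seed locally instead of sending -1 means every run's seed is known and can be reused later, and an action window can be converted into a valid `duration`. Both helpers are hypothetical; the sketch assumes the service accepts any seed in the 0 to 2**31 - 1 range and integer durations, and that the clip has already been trimmed to start at the action, since this document does not describe a start-offset parameter.

```python
import math
import random

MIN_DURATION, MAX_DURATION = 2, 10  # seconds, from the Parameters table


def pick_seed(seed: int = -1) -> int:
    """Return a concrete seed, choosing one locally when -1 is given.

    The 0..2**31-1 range is an assumption, not a documented limit.
    """
    return random.randint(0, 2**31 - 1) if seed == -1 else seed


def duration_for_window(start_s: float, end_s: float) -> int:
    """Round the action window up to whole seconds and clamp it to 2-10 s."""
    length = math.ceil(end_s - start_s)
    return max(MIN_DURATION, min(MAX_DURATION, length))


seed = pick_seed()  # log this value alongside the output you keep
print("seed:", seed, "duration:", duration_for_window(3.2, 7.5))
```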
Notes
- video is the only required field; all other parameters are optional.
- Billable duration is clamped between 2 and 10 seconds regardless of actual video length.
- Ensure video URLs are publicly accessible if using a link rather than a direct upload.
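A quick offline pre-check can catch URLs that the service certainly cannot fetch, such as `file://` paths or localhost and private-network addresses. This sketch uses only the Python standard library; `looks_publicly_reachable` is a hypothetical helper, and a `True` result does not prove the file is accessible, since DNS resolution, authentication, and signed-URL expiry are not checked.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_reachable(url: str) -> bool:
    """Reject URLs that are clearly not public; cannot confirm real access."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname  # already lowercased, IPv6 brackets removed
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a domain name; DNS and permissions are not checked here
    return ip.is_global  # False for private, loopback and link-local ranges
```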
Related Models
- Hunyuan Video Foley — Generate foley-style synchronized audio from video using Hunyuan's audio model.
- Mirelo SFX V1.5 Video-to-Audio — Next-generation SFX model with improved audio quality and synchronization.