Mirelo SFX v1.5 (Video-to-Sound)
Mirelo SFX v1.5 turns your videos into synchronized sound effects using multimodal AI. It analyzes what happens on screen and automatically generates realistic or cinematic sound layers that follow the visual rhythm. Whether the clip needs footsteps, explosions, or ambient noise, the model brings motion to life.
Why it sounds great
- AI-driven sound synthesis – Generates sound effects that fit object motion, timing, and energy directly from video frames.
- Cinematic awareness – Detects on-screen actions (impacts, motion, intensity) and produces corresponding effects.
- Multiple variations – Generate several sound versions for the same video (set with num_samples) for creative control and sound design diversity.
- High coherence – Outputs seamlessly loopable audio segments aligned to scene transitions.
- Plug-and-play – Just upload a video clip, set num_samples, and receive ready-to-use sound effects.
Limits and Performance
- Max duration per job: up to 10 seconds (minimum billing covers 5 seconds)
- Processing speed: typically 6–12 seconds per generation
- Input: MP4 or MOV file upload, or a video URL
- Output: AI-generated synchronized sound effects (WAV or MP3)
Pricing
| Duration range (seconds) | Billing rule | Cost per run |
|---|---|---|
| 0–5 s | Minimum charge (5 s) | $0.007 × num_samples × 5 = $0.035 × num_samples |
| 5–10 s | Actual duration billed | $0.007 × num_samples × duration |
| >10 s | Capped at 10 s | $0.007 × num_samples × 10 = $0.07 × num_samples (maximum per run) |
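
The billing rules in the table above can be sketched as a small estimator. This is an illustrative calculation only, not an official billing tool: the function name `estimate_cost` and the constant names are assumptions, while the $0.007 rate, the 5-second minimum, and the 10-second cap come from the table.

```python
RATE_PER_SECOND = 0.007     # USD per billed second per sample (from the pricing table)
MIN_BILLED_SECONDS = 5.0    # clips shorter than this are billed as 5 s
MAX_BILLED_SECONDS = 10.0   # billing is capped at 10 s per run


def estimate_cost(duration_s: float, num_samples: int = 1) -> float:
    """Estimate the cost in USD of one run (hypothetical helper)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # Clamp the clip duration into the billable range [5 s, 10 s].
    billed_seconds = min(max(duration_s, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    # Round to 4 decimals to hide floating-point noise in the product.
    return round(RATE_PER_SECOND * billed_seconds * num_samples, 4)


if __name__ == "__main__":
    print(estimate_cost(3, num_samples=1))   # short clip, billed as 5 s
    print(estimate_cost(8, num_samples=2))   # billed at actual duration
    print(estimate_cost(30, num_samples=4))  # billed as 10 s
```

For example, an 8-second clip with 2 samples is billed as 0.007 × 8 × 2 = $0.112.
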
How to Use
- Upload a video (drag & drop or paste a URL).
- (Optional) Write a prompt to describe sound context (e.g., “soft footsteps on wood,” “metal clangs,” “cinematic ambience”).
- Set num_samples, the number of different sound versions to generate.
- (Optional) Set a fixed seed for reproducibility, or leave it random for variation.
- Click Run, then preview and download the results.
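
If you prepare runs from a script rather than the web form, the same settings can be collected in one place first. The sketch below only assembles and checks the parameters; it does not call any service. The function name `build_job_input` and the field names `video_url` and `prompt` are hypothetical, and the seed range is an assumption; only `num_samples` and `seed` are named on this page.

```python
import random
from typing import Optional

MAX_SEED = 2**32 - 1  # assumed seed range; the page does not specify one


def build_job_input(
    video_url: str,
    prompt: Optional[str] = None,
    num_samples: int = 1,
    seed: Optional[int] = None,
) -> dict:
    """Collect the settings for one run (hypothetical field names)."""
    if not video_url:
        raise ValueError("video_url is required")
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if seed is None:
        # No fixed seed given: pick a random one so each run varies.
        seed = random.randint(0, MAX_SEED)
    job = {"video_url": video_url, "num_samples": num_samples, "seed": seed}
    if prompt:
        # The prompt is optional and only included when provided.
        job["prompt"] = prompt
    return job


if __name__ == "__main__":
    print(build_job_input(
        "https://example.com/clip.mp4",
        prompt="soft footsteps on wood",
        num_samples=3,
        seed=42,
    ))
```

Recording the seed that was actually used makes it possible to reproduce a result you liked later.
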
Pro tips for best quality
- Use short, focused clips (≤10s) to maintain strong visual-sound alignment.
- For cinematic realism, include context in the prompt (e.g., “rainy street, distant thunder”).
- Generate multiple samples to audition variations before final mixdown.
- Adjust seed for subtle variations in timing and sound character.
Note
- Each sample is generated independently; total cost scales linearly with num_samples.
- Minimum billing covers 5 seconds even for shorter clips.
- Works best with clear, high-contrast motion; in busy scenes the model may blend several sound layers together.