
Mirelo SFX V1.5

mirelo-ai/sfx-v1.5/video-to-video

Mirelo SFX V1.5 generates sound effects synchronized to the action in a video. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.


Base rate: $0.007 per second of video, per sample (see Pricing below for the 5-second minimum and the 10-second cap).

At the base rate, $1 covers approximately 142 billed seconds.


README

Mirelo SFX v1.5 (Video-to-Sound)

Mirelo SFX v1.5 turns your videos into synchronized sound effects using multimodal AI. It analyzes the video frames and generates realistic or cinematic sound layers that follow the visual rhythm, from footsteps and explosions to ambient noise.

Why it sounds great

  • AI-driven sound synthesis – Generates sound effects that fit object motion, timing, and energy directly from video frames.
  • Cinematic awareness – Detects on-screen actions (impacts, motion, intensity) and produces corresponding effects.
  • Multiple variations – Generate several sound versions for the same video for creative control and sound design diversity.
  • High coherence – Outputs seamlessly loopable audio segments aligned to scene transitions.
  • Plug-and-play – Upload a video clip, set num_samples, and receive ready-to-use sound effects.

Limits and Performance

  • Max duration per job: up to 10 seconds (minimum billing covers 5 seconds)
  • Processing speed: typically 6–12 seconds per generation
  • Input: MP4 or MOV file upload, or a video URL
  • Output: AI-generated synchronized sound effects (WAV or MP3)
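Before submitting a job, it can help to check a clip against the limits above. The sketch below is a hypothetical client-side helper, not part of any official SDK: it assumes the caller already knows the clip length (for example from ffprobe) and only checks the accepted containers, URL shape, and the 10-second limit.

```python
from urllib.parse import urlparse

# Container formats listed under "Input" above.
ACCEPTED_EXTENSIONS = (".mp4", ".mov")
# Longest clip per job, per "Limits and Performance".
MAX_DURATION_S = 10.0


def check_input(source: str, duration_s: float) -> list[str]:
    """Return a list of problems with a video source; empty means it looks acceptable.

    `source` is a local file path or an http(s) URL. `duration_s` is the clip
    length in seconds, assumed to be known by the caller.
    """
    problems = []
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # URLs are passed through; only check that a host is present.
        if not parsed.netloc:
            problems.append("URL has no host")
    elif not source.lower().endswith(ACCEPTED_EXTENSIONS):
        problems.append("file must be MP4 or MOV")
    if duration_s <= 0:
        problems.append("duration must be positive")
    elif duration_s > MAX_DURATION_S:
        problems.append(
            f"clip is {duration_s:g} s, longer than the {MAX_DURATION_S:g} s limit"
        )
    return problems


if __name__ == "__main__":
    print(check_input("street_scene.mov", 8))
    print(check_input("https://example.com/clip.mp4", 12))
```

How the service itself treats clips longer than 10 seconds (rejecting or trimming them) is not stated here, so the helper only reports the issue.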

Pricing

| Duration range | Billing rule | Cost per run |
|---|---|---|
| 0–5 s | Minimum charge (billed as 5 s) | $0.007 × num_samples × 5 = $0.035 × num_samples |
| 5–10 s | Actual duration billed | $0.007 × num_samples × duration (i.e., $0.007 × num_samples per second) |
| >10 s | Capped at 10 s | $0.07 × num_samples (maximum per run) |
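The billing rule in the table amounts to clamping the clip duration into the 5 to 10 second range and multiplying by the per-second rate and num_samples. The sketch below encodes that rule for estimating costs and budgets; it assumes fractional seconds are billed as-is (the table says "actual duration billed" without mentioning rounding) and uses `Decimal` to avoid floating-point drift in currency math. The function names are illustrative.

```python
from decimal import Decimal

# Values taken from the pricing table above.
RATE_PER_SECOND = Decimal("0.007")  # USD per billed second, per sample
MIN_BILLED_S = Decimal(5)           # shorter clips are billed as 5 s
MAX_BILLED_S = Decimal(10)          # longer clips are capped at 10 s


def billed_seconds(duration_s) -> Decimal:
    """Clamp a clip duration into the billed range of 5 to 10 seconds."""
    d = Decimal(str(duration_s))
    return min(max(d, MIN_BILLED_S), MAX_BILLED_S)


def run_cost(duration_s, num_samples: int = 1) -> Decimal:
    """Cost of one run in USD; scales linearly with num_samples."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    return RATE_PER_SECOND * billed_seconds(duration_s) * num_samples


def runs_within_budget(budget_usd, duration_s, num_samples: int = 1) -> int:
    """Number of whole runs a budget covers at a given duration and num_samples."""
    return int(Decimal(str(budget_usd)) // run_cost(duration_s, num_samples))


if __name__ == "__main__":
    print(run_cost(3))               # billed as 5 s: 0.007 × 5 = 0.035
    print(runs_within_budget(1, 3))  # $1 / $0.035 per run -> 28 runs
```

Because the minimum charge dominates for short clips, a 3-second clip costs the same as a 5-second one.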

How to Use

  1. Upload a video (drag & drop or paste a URL).
  2. (Optional) Write a prompt to describe sound context (e.g., “soft footsteps on wood,” “metal clangs,” “cinematic ambience”).
  3. Set num_samples — the number of different sound versions to generate.
  4. (Optional) Set a fixed seed for reproducibility, or leave it random for variation.
  5. Click Run — preview and download results.
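When calling the REST API instead of the web form, the same settings go into a request body. The sketch below only builds the JSON and does not send it, because the endpoint and authentication are not documented here. The field names `video`, `prompt`, `num_samples`, and `seed` are hypothetical (only `num_samples` and the seed/prompt concepts come from the steps above); check the provider's API reference for the actual schema.

```python
import json
from typing import Optional


def build_request(video_url: str, num_samples: int = 1,
                  prompt: Optional[str] = None,
                  seed: Optional[int] = None) -> str:
    """Serialize the How to Use settings into a JSON request body.

    The field names are assumptions for illustration, not a confirmed API schema.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    body = {"video": video_url, "num_samples": num_samples}
    if prompt:  # step 2 is optional; omit the field when no prompt is given
        body["prompt"] = prompt
    if seed is not None:  # step 4: a fixed seed makes the run reproducible
        body["seed"] = seed
    return json.dumps(body)


if __name__ == "__main__":
    print(build_request("https://example.com/clip.mp4", num_samples=3,
                        prompt="soft footsteps on wood", seed=42))
```

Leaving `seed` out entirely, rather than sending a placeholder, is the assumed way to request a random seed.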

Pro tips for best quality

  • Use short, focused clips (≤10s) to maintain strong visual-sound alignment.
  • For cinematic realism, include context in the prompt (e.g., “rainy street, distant thunder”).
  • Generate multiple samples to audition variations before final mixdown.
  • Adjust seed for subtle variations in timing and sound character.
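The last two tips combine naturally into an audition plan: try a few prompt variants, each with several seeds, and record which seed produced which take so a favorite can be re-rendered later. The sketch below is a hypothetical planning helper; using consecutive seeds is an arbitrary choice for illustration, and it relies only on the statement above that a fixed seed makes a run reproducible.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Take:
    prompt: str
    seed: int


def plan_takes(prompts: list[str], base_seed: int, seeds_per_prompt: int) -> list[Take]:
    """Cross each prompt with a block of consecutive seeds.

    Keeping the seed with each take lets you reproduce a chosen variation
    by rerunning with the same video, prompt, and seed.
    """
    seeds = range(base_seed, base_seed + seeds_per_prompt)
    return [Take(p, s) for p, s in product(prompts, seeds)]


if __name__ == "__main__":
    for take in plan_takes(["rainy street", "rainy street, distant thunder"], 100, 3):
        print(take)
```

Each planned take is one run, so the total cost grows with the number of prompts times the number of seeds per prompt.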

Note

  • Each sample is generated independently; total cost scales linearly with num_samples.
  • Minimum billing covers 5 seconds even for shorter clips.
  • Works best with clear, high-contrast motion; in busy scenes, the model may automatically blend several sound layers.