video-to-audio

wavespeed-ai/hunyuan-video-foley

Upload a video and provide a text description to generate realistic audio.

Your request will cost $0.05 per run.

For $1 you can run this model approximately 20 times.
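
As a quick check of the pricing arithmetic above, the sketch below computes how many runs a budget covers at the quoted $0.05 per run. The function names are illustrative only and are not part of any official tooling; `Decimal` is used so that currency values are not subject to floating-point rounding.

```python
from decimal import Decimal

# USD per run, as quoted on this page; adjust if the listed price changes.
PRICE_PER_RUN = Decimal("0.05")


def runs_for_budget(budget_usd: str) -> int:
    """Return how many whole runs a budget covers at the quoted price."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


def cost_of_runs(runs: int) -> Decimal:
    """Return the total cost in USD for a given number of runs."""
    return PRICE_PER_RUN * runs


print(runs_for_budget("1.00"))  # 20
print(cost_of_runs(8))          # 0.40
```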

HunyuanVideo-Foley

What is HunyuanVideo-Foley?

HunyuanVideo-Foley is Tencent Hunyuan's video-to-audio model that synthesizes realistic Foley and ambient sound directly from video. It aligns on-screen actions and scene context to produce timing-accurate, high-quality audio tracks.

Why this?

Traditional audio generators struggle with generalization, semantic alignment, and audio quality. HunyuanVideo-Foley is designed to address all three.

What it can do

  • Multi-scene synchronization – High-quality audio aligned to complex, fast-cut visuals.
  • Multi-modal balance – Blends visual cues with optional text prompts for intent-aware sound.
  • 48 kHz hi-fi output – Professional clarity with low noise and artifacts.
  • SOTA performance – Leading results in fidelity, sync, and semantic alignment benchmarks.
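
To confirm the 48 kHz claim on your own results, one option is to read the sample rate from the file header. The sketch below assumes you have saved or extracted the generated audio as a PCM WAV file; this page does not state the delivery format, so that is an assumption, and the function names are illustrative. It uses only Python's standard `wave` module.

```python
import wave

EXPECTED_RATE_HZ = 48_000  # the output rate listed above


def sample_rate_hz(path: str) -> int:
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_48k(path: str) -> bool:
    """Return True if the WAV file at `path` is sampled at 48 kHz."""
    return sample_rate_hz(path) == EXPECTED_RATE_HZ
```

For example, `is_48k("foley_output.wav")` (a hypothetical file name) returns `True` only when the header reports 48,000 samples per second.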

From short clips to cinematic cuts

Whether you’re polishing a social clip or finishing an animated short, HunyuanVideo-Foley can help.

Example (ASMR):

  • Silent video description: close-up of hands slicing fresh kiwi on a wooden board; crisp macro textures; soft natural light.
  • Text prompt: Generate realistic kiwi cutting and peeling sounds; gentle tapping; calm ASMR ambience.

Designed for

  • Post & Studios – Fast Foley passes for animatics, rough cuts, and indie films.
  • Creators & Social Teams – Auto-sound shorts/reels with consistent timing.
  • Education & Prototyping – Demonstrate AV alignment or test sound design ideas quickly.

How to Use (HunyuanVideo-Foley)

  1. Upload video (required) – Add the silent (or quiet) clip you want to add sound to.

  2. Write prompt (optional) – Briefly describe the mood or key sounds, e.g.:

    • Rainy street ambience, soft footsteps, distant cars.
    • Kitchen ASMR: chopping vegetables, sizzling pan.

  3. Set seed – Use a fixed number to reproduce the same result; change it for variants.

  4. Run – Click Run (the button shows the cost).

  5. Review & iterate – If timing or tone isn't right, tweak the prompt or seed and run again.
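
The steps above can be sketched as a small helper that assembles the three inputs and produces seed variants for the iterate step. This is a minimal illustration only: the field names `video`, `prompt`, and `seed` mirror the form fields described here but are assumptions, not a documented request schema, and the code makes no network calls.

```python
import random
from typing import Optional


def build_inputs(video: str, prompt: Optional[str] = None,
                 seed: Optional[int] = None) -> dict:
    """Assemble inputs for one run (hypothetical field names)."""
    if not video:
        raise ValueError("a video is required")
    inputs = {"video": video}
    if prompt and prompt.strip():
        # The prompt is optional; include it only when non-empty.
        inputs["prompt"] = prompt.strip()
    # A fixed seed aims to reproduce a result; a random one gives a new variant.
    inputs["seed"] = seed if seed is not None else random.randrange(2**31)
    return inputs


def seed_variants(base: dict, count: int) -> list:
    """Return copies of `base` that differ only in their seed."""
    return [{**base, "seed": base["seed"] + i} for i in range(count)]


base = build_inputs("kitchen.mp4",
                    "Kitchen ASMR: chopping vegetables, sizzling pan.",
                    seed=42)
for variant in seed_variants(base, 3):
    print(variant["seed"])  # 42, 43, 44
```

Keeping the prompt fixed while changing only the seed makes it easier to tell whether a difference between runs comes from the wording or from sampling variation.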