kwaivgi/kling-video-to-audio

Generate sound effects from video using KlingAI's advanced audio generation model. Extract or generate matching audio tracks for your videos automatically.

Your request will cost $0.035 per run.

For $1 you can run this model approximately 28 times.

Kwaivgi: Kling Video-to-Audio

Kling Video-to-Audio adds a complete soundtrack to a silent video using two short prompts: one for sound effects (SFX) and one for background music (BGM). It generates synchronized foley, ambience, and score cues that match on-screen action. Great for trailers, shorts, product shots, and mood pieces.

Highlights

  • Prompt-based SFX and BGM that follow scene energy and timing
  • Optional ASMR mode for hyper-detailed, close-mic textures
  • Works with cinematic, documentary, gameplay, and product footage
  • Fast iteration: tweak prompts, re-render, and compare

Parameters

  • video (required): URL or upload of the silent clip to be sonified.

  • sound_effect_prompt: Describe on-screen events and textures to hear. Example: “Thunderstorm, heavy rain, distant thunder rolls, glass rattling, wind gusts, ocean waves slamming rocks.”

  • bgm_prompt: Describe musical mood, instrumentation, and pacing. Example: “Brooding orchestral score, low strings, sparse piano hits, slow build with sub-bass swells.”

  • asmr_mode (checkbox): Enhances micro-details and proximity effect for immersive listening (ear-tingles, crisp foley).
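
As a rough illustration of how the four parameters fit together, the sketch below assembles them into a single dictionary. It does not call any service: the page does not document a request format, so the helper name build_payload, the placeholder URL, the treatment of both prompts as optional, and the asmr_mode default of False are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


def build_payload(
    video: str,
    sound_effect_prompt: Optional[str] = None,
    bgm_prompt: Optional[str] = None,
    asmr_mode: bool = False,  # assumed default; the page only shows a checkbox
) -> Dict[str, Any]:
    """Collect the documented parameters into one dictionary.

    Only `video` is marked as required on the page, so blank prompts
    are simply left out (an assumption about how optional fields behave).
    """
    if not video or not video.strip():
        raise ValueError("video is required (URL or uploaded file reference)")

    payload: Dict[str, Any] = {
        "video": video.strip(),
        "asmr_mode": bool(asmr_mode),
    }
    if sound_effect_prompt and sound_effect_prompt.strip():
        payload["sound_effect_prompt"] = sound_effect_prompt.strip()
    if bgm_prompt and bgm_prompt.strip():
        payload["bgm_prompt"] = bgm_prompt.strip()
    return payload


example = build_payload(
    video="https://example.com/silent-clip.mp4",  # placeholder URL
    sound_effect_prompt="Thunderstorm, heavy rain, distant thunder rolls",
    bgm_prompt="Brooding orchestral score, low strings, slow build",
    asmr_mode=True,
)
print(example)
```

Validating the required field before submission keeps a failed run from being the first sign of a missing video.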

How to Use

  1. Upload or paste the video URL.
  2. Write a concise sound_effect_prompt for foley/ambience.
  3. Add a bgm_prompt for the musical bed.
  4. Toggle asmr_mode if you want ultra-detailed textures.
  5. Click Run and download the generated audio track aligned to your clip.

Prompting Tips

  • Be concrete: call out specific events, materials, and distances. Example:

    “Leather jacket rustle, footsteps on wet concrete, elevator ding, neon hum.”

  • For BGM, specify tempo and structure, for example “slow build with sub-bass swells.”

  • Keep SFX and BGM prompts stylistically consistent to avoid clashes.

  • If dialogue is needed, add it in post; this model focuses on SFX and score.
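
One way to keep prompts concrete is to list the individual sound events first and join them afterwards. The helper below is a minimal sketch of that habit; compose_prompt is a hypothetical name, and the comma-separated style is just taken from the examples on this page, not a format the model is known to require.

```python
from typing import Iterable, List


def compose_prompt(items: Iterable[str]) -> str:
    """Join short, concrete descriptions into one comma-separated prompt.

    Blank entries and case-insensitive duplicates are dropped, and
    trailing punctuation on each item is removed before joining.
    """
    seen = set()
    cleaned: List[str] = []
    for item in items:
        text = item.strip().rstrip(".,").strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return ", ".join(cleaned) + "." if cleaned else ""


sfx_prompt = compose_prompt([
    "Leather jacket rustle",
    "footsteps on wet concrete ",
    "",
    "elevator ding",
    "neon hum.",
    "Elevator ding",  # duplicate, dropped
])
print(sfx_prompt)
# Leather jacket rustle, footsteps on wet concrete, elevator ding, neon hum.
```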

Output

  • An audio track designed to sync with the input video’s duration.
  • Format and delivery follow platform defaults (download URL in the response).
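
Because the output is an audio track rather than a finished video, it usually has to be combined with the original clip afterwards. The sketch below builds an ffmpeg command for that step without running it. It assumes ffmpeg is installed locally and that the downloaded audio is in a format ffmpeg can read; the file names are placeholders, and build_mux_command is a hypothetical helper.

```python
import shlex
from typing import List


def build_mux_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg argument list that pairs a video with a new audio track."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: the original (silent) clip
        "-i", audio_path,   # input 1: the generated audio track
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c:v", "copy",     # keep the video stream as-is, no re-encode
        "-c:a", "aac",      # encode audio to AAC for MP4 compatibility
        "-shortest",        # stop at the end of the shorter stream
        output_path,
    ]


cmd = build_mux_command("clip.mp4", "generated_audio.wav", "clip_with_audio.mp4")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it; building the list first makes the command easy to inspect or log before anything is written to disk.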

Pricing

  • Pricing is $0.035 per run.
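
The per-run price makes budgeting simple arithmetic. The sketch below uses the $0.035 figure from this page and reproduces the estimate of about 28 runs per $1; the function names are hypothetical, and Decimal is used only to avoid floating-point rounding surprises.

```python
from decimal import Decimal, ROUND_FLOOR

PRICE_PER_RUN = Decimal("0.035")  # USD, as listed on this page


def runs_for_budget(budget_usd) -> int:
    """Number of whole runs that fit into a budget, rounded down."""
    budget = Decimal(str(budget_usd))
    return int((budget / PRICE_PER_RUN).to_integral_value(rounding=ROUND_FLOOR))


def cost_for_runs(runs: int) -> Decimal:
    """Total cost in USD for a given number of runs."""
    return PRICE_PER_RUN * runs


print(runs_for_budget(1))   # 28
print(cost_for_runs(100))
```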

Notes

  • Start with clean, final-cut footage; large edits after sound design will desync cues.
  • Loudness is unmastered by design; normalize or master in your editor to your target LUFS.
  • Ensure you have rights to the video content you upload and follow platform policies for generated audio.
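
For the loudness note above, one common option outside a full editor is ffmpeg's loudnorm filter. The sketch below only builds the command. It assumes ffmpeg is available; the -14 LUFS target, the true-peak and loudness-range values, and the file names are example choices rather than recommendations from this page, and build_loudnorm_command is a hypothetical helper.

```python
import shlex
from typing import List


def build_loudnorm_command(
    src: str,
    dst: str,
    target_lufs: float = -14.0,  # example integrated loudness target
    true_peak: float = -1.5,     # example true-peak ceiling in dBTP
    lra: float = 11.0,           # example loudness range
) -> List[str]:
    """Return an ffmpeg argument list that normalizes loudness with loudnorm."""
    # loudnorm accepts integrated loudness targets between -70 and -5 LUFS.
    if not -70.0 <= target_lufs <= -5.0:
        raise ValueError("target_lufs must be between -70 and -5")
    audio_filter = f"loudnorm=I={target_lufs:g}:TP={true_peak:g}:LRA={lra:g}"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


norm_cmd = build_loudnorm_command("generated_audio.wav", "generated_audio_norm.wav")
print(shlex.join(norm_cmd))
```

This is single-pass normalization; a mastering pass in an editor gives finer control when the target is strict.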