Kling 2.6 Audio — Image-to-Video
Kling 2.6 Audio Image-to-Video adds audio–video co-generation to Kling’s strong visual pipeline. You start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects and voice all feel like one coherent scene.
🌟 Model Highlights
- Audio + video in one pass – First Kling version that jointly generates visuals and soundtrack.
- Character-synced voices – Speech and reactions that match the on-screen subject and timing.
- Scene-aware sound design – Ambient noise and SFX that follow what happens in the frame.
- Image-driven motion – Uses your input image as the starting frame and builds motion from there.
🧩 Parameters
- image* – Source frame to animate (URL or upload). Use a sharp, well-lit image.
- prompt* – Describe scene motion and audio: camera moves, actions, voice style, ambience, SFX.
- sound – Toggle audio–video co-generation on/off. When off, you get silent video only.
- duration – Currently supports 5s and 10s clips.
- negative_prompt – Things to avoid in both visuals and audio, e.g. watermark, logo, text, distortion.
- cfg_scale – Guidance strength slider (default 0.5):
  - Lower values → Looser, more natural motion, image has more influence.
  - Higher values → Closer adherence to prompt wording, but can look more “forced”.
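As a rough illustration of how these parameters fit together, the sketch below assembles and validates a request payload in Python. The field names mirror the parameter list above, but the `build_payload` helper, the treatment of `image` as a URL string, the integer encoding of `duration`, and the defaults for `sound` and `duration` are assumptions for illustration, not a documented API. Check the provider's API reference for the real schema.

```python
from typing import Any, Dict, Optional

ALLOWED_DURATIONS = (5, 10)  # seconds; the two clip lengths listed above
DEFAULT_CFG_SCALE = 0.5      # default guidance strength from the parameter list


def build_payload(
    image: str,
    prompt: str,
    sound: bool = True,   # assumed default; not stated in the source
    duration: int = 5,    # assumed default; not stated in the source
    negative_prompt: Optional[str] = None,
    cfg_scale: float = DEFAULT_CFG_SCALE,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a request body dict."""
    if not image:
        raise ValueError("image is required")
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")

    payload: Dict[str, Any] = {
        "image": image,
        "prompt": prompt,
        "sound": sound,
        "duration": duration,
        "cfg_scale": cfg_scale,
    }
    # Only include negative_prompt when the caller actually provided one.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload


payload = build_payload(
    image="https://example.com/frame.png",  # placeholder URL
    prompt="Slow push-in on the speaker; calm narrator voice, soft city ambience.",
    duration=10,
    negative_prompt="watermark, logo, text, distortion",
)
```

Validating `duration` locally avoids paying for a request that would be rejected for an unsupported clip length.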
🎯 Typical Use Cases
- Launch / promo videos with native-sounding, character-synced voiceover.
- Storytelling shorts where camera, action and sound must feel perfectly integrated.
- Product explainers that need both clear visuals and natural narration.
- Cinematic social posts with immersive ambience and SFX built in.
💰 Pricing
| Mode | Length | Price |
|---|---|---|
| No Audio | 5 s | $0.35 |
| No Audio | 10 s | $0.70 |
| With Audio | 5 s | $0.70 |
| With Audio | 10 s | $1.40 |
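The table works out to a flat per-second rate: $0.07 per second without audio and $0.14 per second with audio. A minimal sketch of a cost estimator built on the four listed prices follows. The `estimate_cost` function is a hypothetical helper, and the prices are copied from the table above and may change.

```python
from decimal import Decimal

# Prices in USD copied from the table above, keyed by (sound, seconds).
PRICES = {
    (False, 5): Decimal("0.35"),
    (False, 10): Decimal("0.70"),
    (True, 5): Decimal("0.70"),
    (True, 10): Decimal("1.40"),
}


def estimate_cost(sound: bool, seconds: int, clips: int = 1) -> Decimal:
    """Hypothetical helper: total price for `clips` generations of one kind."""
    if (sound, seconds) not in PRICES:
        raise ValueError("only 5 s and 10 s clips are listed")
    if clips < 0:
        raise ValueError("clips must be non-negative")
    return PRICES[(sound, seconds)] * clips


# Example: four silent 10 s drafts, then one final 10 s take with audio.
total = estimate_cost(False, 10, clips=4) + estimate_cost(True, 10)
print(total)  # 4.20
```

Because audio doubles the price, iterating on motion with `sound` off and enabling it only for the final take can keep draft costs down.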
🚀 How to Use
- Upload the image you want to animate.
- Write a prompt describing:
  - how the camera should move,
  - what the characters do,
  - and, if sound is enabled, the voice tone and soundscape (e.g. “low, calm narrator, soft city ambience, subtle whooshes on cuts”).
- (Optional) Add a negative_prompt for elements you don’t want (visual or audio).
- Adjust cfg_scale: start from the default; increase only if the model is not following your prompt enough.
- Choose duration (5s or 10s), then toggle sound as needed.
- Click Run to generate; tweak prompt / cfg_scale / seed and re-run for alternates.
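One way to keep the prompt organised along the lines of the prompt step above is to assemble it from separate camera, action, and audio parts. The `compose_prompt` helper below is a hypothetical convenience, not part of any Kling tooling. The model accepts free-form text, so the sentence ordering here is only one reasonable choice.

```python
from typing import Optional


def compose_prompt(camera: str, action: str, audio: Optional[str] = None,
                   sound: bool = True) -> str:
    """Join the prompt parts into one string, skipping audio when sound is off."""
    parts = [camera, action]
    if sound and audio:
        parts.append(audio)
    # Drop empty parts, strip whitespace and trailing periods, join as sentences.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned) + "." if cleaned else ""


prompt = compose_prompt(
    camera="Slow dolly-in from a medium shot to a close-up",
    action="The woman turns to the camera and smiles",
    audio="Low, calm narrator, soft city ambience, subtle whooshes on cuts",
)
print(prompt)
```

Passing `sound=False` drops the audio description, so the same call site works for silent renders.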
🔎 Tips
- Keep the image and prompt aligned – don’t describe a totally different scene from the uploaded frame.
- For strong lip-sync and performance, explicitly mention who is speaking and what kind of voice you want.
- Start with the default cfg_scale; push it up slowly if the motion or sound doesn’t match your description.
- Use negative_prompt to reduce logos, watermarks, heavy text, or unwanted artefacts in stylised shots.
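To make "push it up slowly" concrete, the sketch below lists cfg_scale values to try in order, starting from the default of 0.5. The step size of 0.1 and the upper bound of 1.0 are assumptions; the valid range is not stated here, so adjust both to whatever the slider actually allows.

```python
from typing import List


def cfg_scale_ladder(start: float = 0.5, step: float = 0.1,
                     maximum: float = 1.0) -> List[float]:
    """Return increasing cfg_scale values from `start` up to `maximum`."""
    if step <= 0:
        raise ValueError("step must be positive")
    values: List[float] = []
    i = 0
    while True:
        # Compute from the index and round to avoid floating-point drift.
        value = round(start + i * step, 4)
        if value > maximum:
            break
        values.append(value)
        i += 1
    return values


print(cfg_scale_ladder())  # [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
```

Re-running with one value at a time, and stopping at the first result that follows the prompt, keeps motion as natural as possible.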