Kling 2.6 Pro

kwaivgi/kling-v2.6-pro/image-to-video

Kling 2.6 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. It ships as a ready-to-use REST inference API with top performance, no cold starts, and affordable pricing.
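
As a rough illustration of the REST workflow, the sketch below submits a generation request with Python's `requests` library. The endpoint path, auth header, and response shape here are assumptions for illustration only; consult the WaveSpeed API documentation for the exact contract.

```python
import os
import requests

# Assumed base URL and auth scheme -- check the WaveSpeed API docs before use.
API_BASE = "https://api.wavespeed.ai/api/v3"          # assumed base URL
MODEL_ID = "kwaivgi/kling-v2.6-pro/image-to-video"    # model path from this page

def submit_job(image_url: str, prompt: str, api_key: str) -> dict:
    """Submit an image-to-video generation request and return the raw JSON response."""
    resp = requests.post(
        f"{API_BASE}/{MODEL_ID}",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"image": image_url, "prompt": prompt},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    result = submit_job(
        image_url="https://example.com/still.jpg",
        prompt="Slow dolly-in on the subject; soft city ambience.",
        api_key=os.environ["WAVESPEED_API_KEY"],
    )
    print(result)
```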


Your request will cost $0.35 per run.

For $10 you can run this model approximately 28 times.

README

Kling 2.6 Audio — Image-to-Video

Kling 2.6 Audio Image-to-Video adds audio–video co-generation to Kling’s strong visual pipeline. You start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects and voice all feel like one coherent scene.

🌟 Model Highlights

  • Audio + video in one pass – First Kling version that jointly generates visuals and soundtrack.
  • Character-synced voices – Speech and reactions that match the on-screen subject and timing.
  • Scene-aware sound design – Ambient noise and SFX that follow what happens in the frame.
  • Image-driven motion – Uses your input image as the starting frame and builds motion from there.

🧩 Parameters

  • image* – Source frame to animate (URL or upload). Use a sharp, well-lit image.

  • prompt* – Describe scene motion and audio: camera moves, actions, voice style, ambience, SFX.

  • sound – Toggle audio–video co-generation on/off. When off, you get silent video only.

  • duration – Currently supports 5s and 10s clips.

  • negative_prompt – Things to avoid in both visuals and audio, e.g. watermark, logo, text, distortion.

  • cfg_scale – Guidance strength slider (default 0.5):

    • Lower values → Looser, more natural motion, image has more influence.
    • Higher values → Closer adherence to prompt wording, but can look more “forced”.
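
Putting the parameters above together, a request body might look like the sketch below. The field names mirror the parameter list on this page, but the exact JSON structure expected by the API is an assumption; treat it as illustrative rather than a confirmed schema.

```python
# Illustrative payload using the parameters listed above. Field names mirror the
# parameter list; the surrounding structure is an assumption, not the confirmed API schema.
payload = {
    "image": "https://example.com/portrait.jpg",   # required source frame
    "prompt": (
        "Slow push-in on the speaker, warm window light; "
        "calm female narrator, soft cafe ambience, gentle page-turn SFX."
    ),
    "sound": True,              # enable audio-video co-generation
    "duration": 5,              # 5 or 10 (seconds)
    "negative_prompt": "watermark, logo, text, distortion",
    "cfg_scale": 0.5,           # default guidance strength
}
```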

🎯 Typical Use Cases

  • Launch / promo videos with native-sounding, character-synced voiceover.
  • Storytelling shorts where camera, action and sound must feel perfectly integrated.
  • Product explainers that need both clear visuals and natural narration.
  • Cinematic social posts with immersive ambience and SFX built in.

💰 Pricing

Mode         Length   Price
No Audio     5 s      $0.35
No Audio     10 s     $0.70
With Audio   5 s      $0.70
With Audio   10 s     $1.40
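
For quick budgeting, a small helper like the one below encodes the table and reproduces the "~28 runs for $10" figure quoted earlier. It is a hypothetical convenience function, not part of the API.

```python
# Worked example of the pricing table above: price per run and runs per budget.
PRICES = {
    ("no_audio", 5): 0.35,
    ("no_audio", 10): 0.70,
    ("with_audio", 5): 0.70,
    ("with_audio", 10): 1.40,
}

def price_per_run(with_audio: bool, duration_s: int) -> float:
    mode = "with_audio" if with_audio else "no_audio"
    return PRICES[(mode, duration_s)]

def runs_for_budget(budget_usd: float, with_audio: bool, duration_s: int) -> int:
    return int(budget_usd // price_per_run(with_audio, duration_s))

# $10 at the $0.35 no-audio 5 s rate -> about 28 runs, matching the note above.
print(runs_for_budget(10.0, with_audio=False, duration_s=5))  # 28
```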

🚀 How to Use

  1. Upload the image you want to animate.

  2. Write a prompt describing:

    • how the camera should move,
    • what the characters do,
    • and, if sound is enabled, the voice tone and soundscape (e.g. “low, calm narrator, soft city ambience, subtle whooshes on cuts”).
  3. (Optional) Add a negative_prompt for elements you don’t want (visual or audio).

  4. Adjust cfg_scale: start from the default; increase only if the model is not following your prompt enough.

  5. Choose duration (5s or 10s), then toggle sound as needed.

  6. Click Run to generate; tweak prompt / cfg_scale / seed and re-run for alternates.
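
Hosted generation like this typically completes asynchronously, so a client usually polls for the finished clip rather than blocking on the initial request. The sketch below shows one way to do that; the result endpoint, status values, and output field names are assumptions for illustration and may differ from the real API.

```python
import time
import requests

def wait_for_video(request_id: str, api_key: str, poll_s: float = 3.0) -> str:
    """Poll until the clip is ready and return its URL (endpoint and field names assumed)."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"  # assumed path
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        payload = requests.get(url, headers=headers, timeout=30).json()
        status = payload.get("status")             # assumed field name
        if status == "completed":
            return payload["output"]               # assumed: URL of the finished video
        if status == "failed":
            raise RuntimeError(f"generation failed: {payload}")
        time.sleep(poll_s)
```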

🔎 Tips

  • Keep the image and prompt aligned – don’t describe a totally different scene from the uploaded frame.
  • For strong lip-sync and performance, explicitly mention who is speaking and what kind of voice you want.
  • Start with the default cfg_scale; push it up slowly if the motion or sound doesn’t match your description.
  • Use negative_prompt to reduce logos, watermarks, heavy text, or unwanted artefacts in stylised shots.
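
As a concrete illustration of these tips, the strings below pair a motion-and-audio prompt with a matching negative prompt. The wording is an example only, not a recommended template.

```python
# Example prompt pair that follows the tips above (illustrative values only).
prompt = (
    "The woman in the uploaded photo looks up and speaks to camera in a low, "
    "calm voice; slow handheld push-in; soft rain ambience, distant traffic."
)
negative_prompt = "watermark, logo, on-screen text, extra fingers, audio distortion"
```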