WaveSpeed.ai

Kling 2.6 Pro

kwaivgi/kling-v2.6-pro/text-to-video

Kling 2.6 Pro delivers top-tier text-to-video generation with smooth motion, cinematic visuals, strong prompt adherence, and native audio for ready-to-share clips. It is offered as a ready-to-use REST inference API with no cold starts and affordable pricing.

Your request will cost $0.35 per run.

For $10 you can run this model approximately 28 times.

README

Kling 2.6 Audio — Text-to-Video

Kling 2.6 Audio Text-to-Video turns a text prompt directly into a fully scored clip: camera motion, character action, and soundtrack (voice, ambience, SFX) are generated in one pass, so the scene looks and sounds like it belongs together.

🌟 Model Highlights

  • Joint audio–video generation – Visuals and sound are created together, not bolted on after the fact.
  • Character-aware voices – Speech that matches who’s on screen, with timing aligned to the action you describe.
  • Scene-driven sound design – Ambient noise and effects that follow the camera and events in the shot.
  • Script-to-scene pipeline – Start from a natural-language prompt; Kling handles shots, motion, and soundscape.

🧩 Parameters

  • prompt* – Describe what happens in the scene: characters, camera moves, environment, and audio mood (e.g. “Close-up of a robot repairing a neon sign, soft synthwave music, quiet city ambience, no dialogue.”)

  • negative_prompt – Things to avoid in both visuals and audio (logo, watermark, heavy text, glitch, noise).

  • cfg_scale – Guidance strength (default 0.5):

    • Lower → looser, more organic; model improvises more.
    • Higher → closer to prompt wording; can look or sound more “forced”.
  • sound – Whether audio is generated together with the video:

    • On → generate video with audio (voice / ambience / SFX where appropriate).
    • Off → silent video only (half the price, same visuals).
  • duration – 5 s or 10 s clips.
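The parameters above can be collected into a single request body before calling the API. The sketch below is a hypothetical Python helper: the field names come from the parameter list, but the JSON types, which fields are required, and how the body is actually sent to the REST endpoint are assumptions, not the documented WaveSpeed schema. `sound` is made keyword-only with no default so the caller must choose it explicitly, since it changes the price.

```python
from typing import Optional

# Hypothetical helper: field names mirror the parameter list above, but the
# exact request schema (types, required fields, transport) is an assumption.
ALLOWED_DURATIONS = (5, 10)  # seconds, per the "duration" parameter


def build_payload(
    prompt: str,
    *,
    sound: bool,
    negative_prompt: Optional[str] = None,
    cfg_scale: float = 0.5,  # documented default guidance strength
    duration: int = 5,
) -> dict:
    """Assemble a request body for kwaivgi/kling-v2.6-pro/text-to-video."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    payload = {
        "prompt": prompt,
        "cfg_scale": cfg_scale,
        "sound": sound,
        "duration": duration,
    }
    # Only include the optional negative prompt when one was given.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
```

For example, `build_payload("Close-up of a robot repairing a neon sign", sound=True, negative_prompt="watermark, glitch")` returns a dict with the 0.5 default guidance and a 5 s duration.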

🎯 Typical Use Cases

  • Social ads or launch teasers with built-in narration and sound design.
  • Short story beats, animatics, or previz where visual + audio timing must line up.
  • Product explainers with spoken description + on-screen action.
  • Cinematic posts and shorts where you want music, ambience, and motion from a single prompt.

💰 Pricing

Mode          Length   Price per run
No Audio      5 s      $0.35
No Audio      10 s     $0.70
With Audio    5 s      $0.70
With Audio    10 s     $1.40
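The table reduces to a small lookup: turning audio on doubles the price, and so does going from 5 s to 10 s. The Python sketch below encodes the listed prices in integer cents (to avoid floating-point rounding) and computes how many whole runs a budget covers; it reproduces the roughly 28 runs per $10 quoted at the top of the page for a silent 5 s clip. The function names are illustrative, not part of any WaveSpeed SDK.

```python
# Per-run prices in US cents, copied from the pricing table above.
# Keyed by (sound_on, duration_seconds); integer cents avoid float rounding.
PRICE_CENTS = {
    (False, 5): 35,
    (False, 10): 70,
    (True, 5): 70,
    (True, 10): 140,
}


def price_cents(sound: bool, duration: int) -> int:
    """Cost of one run, in cents, for the given audio setting and length."""
    try:
        return PRICE_CENTS[(sound, duration)]
    except KeyError:
        raise ValueError(
            f"no listed price for sound={sound}, duration={duration}"
        ) from None


def runs_for_budget(budget_cents: int, sound: bool, duration: int) -> int:
    """Whole number of runs a budget covers (rounded down)."""
    return budget_cents // price_cents(sound, duration)


print(runs_for_budget(1000, sound=False, duration=5))  # 28
print(runs_for_budget(1000, sound=True, duration=10))  # 7
```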

🚀 How to Use

  1. Write a prompt describing:

    • what the camera sees (shots, motion, setting),
    • what characters do,
    • and, if sound is on, the voice tone, music style, and ambience/SFX you want.
  2. (Optional) Add a negative_prompt for things you don’t want in either the visuals or the audio.

  3. Tune cfg_scale (start from 0.5; increase only if it’s not following your prompt enough).

  4. Toggle sound on/off depending on whether you need audio.

  5. Run the model.
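Step 1 amounts to concatenating a few ingredients into one prompt. The sketch below is a hypothetical Python helper (the `ShotBrief` structure is an illustration, not part of the model's API) that joins the camera description and character action, and appends the audio cues only when sound is on, following the "if sound is on" condition in step 1. The sample action text is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShotBrief:
    """Hypothetical container for the prompt ingredients listed in step 1."""
    camera: str  # what the camera sees: shot, motion, setting
    action: str  # what the characters do
    audio: List[str] = field(default_factory=list)  # voice tone, music, ambience/SFX


def compose_prompt(brief: ShotBrief, sound: bool) -> str:
    """Join the brief into one prompt; audio cues are kept only if sound is on."""
    parts = [brief.camera, brief.action]
    if sound:
        parts.extend(brief.audio)
    # Drop empty pieces and trailing periods so the joined text reads cleanly.
    cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
    return ", ".join(cleaned) + "."


brief = ShotBrief(
    camera="Close-up of a robot repairing a neon sign",
    action="the robot tightens a flickering tube",
    audio=["soft synthwave music", "quiet city ambience", "no dialogue"],
)
print(compose_prompt(brief, sound=True))
print(compose_prompt(brief, sound=False))
```

Dropping the audio cues for silent runs is a design choice of this sketch; the model may simply ignore them when sound is off.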

🔎 Tips

  • Write prompts like a mini shot list + audio brief: who, where, camera, mood, and sound.
  • For clearer narration, explicitly specify “single narrator”, voice gender/age, and language/accent.
  • Use negative_prompt for “watermark, text, logo, glitch, noisy audio” to keep outputs clean.
  • For platform export (Reels/Shorts/TikTok), pick 9:16; for YouTube/web, use 16:9; for feeds/ads, try 1:1.
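The last two tips can be kept as reusable constants. The Python sketch below maps destinations to the aspect ratios suggested above and joins the suggested negative-prompt terms into one string. Note that no aspect-ratio parameter appears in the Parameters section, so how (or whether) this value is passed to the model is an assumption; the platform keys are illustrative.

```python
# Aspect ratios suggested in the tip above, keyed by destination.
# No aspect-ratio parameter is listed in the Parameters section, so passing
# this value to the model is an assumption.
ASPECT_BY_PLATFORM = {
    "reels": "9:16",
    "shorts": "9:16",
    "tiktok": "9:16",
    "youtube": "16:9",
    "web": "16:9",
    "feed": "1:1",
    "ad": "1:1",
}

# Negative-prompt terms from the tips, joined into a single string.
CLEAN_OUTPUT_NEGATIVE = ", ".join(["watermark", "text", "logo", "glitch", "noisy audio"])


def aspect_for(platform: str) -> str:
    """Look up the suggested aspect ratio for a destination platform."""
    key = platform.strip().lower()
    if key not in ASPECT_BY_PLATFORM:
        raise ValueError(f"no suggestion for platform {platform!r}")
    return ASPECT_BY_PLATFORM[key]
```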