Kling 2.6 Audio — Text-to-Video
Kling 2.6 Audio Text-to-Video turns a text prompt directly into a fully scored clip: camera motion, character action, and soundtrack (voice, ambience, SFX) are generated in one pass, so the scene looks and sounds like it belongs together.
🌟 Model Highlights
- Joint audio–video generation – Visuals and sound are created together, not bolted on after the fact.
- Character-aware voices – Speech that matches who’s on screen, with timing aligned to the action you describe.
- Scene-driven sound design – Ambient noise and effects that follow the camera and events in the shot.
- Script-to-scene pipeline – Start from a natural-language prompt; Kling handles shots, motion, and soundscape.
🧩 Parameters
- prompt* – Describe what happens in the scene: characters, camera moves, environment, and audio mood (e.g. “Close-up of a robot repairing a neon sign, soft synthwave music, quiet city ambience, no dialogue.”).
- negative_prompt – Things to avoid in both visuals and audio (logo, watermark, heavy text, glitch, noise).
- cfg_scale – Guidance strength (default 0.5):
  - Lower → looser, more organic; the model improvises more.
  - Higher → closer to the prompt wording; can look or sound more “forced”.
- sound – Audio toggle:
  - On → generate video with audio (voice / ambience / SFX where appropriate).
  - Off → silent video only (cheaper, same visuals).
- duration – 5 s or 10 s clips.
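
As a rough illustration, the parameters above can be gathered into a small validated container before sending them anywhere. This is only a sketch: the class name `KlingAudioRequest` and the `to_payload` method are hypothetical, the asterisk on `prompt` is read as "required", and the defaults for `sound` and `duration` are assumptions, since this page only states a default for `cfg_scale`.

```python
from dataclasses import dataclass, asdict

ALLOWED_DURATIONS = (5, 10)  # seconds, as listed in the parameters above


@dataclass
class KlingAudioRequest:
    """Hypothetical container for the documented parameters (not an official API)."""
    prompt: str
    negative_prompt: str = ""
    cfg_scale: float = 0.5  # documented default
    sound: bool = True      # assumed default; the page does not state one
    duration: int = 5       # assumed default; the page does not state one

    def __post_init__(self):
        # The prompt is marked with an asterisk, read here as "required".
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        # Only 5 s and 10 s clips are listed.
        if self.duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")

    def to_payload(self) -> dict:
        """Return the parameters as a plain dict, e.g. for JSON serialization."""
        return asdict(self)


request = KlingAudioRequest(
    prompt="Close-up of a robot repairing a neon sign, soft synthwave music, "
           "quiet city ambience, no dialogue.",
    negative_prompt="logo, watermark, heavy text, glitch, noise",
    duration=10,
)
payload = request.to_payload()
```

No range check is applied to `cfg_scale`, because the page does not document its bounds.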
🎯 Typical Use Cases
- Social ads or launch teasers with built-in narration and sound design.
- Short story beats, animatics, or previz where visual + audio timing must line up.
- Product explainers with spoken description + on-screen action.
- Cinematic posts and shorts where you want music, ambience, and motion from a single prompt.
💰 Pricing
| Mode | Length | Price |
|---|---|---|
| No Audio | 5 s | $0.35 |
| No Audio | 10 s | $0.70 |
| With Audio | 5 s | $0.70 |
| With Audio | 10 s | $1.40 |
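
For budgeting a batch of clips, the table above can be turned into a small lookup. The sketch below is illustrative only: `PRICES` and `estimate_cost` are hypothetical names, and the figures are copied from the table, so they will go stale if the pricing changes.

```python
from decimal import Decimal

# (sound_on, duration_seconds) -> price per clip in USD, copied from the table above
PRICES = {
    (False, 5): Decimal("0.35"),
    (False, 10): Decimal("0.70"),
    (True, 5): Decimal("0.70"),
    (True, 10): Decimal("1.40"),
}


def estimate_cost(sound: bool, duration: int, clips: int = 1) -> Decimal:
    """Estimate the total cost in USD for `clips` generations with the same settings."""
    if clips < 0:
        raise ValueError("clips must be non-negative")
    try:
        unit_price = PRICES[(sound, duration)]
    except KeyError:
        raise ValueError(f"no listed price for sound={sound}, duration={duration}") from None
    return unit_price * clips


# Three 10 s clips with audio, then one silent 5 s clip
total_with_audio = estimate_cost(sound=True, duration=10, clips=3)
total_silent = estimate_cost(sound=False, duration=5)
```

`Decimal` is used instead of `float` so that totals stay exact to the cent.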
🚀 How to Use
- Write a prompt describing:
  - what the camera sees (shots, motion, setting),
  - what characters do,
  - and, if sound is on, the voice tone, music style, and ambience/SFX you want.
- (Optional) Add a negative_prompt for things you don’t want in either image or audio.
- Tune cfg_scale (start from 0.5; increase only if it’s not following your prompt enough).
- Toggle sound on/off depending on whether you need audio.
- Run the model.
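
The first step above (camera, action, then audio only when sound is on) can be sketched as a small prompt-assembly helper. `build_prompt` is a hypothetical function written for illustration; the model itself just receives the final string, and nothing here is part of an official SDK.

```python
def build_prompt(camera: str, action: str, audio: str = "", sound: bool = True) -> str:
    """Join the camera description, the character action, and (if sound is on) the audio brief."""
    parts = [camera.strip(), action.strip()]
    # Include the audio brief only when audio will actually be generated.
    if sound and audio.strip():
        parts.append(audio.strip())
    # Drop empty parts and trailing punctuation, then join into one sentence.
    cleaned = [p.rstrip(".,") for p in parts if p]
    return ", ".join(cleaned) + "."


with_audio = build_prompt(
    camera="Close-up of a neon sign at night",
    action="a robot repairs the wiring",
    audio="soft synthwave music, quiet city ambience, no dialogue",
)
silent = build_prompt(
    camera="Close-up of a neon sign at night",
    action="a robot repairs the wiring",
    audio="soft synthwave music",
    sound=False,
)
```

Keeping the audio brief as a separate argument makes it easy to reuse the same visual description for both the silent and the with-audio variants.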
🔎 Tips
- Write prompts like a mini shot list + audio brief: who, where, camera, mood, and sound.
- For clearer narration, explicitly specify “single narrator”, the voice’s gender and age, and the language and accent.
- Use negative_prompt for “watermark, text, logo, glitch, noisy audio” to keep outputs clean.
- For platform export (Reels/Shorts/TikTok), pick 9:16; for YouTube/web, use 16:9; for feeds/ads, try 1:1.