Kling 2.6 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.35 per run · about 28 runs per $10
Use the uploaded sci-fi alley image as the first frame. Keep the same alley, neon signs, reflections and the hooded woman walking away. Slowly move the camera forward down the alley behind her, like a tracking shot, with smooth, cinematic motion and slight handheld feeling. Let the rain keep falling, with droplets visible in the light beams and more ripples appearing in the puddles as the camera advances. Occasionally, one neon sign flickers and a distant train light passes across the sky between the buildings. Style: realistic cyberpunk night scene, rich colors, deep contrast, subtle lens bloom on the neon. Audio: ambient city noise with distant traffic and voices, soft electronic music pulse, loudest near the middle of the clip, no dialogue.
Scene: A dimly lit casino VIP room, with a green felt poker table at the center and a haze of drifting cigarette smoke surrounding the space. Subject: A suited man leans forward with his elbow on the table and says: "Three rounds to decide. Win, and all the chips are yours. Lose, and tell me the real reason you're getting close to him." Across from him, a curly-haired woman gently slides her fingertips along the edge of the table, her red lips curling slightly as she replies: "I don't care about the chips." Atmosphere is tense, cinematic, with dramatic low-key lighting and noir-style mood.
Scene: No visible people. Only a white robotic vacuum cleaner is shown along with its cleaning path on the floor. Audio: A soft female narrator speaks, accompanied by gentle vacuum-cleaning sound effects: "Still struggling with dust in the corners? This robotic vacuum cleans right up against the edges with no gaps, making your life easier and worry-free!" Camera: Follows the robot's cleaning path smoothly as it moves across the floor.
Scene: A tabletop setup featuring ASMR trigger props such as a crystal glass, wooden block, and makeup brushes. Audio: Soft "shhh—shhh" brushing sounds as a makeup brush gently sweeps across the crystal glass and wooden block. Camera: Focuses closely on the props and the precise hand movements, highlighting textures and subtle details. Atmosphere: Calm, soothing, and sensory-focused.
Scene: On a beach with sunlight spilling across golden sand, waves crashing onto the shore and forming white foam. Subject: A young American male wearing a backwards baseball cap, holding a camera for a selfie, smiling naturally. Audio: The young American male with a bright, sunny voice speaks to the camera: "The weather is amazing today! All my worries feel totally gone. I've been needing a day like this—sun, breeze, just the sound of the waves." Background includes layered ocean wave sounds, filmed in a close-up vlog-style shot.
On a rainy night street with neon lights flashing, the streetlights illuminate the wet ground as raindrops fall. A cellist stands under the streetlight, with raindrops dripping from their hair, playing the cello. The slow and affectionate solo melody of the cello, with a cold color tone.
Add a robot to the uploaded image. Then the robot walks up to the two birthday celebrants and says "Happy Birthday to You!" with its mouth movements perfectly synchronized to the words.
Kling 2.6 Pro Image-to-Video adds audio-video co-generation to Kling's powerful visual pipeline. Start from a still image, write a prompt, and the model produces a short clip where motion, camera, sound effects, and voice all feel like one coherent scene.
Audio and video in one pass: Jointly generates visuals and soundtrack, so no post-production audio sync is needed.
Character-synced voices: Speech and reactions that match the on-screen subject and timing.
Scene-aware sound design: Ambient noise and SFX that follow what happens in the frame.
Start and end frame support: Use a starting image and an optional ending image to guide the animation.
Voice customization: Add custom voices via voice_list for character-specific audio.
Prompt Enhancer: Built-in tool that automatically improves your prompts for better results.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Describe scene motion, camera moves, and audio |
| image | Yes | Starting frame to animate (upload or URL) |
| negative_prompt | No | Elements to avoid in visuals and audio |
| end_image | No | Ending frame to guide the animation target |
| cfg_scale | No | Guidance strength (default: 0.5) |
| sound | No | Enable audio-video co-generation (default: true) |
| voice_list | No | Custom voices for character audio |
| duration | No | Video length: 5 or 10 seconds |
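The parameter table above can be turned into a small request builder. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper name, the default of 5 seconds for `duration` is an assumption (the table lists allowed values but no default), and `voice_list` is left out because its shape is not described here.

```python
from typing import Any, Dict, Optional

ALLOWED_DURATIONS = (5, 10)  # seconds, per the parameter table


def build_payload(
    prompt: str,
    image: str,
    *,
    negative_prompt: Optional[str] = None,
    end_image: Optional[str] = None,
    cfg_scale: float = 0.5,  # default from the table
    sound: bool = True,      # default from the table
    duration: int = 5,       # assumed default; the table only lists 5 or 10
) -> Dict[str, Any]:
    """Assemble a JSON-ready request body, rejecting values the table rules out."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not image.strip():
        raise ValueError("image is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "image": image,
        "cfg_scale": cfg_scale,
        "sound": sound,
        "duration": duration,
    }
    # Optional fields are only included when the caller sets them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if end_image is not None:
        payload["end_image"] = end_image
    return payload
```

Calling `build_payload("Slow tracking shot", "https://example.com/your-input.jpg", duration=10)` would return a dictionary that can be serialized as the JSON request body.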
Price per run by duration and sound setting:

| Duration | Sound Off | Sound On |
|---|---|---|
| 5s | $0.35 | $0.70 |
| 10s | $0.70 | $1.40 |
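To estimate spend before submitting, the pricing table can be encoded as a lookup. This is an illustrative sketch only: `PRICE_TABLE`, `run_cost`, and `runs_within_budget` are hypothetical names, and the prices are copied from the table above, so they may drift from what the platform actually charges.

```python
from decimal import Decimal

# USD prices copied from the pricing table: (duration_seconds, sound) -> price
PRICE_TABLE = {
    (5, False): Decimal("0.35"),
    (5, True): Decimal("0.70"),
    (10, False): Decimal("0.70"),
    (10, True): Decimal("1.40"),
}


def run_cost(duration: int, sound: bool) -> Decimal:
    """Return the listed price of one run for the given settings."""
    try:
        return PRICE_TABLE[(duration, sound)]
    except KeyError:
        raise ValueError("duration must be 5 or 10 seconds") from None


def runs_within_budget(budget: Decimal, duration: int, sound: bool) -> int:
    """Return how many whole runs fit in the budget at the listed price."""
    return int(budget // run_cost(duration, sound))


print(run_cost(10, True))                             # 1.40
print(runs_within_budget(Decimal("10"), 5, False))    # 28
```

The second call reproduces the "about 28 runs per $10" figure for 5-second clips with sound off. `Decimal` is used instead of `float` so that the currency arithmetic stays exact.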
Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Kling v2.6 Pro Image To Video follow in cURL, JavaScript, and Python.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "negative_prompt": "blurry, low quality, distorted",
    "cfg_scale": 0.5,
    "sound": false,
    "duration": 5
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so the call runs inside an async function.
async function main() {
  const result = await client.run("kwaivgi/kling-v2.6-pro/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "negative_prompt": "blurry, low quality, distorted",
    "cfg_scale": 0.5,
    "sound": false,
    "duration": 5
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "kwaivgi/kling-v2.6-pro/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "negative_prompt": "blurry, low quality, distorted",
        "cfg_scale": 0.5,
        "sound": False,
        "duration": 5,
    },
)
print(output["outputs"][0])  # → URL of the generated output

Kling v2.6 Pro Image To Video is a Kuaishou model for video generation from images, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.6-pro-image-to-video.
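The submit-then-poll flow described above can be sketched with only the Python standard library. Treat this as a sketch under stated assumptions rather than the official client: the endpoint URLs, the `completed` status, and `data.outputs[0]` come from this page, while the `data.id` field name, the `failed` status value, the 2-second poll interval, and the 10-minute timeout are assumptions. `make_http` and `generate` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "kwaivgi/kling-v2.6-pro/image-to-video"

JsonDict = Dict[str, Any]
HttpFn = Callable[[str, str, Optional[JsonDict]], JsonDict]


def make_http(api_key: str) -> HttpFn:
    """Return a function that sends an authenticated JSON request via urllib."""
    def http_json(method: str, url: str, body: Optional[JsonDict]) -> JsonDict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(url, data=data, method=method, headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        })
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    return http_json


def generate(
    http: HttpFn,
    payload: JsonDict,
    *,
    poll_interval: float = 2.0,   # assumed interval
    timeout: float = 600.0,       # assumed overall limit
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a prediction, poll until it completes, and return the output URL."""
    submitted = http("POST", f"{API_BASE}/{MODEL_PATH}", payload)
    request_id = submitted["data"]["id"]  # assumed field name for the prediction id
    result_url = f"{API_BASE}/predictions/{request_id}/result"

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = http("GET", result_url, None)["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed failure status value
            raise RuntimeError(f"prediction {request_id} failed")
        sleep(poll_interval)
    raise TimeoutError(f"prediction {request_id} did not complete in {timeout} seconds")
```

A call might look like `generate(make_http(os.environ["WAVESPEED_API_KEY"]), {"prompt": "...", "image": "https://example.com/your-input.jpg"})` after `import os`. Passing the HTTP function in as an argument keeps the polling logic testable without network access.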
Kling v2.6 Pro Image To Video starts at $0.35 per run. That figure is the base price for a 5-second clip with sound off; the final charge scales with the parameters you set in the form. Per the pricing table, enabling sound doubles the price and so does choosing a 10-second duration, up to $1.40 for a 10-second clip with sound on. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `image`, `duration`, `negative_prompt`, `cfg_scale`, `end_image`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.6-pro-image-to-video.
Average end-to-end generation time on WaveSpeedAI is around 87 seconds per request, measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.