Kling V3.0 Standard Image-to-Video
Kling V3.0 Standard Image-to-Video is Kuaishou's latest image-to-video generation model. Upload a reference image and describe the motion — the model generates cinematic video with optional synchronized sound, voice support, and start-to-end frame guidance.
Why Choose This?
-
Latest Kling generation
V3.0 delivers improved motion quality and visual fidelity over V2.6.
-
Start-end frame guidance
Optional end image for controlled transitions between two frames.
-
Sound generation
Optional synchronized sound effects generated alongside the video.
-
Voice list support
Add up to 2 custom voice entries for character dialogue.
-
CFG scale control
Fine-tune the balance between prompt adherence and creative freedom.
Parameters
| Parameter | Required | Description |
|---|
| prompt | No | Text description of the desired motion and action |
| negative_prompt | No | Elements to exclude from generation |
| image | Yes | Start frame image to animate (URL or upload) |
| end_image | No | End frame image for guided transitions |
| duration | No | Video length: 5 or 10 seconds (default: 5) |
| cfg_scale | No | Prompt adherence strength (default: 0.5) |
| sound | No | Generate synchronized sound (default: disabled) |
| voice_list | No | Custom voice entries, up to 2 (click "+ Add Item") |
How to Use
- Upload your image — provide the reference image to animate.
- Write your prompt (optional) — describe the motion, camera movement, and action.
- Upload end image (optional) — provide an end frame for guided transitions.
- Add negative prompt (optional) — specify what you want to avoid.
- Set duration — 5 seconds or 10 seconds.
- Adjust cfg_scale (optional) — higher for stricter prompt following, lower for more freedom.
- Enable sound (optional) — generate synchronized audio with the video.
- Add voices (optional) — add up to 2 voice entries for dialogue.
- Run — submit and download your video.
Pricing
| Duration | Sound Off | Sound On |
|---|
| 5s | $0.18 | $0.27 |
| 10s | $0.36 | $0.54 |
Billing Rules
- Sound multiplier: disabled = 1×, enabled = 1.5×
Best Use Cases
- Photo Animation — Bring portraits, landscapes, and product images to life.
- Scene Transitions — Use start and end frames for smooth visual transitions.
- Social Media Content — Create engaging videos with sound from still images.
- Marketing & Ads — Generate dynamic promotional videos from product photos.
- Storytelling — Animate scenes with synchronized audio and dialogue.
Pro Tips
- Use clear, descriptive prompts with specific motion details for best results.
- Add an end_image for controlled transitions between two visual states.
- Enable sound for a complete video experience with synchronized audio.
- Use negative prompts to avoid artifacts (e.g., "blurry, low quality, distorted").
- Lower cfg_scale for more creative variation, higher for strict prompt adherence.
- Use high-quality source images for better video results.
Notes
- Image is the only required field; prompt is optional but recommended.
- Duration options are 5 or 10 seconds only.
- Voice list supports a maximum of 2 entries.
- Sound generation increases cost by 1.5×.
- Ensure uploaded image URLs are publicly accessible.
Related Models
- Kling V3.0 Standard Text-to-Video — Generate video from text descriptions with V3.0 quality.
- Kling V2.6 Standard Image-to-Video — Previous generation image-to-video.
- Kling V2.6 Standard Text-to-Video — Previous generation text-to-video.