Kling Omni Video O1 (Standard) is Kuaishou's first unified multi-modal video model built on MVL (Multi-modal Visual Language) technology. Its Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. It is available through a ready-to-use REST API with no cold starts and affordable pricing.
$0.42 per run · ~23 runs per $10
Kling Omni Video O1 is Kuaishou's unified multi-modal video generation model, optimized for stable production use and cost efficiency. The Text-to-Video mode transforms natural language prompts into high-quality videos with coherent motion, accurate semantic understanding, and consistent visual output.
The model supports multiple video generation and editing workflows within a single system, including text-to-video, image-to-video, reference-to-video, and video editing.
The model interprets instructions through MVL, enabling understanding of:
Maintains stable characters, objects, and scene attributes across frames, ensuring reliable and repeatable results suitable for production workflows.
Example: "A young woman walking through a neon-lit Tokyo street at night, rain reflecting city lights, cinematic tracking shot"
Set Parameters: Choose the desired duration and aspect ratio.
Generate: Submit the request and receive a coherent video generated from text.
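The steps above can be sketched as a small helper that validates the inputs and builds a JSON request body. This is only an illustration under stated assumptions: the page does not document the request schema, so the field names (`prompt`, `duration`, `aspect_ratio`) and the list of aspect ratios are hypothetical placeholders. Only the 5 s and 10 s durations come from the pricing table. No network call is made, since the endpoint URL and authentication scheme are not given here.

```python
import json

# Durations (in seconds) that appear in the pricing table on this page.
SUPPORTED_DURATIONS = (5, 10)

# ASSUMPTION: the page does not list accepted aspect ratios; these are placeholders.
ASSUMED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_request_body(prompt: str, duration: int = 5, aspect_ratio: str = "16:9") -> str:
    """Validate inputs and return a JSON body (hypothetical field names)."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if duration not in SUPPORTED_DURATIONS:
        raise ValueError(f"duration must be one of {SUPPORTED_DURATIONS}")
    if aspect_ratio not in ASSUMED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASSUMED_ASPECT_RATIOS}")
    # ASSUMPTION: key names are illustrative, not the documented API schema.
    return json.dumps(
        {"prompt": cleaned, "duration": duration, "aspect_ratio": aspect_ratio}
    )


if __name__ == "__main__":
    body = build_request_body(
        "A young woman walking through a neon-lit Tokyo street at night, "
        "rain reflecting city lights, cinematic tracking shot",
        duration=10,
        aspect_ratio="16:9",
    )
    print(body)
```

Validating the duration on the client side avoids submitting a request that cannot be priced against the table; the actual field names should be taken from the provider's API reference before use.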
| duration | price |
|---|---|
| 5s | $0.42 |
| 10s | $0.84 |
Billed based on the selected output duration. Pricing is optimized for standard production workloads.
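As a rough budgeting aid, the duration-based billing can be expressed as a lookup plus integer arithmetic. The sketch below uses only the two prices from the table; the function names are illustrative, and it assumes each run is billed at exactly the listed price with no additional fees.

```python
# Prices from the table above, stored in cents to avoid floating-point rounding.
PRICE_CENTS = {5: 42, 10: 84}


def cost_cents(duration_s: int, runs: int = 1) -> int:
    """Total cost in cents for `runs` generations of the given duration."""
    if duration_s not in PRICE_CENTS:
        raise ValueError(f"no listed price for {duration_s}s")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_CENTS[duration_s] * runs


def runs_within_budget(duration_s: int, budget_cents: int) -> int:
    """Number of whole runs that fit in the budget (integer division)."""
    if duration_s not in PRICE_CENTS:
        raise ValueError(f"no listed price for {duration_s}s")
    return budget_cents // PRICE_CENTS[duration_s]


if __name__ == "__main__":
    print(runs_within_budget(5, 1000))   # 23 five-second runs for $10
    print(runs_within_budget(10, 1000))  # 11 ten-second runs for $10
    print(cost_cents(10, runs=3))        # 252 cents, i.e. $2.52
```

The 23 runs computed for a $10 budget at 5 s matches the "~23 / $10" figure shown with the per-run price.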
kwaivgi/kling-video-o1-std — Video Edit — Edit videos with natural-language instructions for precise, context-aware changes like object removal, scene adjustments, and style refinement while preserving motion consistency.
kwaivgi/kling-video-o1-std — Reference to Video — Generate new videos guided by a reference video to match its style, identity, or motion patterns, ideal for consistent visual storytelling and content iteration.
kwaivgi/kling-video-o1-std — Image to Video — Animate a single image into a high-quality video clip with smooth motion and coherent scene continuity, perfect for marketing creatives and social content.