Kling Omni Video O1 Image-to-Video (Standard) turns static images into dynamic, high-quality videos while preserving subject identity and visual/temporal consistency. It adds natural motion, realistic physics, and smooth scene dynamics, and supports flexible clip durations when reference frames are provided. Built for stable production use and cost efficiency with a ready-to-use REST API, fast response, no cold starts, and predictable pricing.
$0.42 per run (about 23 runs per $10)
Example prompts:
- The camera slowly pushes in as she sketches, the pencil making faint scratching sounds beneath the calls of distant gulls.
- The woman walks down the lantern-lit alley, rain tapping gently on her umbrella as she approaches the café entrance. She steps inside, closing the umbrella beside her, and walks to the counter. After receiving the cup, she carries it to a window seat, sits down, and wraps her hands around the warm drink while watching the rain outside.
- The hands slowly peel away the transparent wrapping with a gentle crinkling sound, then lift the lid to reveal a shiny new device nestled in foam.
- The knife slices smoothly through the orange.
- A child floats through a dream, surrounded by glowing dandelions and talking animals, against a backdrop of ever-shifting starry skies and candy-colored clouds.
- A semi-mechanical, semi-biological sea creature swims in the deep ocean, its body composed of glowing circuits and mechanical bones, surrounded by the ruins of a submerged futuristic city.
Kling Omni Video O1 (Standard) is Kuaishou's unified multi-modal video generation model, optimized for cost efficiency and stable production use. The Image-to-Video mode animates an input image, guided by a natural-language prompt, into a high-quality video with coherent motion, strong scene understanding, and cinematic results.
- Prompt-Guided Animation: from input image and prompt to video in one step
- Temporal Consistency: stable visuals across the entire sequence
- Cinematic Motion: natural movement and camera dynamics
- Standard Optimization: balanced quality, speed, and cost
- Adaptive Duration Control: the allowed video lengths depend on whether an end frame is supplied
  - With `last_image`: any duration from 3 to 10 seconds
  - Without `last_image`: 5 s or 10 s only, for optimal stability
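The duration rules above can be checked client-side before a request is submitted. This is a minimal sketch, assuming `duration` is a whole number of seconds and that the limits are exactly the ones stated here; `validate_duration` is a hypothetical helper, not part of any WaveSpeedAI SDK.

```python
def validate_duration(duration: int, has_last_image: bool) -> None:
    """Raise ValueError if `duration` breaks the stated duration rules.

    Hypothetical helper; assumes durations are whole seconds.
    """
    if has_last_image:
        # With an end frame (last_image), any duration from 3 to 10 s is allowed.
        if not 3 <= duration <= 10:
            raise ValueError(f"with last_image, duration must be 3-10 s, got {duration}")
    else:
        # Without an end frame, only 5 s or 10 s are allowed.
        if duration not in (5, 10):
            raise ValueError(f"without last_image, duration must be 5 or 10 s, got {duration}")


if __name__ == "__main__":
    validate_duration(7, has_last_image=True)    # accepted
    validate_duration(10, has_last_image=False)  # accepted
    try:
        validate_duration(7, has_last_image=False)
    except ValueError as err:
        print(err)  # rejected: a 7 s clip needs last_image
```

The server remains the authority on what it accepts; a local check like this only saves a round trip on obviously invalid inputs.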
1. Enter Your Text Prompt: describe the scene, subject, and actions in natural language.
2. Refine with Details (Optional): add style, camera motion, or environment cues.
   Example: "A futuristic city at night, neon lights reflecting on wet streets, slow cinematic camera pan"
3. Set Parameters: choose the video duration and whether to supply an end frame (`last_image`) in addition to the start image.
4. Generate: receive a coherent, dynamic video generated from your image and prompt.
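The first two steps amount to building a single prompt string from a base description plus optional refinement cues. A minimal sketch: `compose_prompt` and its keyword names are hypothetical conveniences, and the model only ever receives the final `prompt` string.

```python
def compose_prompt(scene, style=None, camera=None, environment=None):
    """Join a base scene description with optional refinement cues.

    Hypothetical helper: the API only sees the resulting string.
    """
    parts = [scene.strip()]
    # Append each cue that was supplied, in a fixed order.
    for cue in (environment, style, camera):
        if cue:
            parts.append(cue.strip())
    return ", ".join(parts)


print(compose_prompt(
    "A futuristic city at night",
    environment="neon lights reflecting on wet streets",
    camera="slow cinematic camera pan",
))
# → A futuristic city at night, neon lights reflecting on wet streets, slow cinematic camera pan
```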
| Billing unit | Price |
|---|---|
| Per second of output video | $0.084 |
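The per-second rate explains the headline price: 5 s × $0.084 = $0.42, and $10 covers 23 such runs. A minimal sketch, assuming billing is strictly linear in clip duration with no minimum charge or extra fees; the helper names are hypothetical. `Decimal` avoids binary floating-point rounding on currency values.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.084")  # USD per second, from the pricing table


def estimate_cost(duration_s: int) -> Decimal:
    """Estimated charge in USD, assuming linear per-second billing."""
    return PRICE_PER_SECOND * duration_s


def runs_per_budget(budget: Decimal, duration_s: int) -> int:
    """Whole number of clips of this duration that fit in a budget."""
    return int(budget // estimate_cost(duration_s))


print(estimate_cost(5))                   # 0.420
print(estimate_cost(10))                  # 0.840
print(runs_per_budget(Decimal("10"), 5))  # 23
```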
Other kwaivgi/kling-video-o1-std modes:
- Video Edit: edit videos with natural-language instructions for precise, context-aware changes like object removal, scene adjustments, and style refinement while preserving motion consistency.
- Reference to Video: generate new videos guided by a reference video to match its style, identity, or motion patterns, ideal for consistent visual storytelling and content iteration.
- Text to Video: create videos directly from text prompts with strong prompt adherence and cinematic motion, great for rapid prototyping, ads, and creative concept exploration.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o1-std/image-to-video with your input as JSON. The endpoint returns a prediction ID; poll the prediction endpoint until the status flips to "completed", then read the output URL from data.outputs[0]. Examples in cURL, JavaScript, and Python follow.
```bash
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o1-std/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "duration": 5
  }'

# The response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

// CommonJS modules do not support top-level await, so wrap the call in an async function.
async function main() {
  const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
  const result = await client.run("kwaivgi/kling-video-o1-std/image-to-video", {
    prompt: "A cinematic shot of a city at sunset, soft golden light",
    image: "https://example.com/your-input.jpg",
    duration: 5,
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "kwaivgi/kling-video-o1-std/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "duration": 5,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

Kling Video O1 Std Image To Video is a Kuaishou model for video generation from images, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o1-std-image-to-video.
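The submit-then-poll flow can also be written against the raw REST endpoints with only the Python standard library. This is a sketch under stated assumptions: the prediction ID is read from `data.id` in the submission response and a terminal failure is reported as status `"failed"`; neither detail is confirmed here, so check both against the API reference. The `request` and `sleep` parameters exist only so the HTTP layer and the wait can be swapped out in tests.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "kwaivgi/kling-video-o1-std/image-to-video"


def http_json(method, url, api_key, body=None):
    """Send a JSON request with the API key and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate(payload, api_key, request=http_json, poll_interval=2.0,
             timeout=600.0, sleep=time.sleep):
    """Submit a prediction, poll until it completes, and return the output URL."""
    submitted = request("POST", f"{BASE_URL}/{MODEL_PATH}", api_key, payload)
    # ASSUMPTION: the prediction id is returned at data.id.
    prediction_id = submitted["data"]["id"]

    deadline = time.monotonic() + timeout
    while True:
        result = request("GET", f"{BASE_URL}/predictions/{prediction_id}/result", api_key)
        data = result["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        # ASSUMPTION: a terminal failure is reported as status "failed".
        if status == "failed":
            raise RuntimeError(f"prediction {prediction_id} failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} still {status!r} after {timeout} s")
        sleep(poll_interval)


# Example call (needs network access and a real key):
# url = generate({"prompt": "...", "image": "https://example.com/your-input.jpg",
#                 "duration": 5}, api_key="<your WAVESPEED_API_KEY>")
```

A fixed two-second interval keeps the sketch simple; given generation times on the order of a minute, a longer interval or exponential backoff would cut the number of polling requests.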
Kling Video O1 Std Image To Video is listed at $0.42 per run, which matches a 5-second clip at the $0.084-per-second rate in the pricing table. The final charge scales with the duration you set, so a 10-second clip costs more than a 5-second one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `image`, `duration`, `last_image`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o1-std-image-to-video.
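A request body needs only these four keys. A minimal sketch of assembling one: `build_payload` is a hypothetical helper, it assumes `image` and `last_image` are passed as URLs (as in the examples on this page), and it assumes an absent end frame should be omitted rather than sent as `null`. The JSON schema in the API reference is authoritative on types and defaults.

```python
import json
from typing import Optional


def build_payload(prompt: str, image: str, duration: int,
                  last_image: Optional[str] = None) -> dict:
    """Assemble the JSON body for the image-to-video endpoint.

    Hypothetical helper; only the field names come from the model's input list.
    `last_image` is left out entirely when no end frame is supplied.
    """
    payload = {"prompt": prompt, "image": image, "duration": duration}
    if last_image is not None:
        payload["last_image"] = last_image
    return payload


print(json.dumps(build_payload(
    "The camera slowly pushes in as she sketches",
    "https://example.com/your-input.jpg",
    7,
    last_image="https://example.com/your-end-frame.jpg",
)))
```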
Average end-to-end generation time on WaveSpeedAI is around 51 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.