Kling Video O1 Std Image to Video | Fast Image-to-Video API

Kling Omni Video O1 - Text-to-Video (Standard)

Kling Omni Video O1 (Standard) is Kuaishou's unified multi-modal video generation model, optimized for cost efficiency and stable production use. The Text-to-Video mode transforms natural language prompts into high-quality videos with coherent motion, strong scene understanding, and cinematic results.

Key Capabilities

Intelligent Text-to-Video Generation

Generate videos directly from text descriptions:

Converts natural language into dynamic video scenes
Understands actions, environments, and visual styles
Produces smooth, temporally consistent motion

Scene and Motion Coherence

Advanced video reasoning ensures:

Logical object interactions and movement flow
Stable scene structure across frames
Consistent lighting, colors, and atmosphere

Multi-Modal Prompt Understanding

Use descriptive prompts to control:

Subject appearance and actions
Camera movement and framing
Mood, style, and scene dynamics

Core Features

Text-Driven Video Synthesis — From prompt to video in one step
Temporal Consistency — Stable visuals across the entire sequence
Cinematic Motion — Natural movement and camera dynamics
Standard Optimization — Balanced quality, speed, and cost
Adaptive Duration Control — Video length adapts based on input conditions
    When last_image is provided, supports flexible durations from 3 to 10 seconds
    Without last_image, generation is limited to 5s or 10s for optimal stability
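
The duration rule above can be checked on the client before a request is submitted. The following is a minimal sketch, not part of any official SDK: the function name is hypothetical, and it assumes durations are whole seconds, which this page does not state explicitly.

```python
from typing import Optional

# Fixed lengths (seconds) allowed when no end frame is supplied, per the rule above.
FIXED_DURATIONS = (5, 10)
# Inclusive range (seconds) allowed when last_image is supplied, per the rule above.
FLEX_MIN, FLEX_MAX = 3, 10


def is_valid_duration(duration: int, last_image: Optional[str] = None) -> bool:
    """Return True if `duration` fits the rule for the given inputs.

    Assumption: durations are whole seconds; the page does not say
    whether fractional values are accepted.
    """
    if last_image:
        return FLEX_MIN <= duration <= FLEX_MAX
    return duration in FIXED_DURATIONS


print(is_valid_duration(7, last_image="https://example.com/end.jpg"))  # True
print(is_valid_duration(7))  # False
```

Rejecting an invalid duration locally avoids submitting a request that the service would presumably refuse.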

How to Use

Enter Your Text Prompt: Describe the scene, subject, and actions in natural language.
Refine with Details (Optional): Add style, camera motion, or environment cues.

Example: "A futuristic city at night, neon lights reflecting on wet streets, slow cinematic camera pan"

Set Parameters: Choose the video duration and whether to use start and end frames for generation.
Generate: Receive a coherent, dynamic video generated entirely from text.

Pricing

Unit	Price (USD)
Per second of video	$0.084
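
As a rough illustration of per-second billing, the sketch below multiplies the listed rate by the clip length. It assumes the charge is strictly linear in duration with no rounding or minimum fee, which this page does not confirm; the function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.084")  # USD, from the pricing table above


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimated charge in USD for one clip of the given length.

    Assumption: billing is linear in duration, with no rounding or minimum.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return PRICE_PER_SECOND * duration_seconds


for seconds in (5, 10):
    print(f"{seconds}s -> ${estimate_cost(seconds)}")
```

Under that assumption a 5-second clip comes to $0.42 and a 10-second clip to $0.84. Decimal is used instead of float so the product does not pick up binary rounding noise.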

Pro Tips

Use clear, descriptive prompts for best results
Specify actions and camera movement for more dynamic videos
Combine environment and motion details for cinematic quality
Ideal for large-scale generation and cost-sensitive use cases

Kling O1 series models

kwaivgi/kling-video-o1-std — Video Edit — Edit videos with natural-language instructions for precise, context-aware changes like object removal, scene adjustments, and style refinement while preserving motion consistency.
kwaivgi/kling-video-o1-std — Reference to Video — Generate new videos guided by a reference video to match its style, identity, or motion patterns, ideal for consistent visual storytelling and content iteration.
kwaivgi/kling-video-o1-std — Text to Video — Create videos directly from text prompts with strong prompt adherence and cinematic motion, great for rapid prototyping, ads, and creative concept exploration.

Kling Video O1 Std Image To Video API — Quick start

Get a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o1-std/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status is completed, then read the output URL from data.outputs[0]. Examples for Kling Video O1 Std Image To Video follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o1-std/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "duration": 5
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

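// CommonJS modules do not allow top-level await, so wrap the call in an async function.
async function main() {
  const result = await client.run("kwaivgi/kling-video-o1-std/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "duration": 5
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});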

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "kwaivgi/kling-video-o1-std/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "duration": 5,
    },
)

print(output["outputs"][0])  # → URL of the generated output
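
The submit-then-poll flow described in the quick start can also be written without the SDK. The sketch below uses only the Python standard library and the two URLs shown in the HTTP example. Several details are assumptions not confirmed on this page: that the submit response carries the prediction id at data.id, that a failed run reports status "failed", and the polling interval and timeout values. The helper names are hypothetical.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL = "kwaivgi/kling-video-o1-std/image-to-video"


def call_api(method, url, api_key, payload=None):
    """Send one JSON request and return the decoded response body."""
    body = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if body is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def generate(payload, api_key, send=call_api, interval=2.0, timeout=300.0):
    """Submit a prediction, poll until it completes, return the first output URL."""
    submitted = send("POST", f"{API_BASE}/{MODEL}", api_key, payload)
    prediction_id = submitted["data"]["id"]  # assumed field name
    result_url = f"{API_BASE}/predictions/{prediction_id}/result"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = send("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed failure status
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} not finished after {timeout}s")


# Only runs when executed as a script with an API key in the environment.
if __name__ == "__main__" and os.environ.get("WAVESPEED_API_KEY"):
    video_url = generate(
        {
            "prompt": "A cinematic shot of a city at sunset, soft golden light",
            "image": "https://example.com/your-input.jpg",
            "duration": 5,
        },
        os.environ["WAVESPEED_API_KEY"],
    )
    print(video_url)
```

The `send` parameter exists so the loop can be exercised with a stub instead of the network; the deadline keeps a stuck prediction from polling forever.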

Kling Video O1 Std Image To Video API — Frequently asked questions

What is the Kling Video O1 Std Image To Video API?

Kling Video O1 Std Image To Video is a Kuaishou image-to-video model, exposed as a REST API on WaveSpeedAI. Kling Omni Video O1 Image-to-Video (Standard) turns static images into dynamic, high-quality videos while preserving subject identity and visual and temporal consistency. It adds natural motion, realistic physics, and smooth scene dynamics, and supports flexible clip durations when reference frames are provided. It is built for stable production use and cost efficiency, with a ready-to-use REST API, fast response, no cold starts, and predictable pricing. You can call it programmatically or try it from the playground above.

How do I call the Kling Video O1 Std Image To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o1-std-image-to-video.

How much does Kling Video O1 Std Image To Video cost per run?

Kling Video O1 Std Image To Video starts at $0.42 per run. That figure is the base price and matches a 5-second clip at the $0.084 per-second rate listed under Pricing; the final charge scales with the parameters you set in the form, so a longer clip costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Kling Video O1 Std Image To Video accept?

Key inputs: `prompt`, `image`, `duration`, `last_image`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-video-o1-std-image-to-video.
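
To make the four inputs concrete, here is a small sketch of a typed request body. The field names come from the list above; the types, the default duration of 5, and the treatment of last_image as optional are assumptions inferred from the examples on this page, not taken from the published schema. The class name is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageToVideoInput:
    prompt: str
    image: str                        # URL of the start frame (assumed to be a URL)
    duration: int = 5                 # seconds (assumed default)
    last_image: Optional[str] = None  # URL of the end frame, if any (assumed optional)

    def to_payload(self) -> dict:
        """Drop unset optional fields so only explicit values are sent."""
        return {key: value for key, value in asdict(self).items() if value is not None}


request_input = ImageToVideoInput(
    prompt="A cinematic shot of a city at sunset, soft golden light",
    image="https://example.com/your-input.jpg",
)
print(json.dumps(request_input.to_payload(), indent=2))
```

Omitting last_image when it is unset keeps the JSON body identical to the one in the HTTP example; check the API reference for the authoritative types and defaults.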

How long does Kling Video O1 Std Image To Video take to generate?

Average end-to-end generation time on WaveSpeedAI is around 51 seconds per request, measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Kling Video O1 Std Image To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Kuaishou). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
