Kling Omni O3 API
Kling Omni Video O3 is Kuaishou's advanced unified multi-modal video model, built on MVL (Multi-modal Visual Language) technology. It is offered in Standard (kwaivgi/kling-video-o3-std/*), Pro (kwaivgi/kling-video-o3-pro/*), and 4K (kwaivgi/kling-video-o3-4k/*) tiers for text-to-video, image-to-video, and reference-to-video; conversational video-edit is available on the Standard and Pro tiers.
MVL technology maintains subject consistency across modalities. Video-edit accepts natural-language commands to remove objects, swap backgrounds, restyle scenes, and apply localized 3-10s transforms.
About the Kling Omni O3 API
What Kling Omni O3 does, how it fits in the Kuaishou model lineup, and why teams reach for it.
Kling Omni O3 is a video generation model from Kuaishou, available through the WaveSpeedAI REST API.
The Kling Omni O3 family on WaveSpeedAI ships 11 REST endpoints covering video-to-video, text-to-video, and image-to-video workflows. Each variant carries its own pricing, parameters, and example outputs. Pick the one that matches your input modality and production constraints, or call several from the same API key to compose multi-step pipelines.
Run Kling Omni O3 through the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. No separate vendor setup, no per-provider SDKs, no per-vendor rate-limit envelopes — one integration covers everything from text-to-image and text-to-video through audio synthesis, 3D generation, upscaling, and editing.
All Kling Omni O3 API endpoints
11 endpoints available now on WaveSpeedAI — pick the variant that matches your workflow.

Video Edit
Kling Omni Video O3 Video-Edit (Standard) enables natural-language video edits: remove or replace objects, swap backgrounds, restyle scenes, change weather/lighting, and apply localized 3-10s transformations with strong temporal consistency. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

Video Edit
Kling Omni Video O3 Video-Edit (Pro) enables conversational video editing through natural-language commands. Remove objects, change backgrounds, modify styles, adjust weather/lighting, and transform scenes with simple text instructions like 'remove pedestrians' or 'change daytime to dusk'. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Text To Video
Kling Omni Video O3 (Standard) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Text To Video
Kling Video O3 4K generates cinematic 4K videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports multi-prompt scene transitions, element references, and optional audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Text To Video
Kling Omni Video O3 (Pro) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Reference To Video
Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Reference To Video
Kling Video O3 4K Reference-to-Video generates creative 4K videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports multi-reference images, video guidance, and optional audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Reference To Video
Kling Omni Video O3 (Pro) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Image To Video
Kling Video O3 4K Image-to-Video transforms static images into dynamic cinematic 4K videos. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports start/end frame control, multi-prompt, and optional audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Image To Video
Kling Omni Video O3 (Pro) Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

Image To Video
Kling Omni Video O3 (Standard) Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
How to use the Kling Omni O3 API
Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.
1. Get an API key
Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.
2. Submit a prediction
POST your input as JSON to https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/text-to-video. The endpoint returns a prediction id immediately; generations are asynchronous, so you don't hold an open connection during inference.
3. Poll for completion
GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1-2 seconds, substituting the prediction id for {request_id}. The response includes a status field; keep polling until it changes from "queued" or "processing" to "completed".
4. Read the output URL
Once status is "completed", read the URL from data.outputs[0]. The URL points to your generated video on the WaveSpeedAI CDN.
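The four steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the official client: the URLs, the status values, and data.outputs[0] come from the steps above, while the location of the prediction id (data.id), the nesting of status under data, the "prompt" input field, and the choice to treat any other status as a failure are assumptions to check against the playground's generated samples.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL = "kwaivgi/kling-video-o3-std/text-to-video"


def call_api(method, url, api_key, payload=None):
    """Send one JSON request and return the decoded JSON response."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=body, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if body is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_output(fetch_result, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_result() until the prediction completes; return outputs[0].

    fetch_result is any zero-argument callable that returns the decoded
    result JSON, which keeps the loop testable without network access.
    """
    waited = 0.0
    while True:
        data = fetch_result().get("data", {})  # assumed: status lives under "data"
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status not in ("queued", "processing"):
            # Assumption: any other status (for example a failure) is terminal.
            raise RuntimeError(f"prediction ended with status {status!r}")
        if waited >= timeout:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval)
        waited += interval


def generate(payload, api_key):
    """Submit a prediction (step 2), then poll for its output URL (steps 3-4)."""
    submitted = call_api("POST", f"{API_BASE}/{MODEL}", api_key, payload)
    request_id = submitted["data"]["id"]  # assumed location of the prediction id
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    return wait_for_output(lambda: call_api("GET", result_url, api_key))


# Usage (needs a real key; "prompt" is an assumed input field name):
#   url = generate({"prompt": "a paper boat on a rainy street"}, api_key)
```

Passing the fetch function in as an argument keeps the polling loop separate from the HTTP layer, so the timeout and status handling can be exercised with canned responses.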
What you can build with Kling Omni O3
Common workflows developers and creators use the Kling Omni O3 API for.
Unified text-to-video with MVL
kwaivgi/kling-video-o3-std/text-to-video generates cinematic videos from text prompts using Kling Omni's MVL (Multi-modal Visual Language) technology — Kuaishou's advanced unified multi-modal architecture.
Image-to-video with subject consistency
kwaivgi/kling-video-o3-std/image-to-video transforms static images into dynamic cinematic videos while maintaining subject consistency and adding natural motion — the i2v variant when starting from a key still.
Reference-to-video from multiple viewpoints
kwaivgi/kling-video-o3-std/reference-to-video generates creative videos using character, prop, or scene references from multiple viewpoints — extracts subject features and creates new content while maintaining identity.
Conversational video-edit
kwaivgi/kling-video-o3-std/video-edit enables natural-language video edits: remove or replace objects, swap backgrounds, restyle scenes, change weather/lighting, and apply localized 3-10s transforms — edit via prompt rather than manual masking.
Pro tier for top-tier output
kwaivgi/kling-video-o3-pro/* mirrors the Standard variant surface (text-to-video, image-to-video, reference-to-video, video-edit) at Pro quality — same prompt format, switch the endpoint tier for delivery work.
4K tier for delivery resolution
kwaivgi/kling-video-o3-4k/* covers text-to-video, image-to-video, and reference-to-video at 4K delivery resolution — the top tier when output resolution is the limiting factor.
Multi-modal visual language conditioning
MVL technology is what distinguishes the O3 variants in the catalog: it maintains subject consistency and natural motion when conditioning on text, images, or reference viewpoints. Useful for cross-modal pipelines that mix still references with motion generation.
Tips for prompting Kling Omni O3
Practical advice for getting better outputs from Kling Omni O3 — drawn from the patterns that work across video models in production pipelines.
Pick Standard vs Pro by delivery needs
kwaivgi/kling-video-o3-std/* for iteration and default delivery; kwaivgi/kling-video-o3-pro/* when Pro quality matters. Same variant names across tiers — only the tier prefix changes.
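Because only the tier prefix changes, the model path can be assembled from a tier and a variant name. The following is a small sketch: `model_id`, `all_model_ids`, and the `VARIANTS` table are illustrative helpers rather than part of any SDK, and the variant sets are copied from the endpoint list on this page (the 4K tier has no video-edit endpoint).

```python
# Variants offered per tier, as listed on this page.
VARIANTS = {
    "std": ("text-to-video", "image-to-video", "reference-to-video", "video-edit"),
    "pro": ("text-to-video", "image-to-video", "reference-to-video", "video-edit"),
    "4k": ("text-to-video", "image-to-video", "reference-to-video"),
}


def model_id(tier, variant):
    """Return the model path for a tier/variant pair, rejecting unknown pairs."""
    if tier not in VARIANTS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(VARIANTS)}")
    if variant not in VARIANTS[tier]:
        raise ValueError(f"{variant!r} is not offered on the {tier!r} tier")
    return f"kwaivgi/kling-video-o3-{tier}/{variant}"


def all_model_ids():
    """List every tier/variant combination in VARIANTS."""
    return [model_id(tier, v) for tier, variants in VARIANTS.items() for v in variants]
```

Moving a shot from iteration to delivery then means changing only the first argument, for example from `model_id("std", "image-to-video")` to `model_id("pro", "image-to-video")`.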
Use reference-to-video for identity-critical shots
When characters, props, or scenes must stay recognizable, reference-to-video extracts subject features from multiple viewpoints — stronger than text-only conditioning for serialized content.
Video-edit via natural language
Describe edits as conversational commands — remove object, swap background, change weather — rather than supplying masks. O3 video-edit handles localized 3-10s transforms from prompt alone.
MVL benefits from multi-modal inputs
Combine text prompts with reference images when the brief includes specific subjects. MVL technology conditions on both — better subject consistency than text-only generation.
Match variant to input type
text-to-video for greenfield; image-to-video for animating a still; reference-to-video for identity; video-edit for modifying existing footage. Pick the endpoint that matches your source material.
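The mapping above can be written as a small dispatch function. This is an illustrative sketch: `pick_variant` and its arguments are hypothetical names, and the precedence order (existing footage first, then references, then a single still) is an assumption about how to break ties when several kinds of source material are available.

```python
def pick_variant(has_video=False, reference_count=0, has_image=False):
    """Map the available source material to a Kling Omni O3 variant name."""
    if has_video:
        return "video-edit"          # modify existing footage
    if reference_count > 0:
        return "reference-to-video"  # keep a subject's identity
    if has_image:
        return "image-to-video"      # animate a single still
    return "text-to-video"           # start from a prompt alone
```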
Kling Omni O3 API pricing
Pricing is per-output. The final charge scales with the parameters you set in each variant's playground (resolution, duration, output count, references).
| Endpoint | Type | Starting price |
|---|---|---|
| kwaivgi/kling-video-o3-std/video-edit | video-to-video | $0.63 |
| kwaivgi/kling-video-o3-pro/video-edit | video-to-video | $0.84 |
| kwaivgi/kling-video-o3-std/text-to-video | text-to-video | $0.42 |
| kwaivgi/kling-video-o3-4k/text-to-video | text-to-video | $2.10 |
| kwaivgi/kling-video-o3-pro/text-to-video | text-to-video | $0.56 |
| kwaivgi/kling-video-o3-std/reference-to-video | image-to-video | $0.42 |
| kwaivgi/kling-video-o3-4k/reference-to-video | image-to-video | $2.10 |
| kwaivgi/kling-video-o3-pro/reference-to-video | image-to-video | $0.56 |
| kwaivgi/kling-video-o3-4k/image-to-video | image-to-video | $2.10 |
| kwaivgi/kling-video-o3-pro/image-to-video | image-to-video | $0.56 |
| kwaivgi/kling-video-o3-std/image-to-video | image-to-video | $0.42 |
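Since the table lists starting prices, it gives a lower bound for a batch of runs rather than the final charge. A minimal sketch of that arithmetic follows; `minimum_cost` is an illustrative helper, the prices are copied from the table above, and real charges scale with resolution, duration, output count, and references.

```python
from decimal import Decimal

# Starting prices in USD per run, copied from the table above.
STARTING_PRICE = {
    "kwaivgi/kling-video-o3-std/video-edit": Decimal("0.63"),
    "kwaivgi/kling-video-o3-pro/video-edit": Decimal("0.84"),
    "kwaivgi/kling-video-o3-std/text-to-video": Decimal("0.42"),
    "kwaivgi/kling-video-o3-4k/text-to-video": Decimal("2.10"),
    "kwaivgi/kling-video-o3-pro/text-to-video": Decimal("0.56"),
    "kwaivgi/kling-video-o3-std/reference-to-video": Decimal("0.42"),
    "kwaivgi/kling-video-o3-4k/reference-to-video": Decimal("2.10"),
    "kwaivgi/kling-video-o3-pro/reference-to-video": Decimal("0.56"),
    "kwaivgi/kling-video-o3-4k/image-to-video": Decimal("2.10"),
    "kwaivgi/kling-video-o3-pro/image-to-video": Decimal("0.56"),
    "kwaivgi/kling-video-o3-std/image-to-video": Decimal("0.42"),
}


def minimum_cost(runs):
    """Lower-bound cost in USD for a {model_id: run_count} mapping."""
    return sum((STARTING_PRICE[m] * n for m, n in runs.items()), Decimal("0"))


# Example: 20 Standard drafts plus 2 final renders at 4K.
batch = {
    "kwaivgi/kling-video-o3-std/text-to-video": 20,
    "kwaivgi/kling-video-o3-4k/text-to-video": 2,
}
print(minimum_cost(batch))  # 12.60
```

Using Decimal instead of float keeps the cents exact when many runs are summed.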
Call the Kling Omni O3 API
Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.
HTTP example
# 1. Submit a prediction ('{}' is a placeholder for your input JSON)
curl -X POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{}'
# 2. Poll the result until status = "completed"
#    (replace {request_id} with the prediction id returned by step 1)
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# Read the output URL from data.outputs[0].
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY
// CommonJS has no top-level await, so run inside an async function.
(async () => {
  // {} is a placeholder for the endpoint's input parameters
  const result = await client.run("kwaivgi/kling-video-o3-std/text-to-video", {});
  console.log(result.outputs[0]); // → URL of the generated output
})();
Python example
# pip install wavespeed
import wavespeed
# {} is a placeholder for the endpoint's input parameters
output = wavespeed.run(
    "kwaivgi/kling-video-o3-std/text-to-video",
    {}
)
print(output["outputs"][0]) # → URL of the generated output
Kling Omni O3 vs alternatives
When to pick Kling Omni O3 over similar models on WaveSpeedAI.
Kling Omni O3 vs Kling 3.0
Kling 3.0 ships Standard, Pro, and 4K tiers with native audio and a dedicated motion-control endpoint. Kling O3 is the newer Omni architecture with MVL technology and conversational video-edit. It offers a broader edit surface across its own Standard, Pro, and 4K tiers, but has no dedicated motion-control endpoint in the O3 family.
Kling Omni O3 vs Kling Omni O1
Kling O1 is the first Omni unified model with the same variant pattern (text-to-video, i2v, reference-to-video, video-edit). O3 is the advanced successor with improved MVL technology — pick O3 for new projects unless O1 availability or pricing fits better.
Kling Omni O3 vs Wan 2.7
Wan 2.7 ships reference-to-video, video-edit, video-extend, image-edit, and text-to-image in one family at lower cost. Kling O3 stays focused on the Omni video surface with MVL conditioning and conversational edit commands.
Kling Omni O3 API — Frequently asked questions
Pricing, license, integration — common questions about running Kling Omni O3 on WaveSpeedAI.
What is the Kling Omni O3 API?
Kling Omni O3 is a Kuaishou video generation model exposed as a REST API on WaveSpeedAI. It is a unified multi-modal video model built on MVL (Multi-modal Visual Language) technology, offered in Standard, Pro, and 4K tiers for text-to-video, image-to-video, reference-to-video, and conversational video-edit. You can call it programmatically or try it from the playground.
How do I call the Kling Omni O3 API?
Sign up for a WaveSpeedAI account, copy your API key from /accesskey, then POST to https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to "completed", then read the output URL from data.outputs[0]. Full Python / Node.js / cURL examples are above.
How much does the Kling Omni O3 API cost?
Kling Omni O3 starts at $0.42 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.
Which Kling Omni O3 variants are available?
WaveSpeedAI hosts 11 Kling Omni O3 endpoints: kwaivgi/kling-video-o3-std/video-edit, kwaivgi/kling-video-o3-pro/video-edit, kwaivgi/kling-video-o3-std/text-to-video, kwaivgi/kling-video-o3-4k/text-to-video, kwaivgi/kling-video-o3-pro/text-to-video, kwaivgi/kling-video-o3-std/reference-to-video, kwaivgi/kling-video-o3-4k/reference-to-video, kwaivgi/kling-video-o3-pro/reference-to-video, kwaivgi/kling-video-o3-4k/image-to-video, kwaivgi/kling-video-o3-pro/image-to-video, and kwaivgi/kling-video-o3-std/image-to-video. Each variant has its own playground page and pricing.
Can I use Kling Omni O3 outputs commercially?
Commercial usage rights follow the Kuaishou model license. Most Kuaishou models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.
Why use Kling Omni O3 on WaveSpeedAI instead of going direct?
One API key and one billing account cover Kling Omni O3 and 1,000+ other AI models from other providers. No per-vendor SDK setup, no separate rate-limit envelopes, no per-vendor integration rewrites. Pricing is typically at parity with or below Kuaishou's direct API.
About Kuaishou
The team behind Kling Omni O3 and the broader Kuaishou model lineup on WaveSpeedAI.
Kuaishou is a major Chinese short-video platform and the team behind the Kling family of video generation models. Kling 3.0 ships Standard, Pro, and 4K tiers with native audio synthesis (a sound parameter on every variant), plus a dedicated motion-control endpoint that transfers motion from a reference video to animate a still character image.
Start building with Kling Omni O3 on WaveSpeedAI
Free starter credits on signup. One API key across 1,000+ AI models from Kuaishou and every other provider.