Kwaivgi Kling Video O3 Std Reference To Video

Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos from character, prop, or scene references captured from multiple viewpoints. It extracts subject features and creates new video content while keeping the subject's identity consistent across frames, and it can optionally generate audio. The model is available through a ready-to-use REST API with no cold starts and per-second pricing (see Pricing).

Features

Kling Omni Video O3 — Reference-to-Video (Standard)

Kling Omni Video O3 (Standard) is Kuaishou’s advanced unified multi-modal video model, optimized for production workloads. The Reference-to-Video mode creates new video content from subject references, keeping character, prop, and scene identity consistent while generating entirely new creative scenarios. Audio generation is optional.

Key Capabilities

Multi-Reference Subject Creation

Build subjects from multiple reference viewpoints:

Extract features from character, prop, or scene images
Maintain consistent identity in generated videos
Create new scenarios with familiar subjects

Subject Consistency Technology

Feature extraction from the references is designed to provide:

Stable character appearance across all frames
Consistent clothing, accessories, and props
Maintained facial features and expressions
Coherent scene elements and backgrounds

Creative Freedom

Generate entirely new content while preserving identity:

New poses and actions
Different scenes and environments
Various camera angles and movements
Fresh creative scenarios

Audio Support

Optionally generate synchronized audio (`sound`) or keep the original sound from a reference video (`keep_original_sound`).

Core Features

Identity Lock — Subject features remain consistent throughout video
Multi-Angle Support — Use references from various viewpoints
Scene Flexibility — Place subjects in new environments
Motion Control — Guide actions with text prompts
Audio Options — Keep original sound or generate new audio
Cost Optimized — Standard tier for production workloads

How to Use

1. Upload Reference Images/Video: Provide one or more images of your subject (character, object, or scene), or a reference video. Up to 7 images are accepted, or up to 4 when a reference video is also supplied.
2. Describe the Scenario: Write a prompt for the new video content.
   Example: “The character walking through a futuristic city at night, neon lights reflecting on wet streets”
3. Set Parameters: Choose duration (3-15 s), aspect ratio, and audio options.
4. Generate: Receive a video of your subject in the new scenario.

Pricing

Reference Type	Price per Second
Image Reference	$0.084
Video Reference	$0.126

The rate depends on the reference type: $0.084/s when only image references are used, and $0.126/s when a reference video is included (1.5× the image-only rate).
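As a rough budgeting aid, the sketch below multiplies the requested `duration` by the per-second rate from the table. It assumes the charge is simply rate × duration with no minimums or rounding, which the table does not state explicitly; `estimate_cost` is a hypothetical helper name, not part of the API.

```shell
# Hypothetical helper: estimated cost in USD = duration (s) x per-second rate.
# Rates come from the pricing table; any billing rule beyond that is assumed.
# Usage: estimate_cost <duration_seconds> <image|video>
estimate_cost() {
  local duration=$1 ref_type=$2 rate
  case "$ref_type" in
    image) rate=0.084 ;;  # image references only
    video) rate=0.126 ;;  # a reference video is included (1.5x)
    *) echo "unknown reference type: $ref_type" >&2; return 1 ;;
  esac
  # POSIX shell arithmetic is integer-only, so awk does the multiplication.
  awk -v d="$duration" -v r="$rate" 'BEGIN { printf "%.3f\n", d * r }'
}

estimate_cost 5 image    # 0.420
estimate_cost 5 video    # 0.630
estimate_cost 15 video   # 1.890
```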

Pro Tips

Use multiple reference angles for better identity capture
Provide clear, high-resolution reference images
Describe actions and environments clearly in prompts
Works best for characters, products, and distinct objects

Note

If a reference video is supplied, at most 4 reference images can be provided (instead of the usual 7).
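Because an over-limit request would only be rejected after it is sent, a client may want to check the image count first. The following is a minimal sketch of such a check; `check_image_count` is a hypothetical name, and the limits (4 with a video, 7 without) are taken from the `images` parameter description.

```shell
# Hypothetical pre-flight check mirroring the documented limit: at most 7
# reference images, or at most 4 when a reference video is also supplied.
# Usage: check_image_count <image_count> <has_video: yes|no>
check_image_count() {
  local count=$1 has_video=$2 max=7
  if [ "$has_video" = "yes" ]; then
    max=4
  fi
  if [ "$count" -gt "$max" ]; then
    echo "too many reference images: $count (max $max)" >&2
    return 1
  fi
  return 0
}

check_image_count 3 yes && echo "3 images + video: ok"
check_image_count 5 yes || echo "5 images + video: rejected"
```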

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


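# Submit the task ("prompt" is required; supply reference "images" and/or a "video" URL)
# The image URLs below are placeholders; replace them with your own.
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/reference-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "The character walking through a futuristic city at night, neon lights reflecting on wet streets",
    "images": ["https://example.com/character-front.png", "https://example.com/character-side.png"],
    "keep_original_sound": true,
    "sound": false,
    "aspect_ratio": "16:9",
    "duration": 5
}'

# Get the result: set requestId to the data.id value returned by the submit call
requestId="<task-id-from-submit-response>"
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"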

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	The positive prompt for the generation.
video	string	No		-	The reference video URL.
images	array	No	[]	-	Reference images. At most 4 when a reference video is supplied; at most 7 otherwise.
keep_original_sound	boolean	No	true	-	Whether to keep the original sound from the reference video.
sound	boolean	No	false	-	Whether to generate audio for the video.
aspect_ratio	string	No	16:9	16:9, 9:16, 1:1	The aspect ratio of the generated video.
duration	integer	No	5	3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15	The duration of the generated video in seconds (3-15).
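A minimal sketch of client-side validation for `duration` and `aspect_ratio`, using only the ranges listed in the table. `validate_params` is a hypothetical helper; it only catches obvious mistakes locally and does not replace whatever validation the service itself performs.

```shell
# Hypothetical local check of the documented ranges before submitting.
# Usage: validate_params <duration> <aspect_ratio>
validate_params() {
  local duration=$1 aspect_ratio=$2
  # duration must be a whole number of seconds
  case "$duration" in
    ''|*[!0-9]*) echo "duration must be an integer" >&2; return 1 ;;
  esac
  if [ "$duration" -lt 3 ] || [ "$duration" -gt 15 ]; then
    echo "duration must be between 3 and 15 seconds" >&2
    return 1
  fi
  # aspect_ratio must be one of the three listed values
  case "$aspect_ratio" in
    16:9|9:16|1:1) ;;
    *) echo "aspect_ratio must be 16:9, 9:16, or 1:1" >&2; return 1 ;;
  esac
}

validate_params 5 16:9 && echo "ok to submit"
validate_params 20 16:9 || echo "rejected locally"
```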

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
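The result URL is built from `data.id`. The sketch below pulls that field out of a submit response with a crude `sed` match; it assumes each `"key":"value"` pair sits on one line and that the key appears only once, so it is not a real JSON parser. If `jq` is installed, `jq -r '.data.id'` is the more robust choice. The helper name `extract_field` and the sample response values are made up for illustration.

```shell
# Hypothetical helper: print the value of a "key":"value" string field from
# JSON on stdin. A crude line-oriented sed match, not a real JSON parser.
extract_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Made-up submit response shaped like the table above.
response='{"code":200,"message":"success","data":{"id":"abc123","model":"kwaivgi/kling-video-o3-std/reference-to-video","outputs":[],"status":"created","error":""}}'

requestId=$(printf '%s\n' "$response" | extract_field id)
echo "$requestId"   # abc123
# Result endpoint for this task:
echo "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result"
```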

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID (the data.id returned when the task was submitted)

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was requested)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
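Since `data.outputs` stays empty until `data.status` is `completed`, a client can poll the result endpoint until the task finishes. A minimal polling sketch, assuming `WAVESPEED_API_KEY` is exported; the function names, the 2-second default interval (`POLL_INTERVAL`), and the 120-attempt cap (`MAX_POLLS`) are illustrative choices, not documented values. The status is read with a simple `sed` match; swap in `jq -r '.data.status'` if `jq` is available.

```shell
# Hypothetical wrapper around the documented result endpoint.
fetch_result() {
  curl --silent --location --request GET \
    "https://api.wavespeed.ai/api/v3/predictions/$1/result" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}"
}

# Poll until data.status is "completed" (print body, return 0),
# "failed" (print body to stderr, return 1), or the attempt cap is hit.
wait_for_result() {
  local id=$1 attempt=0 body task_status
  while [ "$attempt" -lt "${MAX_POLLS:-120}" ]; do
    body=$(fetch_result "$id")
    # Crude extraction of the "status" string field from the response.
    task_status=$(printf '%s\n' "$body" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$task_status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "timed out waiting for task $id" >&2
  return 1
}

# Usage: wait_for_result "$requestId" > result.json
# then read the video URL(s) from data.outputs.
```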
