Kwaivgi Kling Video O3 Std Reference To Video
Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. It extracts subject features and creates new video content while maintaining identity consistency across frames. It also supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Features
Kling Video O3 Std Reference-to-Video
Kling Video O3 Standard Reference-to-Video generates new videos guided by reference images and an optional reference video, maintaining consistent characters, styles, and scenes. Describe a scenario involving the people or elements in your reference images — the model brings them together in a coherent, natural video. Supports flexible duration, aspect ratio control, and optional sound generation.
Why Choose This?
- Character-consistent generation: Upload reference images of specific people or elements, and the model preserves their identity throughout the generated video.
- Multi-reference support: Provide multiple reference images to combine different characters, styles, or elements in one scene.
- Optional reference video: Supply a reference video for motion guidance, style transfer, or scene continuity.
- Sound options: Keep the original audio from a reference video, or generate new synchronized sound effects.
- Flexible output: Choose an aspect ratio of 16:9, 9:16, or 1:1 and a duration from 3 to 15 seconds.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Text description of the desired scene and action |
| video | No | Reference video for motion or style guidance |
| images | No | Reference images of characters, elements, or styles |
| keep_original_sound | No | Keep the original sound from the reference video (default: enabled) |
| sound | No | Generate synchronized audio for the video (default: disabled) |
| aspect_ratio | No | Video aspect ratio (default: 16:9) |
| duration | No | Video length in seconds (min: 3, max: 15, default: 5) |
How to Use
- Write your prompt — describe the scene, referencing the characters or elements in your images (e.g., “The man in Figure 2 is walking with the woman in Figure 1 in the park.”).
- Add reference images — upload images of the characters, objects, or styles you want in the video.
- Add reference video (optional) — provide a video for motion or style guidance.
- Choose aspect ratio — select the format that fits your platform.
- Set duration — choose any length from 3 to 15 seconds (default: 5).
- Set sound preference — keep original audio from the reference video, or enable generated sound.
- Run — submit and download your video.
Pricing
| Duration | Sound Off | Sound On |
|---|---|---|
| 3 s | $0.252 | $0.336 |
| 5 s | $0.42 | $0.56 |
| 10 s | $0.84 | $1.12 |
| 15 s | $1.26 | $1.68 |
Billing Rules
- Base rate: $0.42 per 5 seconds
- Sound multiplier: disabled = 1×, enabled = 4/3×
- Duration range: 3–15 seconds
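The billing rules above can be sketched as a small calculator. This is a minimal illustration, not an official pricing tool: the function name `estimate_cost` is hypothetical, and it only covers the base rate and the sound multiplier (it does not model requests that include a reference video).

```python
from fractions import Fraction

# Values taken from the billing rules: $0.42 per 5 seconds, 4/3x with sound.
BASE_RATE_PER_5S = Fraction(42, 100)
SOUND_MULTIPLIER = Fraction(4, 3)


def estimate_cost(duration: int, sound: bool = False) -> float:
    """Estimate the price in USD for a clip without a reference video.

    `estimate_cost` is a hypothetical helper, not part of any SDK.
    """
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    cost = BASE_RATE_PER_5S * duration / 5
    if sound:
        cost *= SOUND_MULTIPLIER
    return float(cost)


if __name__ == "__main__":
    # Print one row per duration listed in the pricing table.
    for seconds in (3, 5, 10, 15):
        print(seconds, estimate_cost(seconds), estimate_cost(seconds, sound=True))
```

Exact fractions are used so that the results line up with the pricing table without floating-point drift.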
Best Use Cases
- Character-Driven Storytelling — Create scenes starring specific characters from your reference images.
- Social Media Content — Produce personalized short-form videos with consistent character identity.
- Marketing & Ads — Generate brand ambassador or spokesperson videos from still photos.
- Creative Concepting — Combine multiple characters or elements into new scenarios for rapid ideation.
- Style Transfer — Use a reference video to guide the motion and visual style of new content.
Pro Tips
- Reference images with clear faces and distinct features produce the best character consistency.
- Use “Figure 1”, “Figure 2” etc. in your prompt to refer to specific reference images in order.
- Adding a reference video significantly enhances motion quality but raises the cost to 3× the base rate.
- Use shorter durations (3–5 s) for testing character consistency before generating longer clips.
- Match aspect ratio to your target platform: 16:9 for YouTube, 9:16 for TikTok/Reels.
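The "Figure N" convention from the tips above can be kept in sync with the image order using a small helper. This is only a sketch: `label_references` is a hypothetical name, the URLs are placeholders, and it assumes that labels are 1-based and follow the order of the `images` array, as the tip suggests.

```python
def label_references(image_urls):
    """Pair each reference image URL with its 1-based 'Figure N' label."""
    return [(f"Figure {i}", url) for i, url in enumerate(image_urls, start=1)]


# Placeholder URLs for illustration only.
images = [
    "https://example.com/woman.jpg",
    "https://example.com/man.jpg",
]

labels = label_references(images)
woman_label, man_label = labels[0][0], labels[1][0]

# Build the prompt from the labels so it stays correct if the order changes.
prompt = f"The man in {man_label} is walking with the woman in {woman_label} in the park."
print(prompt)
```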
Notes
- Prompt is the only required field, but reference images are recommended for best results.
- Duration range: minimum 3 seconds, maximum 15 seconds.
- When a reference video is provided, the cost is 3× the base rate regardless of sound settings.
- Ensure uploaded URLs are publicly accessible.
Related Models
- Kling Video O3 Pro Reference-to-Video — Maximum quality reference-to-video with O3 Pro tier.
- Kling Video O3 Std Image-to-Video — Animate a single image into video at Standard pricing.
- Kling Video O3 Std Text-to-Video — Generate videos from text prompts at Standard pricing.
- Kling Video O3 Std Video Edit — Edit existing videos with natural-language instructions.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (prompt is required; the prompt text below is a placeholder)
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-video-o3-std/reference-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "prompt": "A woman walks through a park at sunset.",
  "keep_original_sound": true,
  "sound": false,
  "aspect_ratio": "16:9",
  "duration": 5
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
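The same submit-then-poll flow can be sketched in Python with only the standard library. The endpoint URLs, headers, and status values come from this page; the helper names (`build_submit_request`, `build_result_request`, `send`, `generate`) and the 2-second polling interval are assumptions made for illustration, and the loop has no timeout.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "kwaivgi/kling-video-o3-std/reference-to-video"


def build_submit_request(api_key, payload):
    """Build the POST request that submits a generation task."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, task_id):
    """Build the GET request that fetches the result for a task ID."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{task_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate(api_key, payload, poll_seconds=2.0):
    """Submit a task, then poll until it is completed or failed."""
    task = send(build_submit_request(api_key, payload))["data"]
    while task["status"] not in ("completed", "failed"):
        time.sleep(poll_seconds)  # assumed interval, not documented
        task = send(build_result_request(api_key, task["id"]))["data"]
    if task["status"] == "failed":
        raise RuntimeError(task.get("error") or "generation failed")
    return task["outputs"]
```

Calling `generate(api_key, {"prompt": "...", "duration": 5})` with a valid key would return the list of output URLs once the task completes. In production code a maximum wait time would likely be worth adding.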
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| prompt | string | Yes | - | - | The positive prompt for the generation. |
| video | string | No | - | - | The reference video URL. |
| images | array | No | [] | - | Reference images. With a reference video, at most 4 images are allowed; without one, at most 7. |
| keep_original_sound | boolean | No | true | - | Whether to keep the original sound from the reference video. |
| sound | boolean | No | false | - | Whether to generate audio for the video. |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 1:1 | The aspect ratio of the generated video. |
| duration | integer | No | 5 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | The duration of the generated media in seconds (3-15). |
| multi_prompt | array | No | - | - | List of multi-prompt elements for the generation. |
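The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch: `build_payload` is a hypothetical helper, it assumes `images` is a list of URL strings, and it leaves out `multi_prompt` because its element structure is not described here.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def build_payload(prompt, images=(), video=None, keep_original_sound=True,
                  sound=False, aspect_ratio="16:9", duration=5):
    """Validate the documented limits and return a request body dict."""
    if not prompt:
        raise ValueError("prompt is required")

    # With a reference video at most 4 images are allowed, otherwise 7.
    max_images = 4 if video else 7
    if len(images) > max_images:
        raise ValueError(f"at most {max_images} reference images are allowed")

    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")

    payload = {
        "prompt": prompt,
        "images": list(images),
        "keep_original_sound": keep_original_sound,
        "sound": sound,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
    if video:
        payload["video"] = video
    return payload
```

Validating locally avoids spending a request on input the API would reject, though the server remains the authority on what is accepted.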
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
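A result response with the fields listed above could be handled as follows. This is a sketch under assumptions: `extract_outputs` is a hypothetical helper, and the sample response uses made-up values (ID, URL, timing) purely to show the shape.

```python
def extract_outputs(response):
    """Return output URLs when completed, None while pending, raise on failure."""
    data = response["data"]
    status = data["status"]
    if status == "completed":
        return list(data["outputs"])
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    # "created" or "processing": outputs are not available yet.
    return None


# Illustrative response only; every value here is a placeholder.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "kwaivgi/kling-video-o3-std/reference-to-video",
        "outputs": ["https://example.com/output.mp4"],
        "status": "completed",
        "error": "",
        "timings": {"inference": 12345},
    },
}

print(extract_outputs(sample))
```

Returning `None` for pending tasks lets a polling loop distinguish "not ready" from "finished with no outputs".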