Kling Advanced Elements
Kling Advanced Elements creates custom AI elements from reference images or videos for consistent character and object appearance across Kling video generations. Define an element with a name, description, and reference material; the model returns a reusable element ID that can be referenced in any Kling generation to maintain identity across clips. It supports both image-based and video-based element creation, with optional voice binding for speaking characters.
Why Choose This?
- Two reference modes: Choose image_refer (frontal image + up to 4 additional reference images) or video_refer (reference video) to best match your source material.
- Multi-image support: Capture different angles, expressions, and styles with a frontal image plus up to 4 additional reference images for accurate character consistency.
- Video character elements: Define a character's full appearance and motion style from a reference video for more dynamic identity capture.
- Voice binding: Optionally attach a voice ID to the element for talking avatar and dialogue-driven video workflows.
- Reusable across generations: Created elements can be referenced by ID in any Kling video generation, so the same character can be used across unlimited clips.
Parameters
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Element name. Max 20 characters. |
| description | Yes | Element description. Max 100 characters. |
| reference_type | Yes | Reference mode: image_refer (default) or video_refer. |
| frontal_image | Yes (if image_refer) | Front-facing reference image. Required when reference_type is image_refer. |
| refer_images | No | Additional reference images (2–4) from different angles or expressions. |
| element_video_list | Yes (if video_refer) | Reference video defining the character's appearance. Required when reference_type is video_refer. |
| voice_id | No | Voice ID to bind to the element for speaking characters. |
| tag_list | No | Custom tags for organizing and categorizing elements. |
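As a rough illustration of how the parameters above fit together, the following Python sketch assembles a request body. The field names come from the table; the function name `build_element_payload`, the use of URL strings for media, and the list shapes are assumptions made for this example, not a confirmed request schema.

```python
from typing import Any, Dict, List, Optional


def build_element_payload(
    name: str,
    description: str,
    reference_type: str = "image_refer",
    frontal_image: Optional[str] = None,
    refer_images: Optional[List[str]] = None,
    element_video_list: Optional[List[str]] = None,
    voice_id: Optional[str] = None,
    tag_list: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented parameter names.

    Assumption: media values are URL strings and list fields are plain lists.
    """
    payload: Dict[str, Any] = {
        "name": name,
        "description": description,
        "reference_type": reference_type,
    }
    if reference_type == "image_refer":
        # frontal_image is required in this mode; refer_images is optional.
        if frontal_image is None:
            raise ValueError("frontal_image is required for image_refer")
        payload["frontal_image"] = frontal_image
        if refer_images:
            payload["refer_images"] = list(refer_images)
    elif reference_type == "video_refer":
        # element_video_list is required in this mode.
        if not element_video_list:
            raise ValueError("element_video_list is required for video_refer")
        payload["element_video_list"] = list(element_video_list)
    else:
        raise ValueError(f"unknown reference_type: {reference_type!r}")
    # Optional fields are included only when provided.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if tag_list:
        payload["tag_list"] = list(tag_list)
    return payload
```

Only the fields that belong to the selected mode are included, so a stray `frontal_image` passed alongside `video_refer` is dropped rather than sent.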
How to Use
- Enter a name — give your element a clear, identifiable name (max 20 characters).
- Write a description — describe the character's appearance, style, and key traits (max 100 characters).
- Select reference_type — choose image_refer for image-based creation or video_refer for video-based.
- If image_refer — upload a frontal_image (required) and optionally add 2–4 refer_images from different angles.
- If video_refer — upload one reference video in element_video_list.
- Add voice_id (optional) — attach a voice ID for speaking character workflows.
- Add tag_list (optional) — add custom tags to organize your element library.
- Submit — save the returned element ID for use in Kling video generations.
Pricing
| Reference Type | Cost per Element |
|---|---|
| image_refer | $0.010 |
| video_refer | $0.015 |
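To budget for a batch of elements, the per-element prices in the table can be multiplied out. The sketch below is a minimal example; `estimate_cost` is a hypothetical helper name, and the prices are copied from the table and may change.

```python
from decimal import Decimal

# Per-element prices in USD, copied from the pricing table.
PRICE_PER_ELEMENT = {
    "image_refer": Decimal("0.010"),
    "video_refer": Decimal("0.015"),
}


def estimate_cost(image_elements: int = 0, video_elements: int = 0) -> Decimal:
    """Return the total cost in USD for the given number of elements."""
    if image_elements < 0 or video_elements < 0:
        raise ValueError("element counts must be non-negative")
    return (
        image_elements * PRICE_PER_ELEMENT["image_refer"]
        + video_elements * PRICE_PER_ELEMENT["video_refer"]
    )
```

For example, 20 image-based and 10 video-based elements come to 20 × $0.010 + 10 × $0.015 = $0.35. `Decimal` is used so that sub-cent prices add up without floating-point rounding.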
Best Use Cases
- Consistent character series — Create a reusable character ID to maintain identity across multiple Kling video generations.
- Fashion & wardrobe elements — Define clothing and styling elements for consistent use in fashion video content.
- Brand assets — Build reusable brand mascots, logos, and product elements for marketing video workflows.
- Talking avatar workflows — Combine element IDs with voice IDs for dialogue-driven character video generation.
- E-commerce product elements — Define product elements for consistent product video content at scale.
Pro Tips
- Use clear, well-lit frontal and profile images for the most accurate character identity capture.
- For video_refer mode, use a short clip that clearly shows the character from multiple angles.
- Give elements descriptive names and tags to keep your library organized as it grows.
- Once an element is created, write its name naturally in your generation prompt and enter the element ID in the element_list field — no special characters required.
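The last tip can be sketched as a small helper that pairs a prompt with element IDs. The `element_list` field name comes from the tip above; the `prompt` key, the flat list of ID strings, and the function name `build_generation_request` are assumptions for illustration, and the generation endpoint may expect a different shape.

```python
from typing import Any, Dict, List


def build_generation_request(prompt: str, element_ids: List[str]) -> Dict[str, Any]:
    """Pair a natural-language prompt with the element IDs it refers to.

    Assumption: element_list is a flat list of element ID strings.
    """
    if not element_ids:
        raise ValueError("at least one element ID is expected")
    return {
        # The element's name appears in the prompt as ordinary text.
        "prompt": prompt,
        # The IDs returned at element creation go in element_list.
        "element_list": list(element_ids),
    }
```

A call such as `build_generation_request("Mina walks through a rainy market", ["<element-id>"])` keeps the character's name in plain prose and carries the ID separately, where `<element-id>` stands in for the value returned when the element was created.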
Notes
- name, description, and reference_type are always required.
- image_refer mode requires at least a frontal_image; refer_images are optional (2–4 additional images).
- video_refer mode requires exactly 1 reference video and costs 1.5× the image_refer price.
- Voice binding is optional and available for both reference types.
- Voice IDs can be obtained through the voice-related API — see the Voice Guide for details.
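The constraints in these notes, together with the length limits from the parameter table, can be checked before submitting. The following is a minimal client-side sketch; `validate_element_request` is a hypothetical helper, and the service's own validation may differ.

```python
from typing import Any, Dict, List

MAX_NAME_CHARS = 20
MAX_DESCRIPTION_CHARS = 100
MAX_REFER_IMAGES = 4


def validate_element_request(req: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors: List[str] = []
    for field in ("name", "description", "reference_type"):
        if not req.get(field):
            errors.append(f"{field} is required")
    if len(req.get("name") or "") > MAX_NAME_CHARS:
        errors.append(f"name exceeds {MAX_NAME_CHARS} characters")
    if len(req.get("description") or "") > MAX_DESCRIPTION_CHARS:
        errors.append(f"description exceeds {MAX_DESCRIPTION_CHARS} characters")

    ref_type = req.get("reference_type")
    if ref_type == "image_refer":
        if not req.get("frontal_image"):
            errors.append("image_refer requires frontal_image")
        # Only the upper bound is checked here; the page lists both
        # "up to 4" and "2-4" for the additional images.
        if len(req.get("refer_images") or []) > MAX_REFER_IMAGES:
            errors.append(f"refer_images accepts at most {MAX_REFER_IMAGES} images")
    elif ref_type == "video_refer":
        if len(req.get("element_video_list") or []) != 1:
            errors.append("video_refer requires exactly 1 reference video")
    elif ref_type:
        errors.append("reference_type must be image_refer or video_refer")
    return errors
```

Returning all problems at once, rather than raising on the first, makes it easier to fix a request in a single pass.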
Related Models
- Kling Elements — Standard element creation for Kling video models.
- Kling Video O3 Pro Text-to-Video — Use your elements in O3 Pro text-to-video generation.
- Kling Video O3 Pro Image-to-Video — Use your elements in O3 Pro image-to-video generation.