image-to-text

Kling Advanced Elements

kwaivgi/kling-elements-advanced

Kling Advanced Elements creates custom AI elements from reference images or videos for consistent character and object appearance across Kling video generations. It supports multi-image elements with frontal and reference images, video character elements, and optional voice binding. Ready-to-use REST inference API: best performance, no cold starts, affordable pricing.

Input
Hint: video_refer creates a Video Character Element, whose appearance is defined by the video in element_video_list. image_refer creates a Multi-Image Element, whose appearance is defined by the images in element_image_list.

Hint: The voice ID can be obtained through the voice-related API. For details, see the Voice Guide: https://wavespeed.ai/docs/docs-api/kwaivgi/kwaivgi-kling-v2.6-create-voice

Example output:

{
  "element_id": 307212335131305,
  "element_name": "handsome man",
  "element_type": "image_refer",
  "element_description": "a cool guy"
}
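The example output is plain JSON, so a client only needs to parse it and keep element_id for later generations. This sketch assumes the output arrives as exactly the object shown; when it comes back through the REST API it may be wrapped in a larger response envelope. The `Element` dataclass and `parse_element` helper are hypothetical names used for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Fields copied from the example output; the class itself is hypothetical."""
    element_id: int
    element_name: str
    element_type: str  # "image_refer" or "video_refer"
    element_description: str


def parse_element(raw: str) -> Element:
    """Parse one element-creation output object (assumed unwrapped) into an Element."""
    data = json.loads(raw)
    return Element(
        element_id=int(data["element_id"]),
        element_name=data["element_name"],
        element_type=data["element_type"],
        element_description=data["element_description"],
    )


example = (
    '{ "element_id": 307212335131305, "element_name": "handsome man", '
    '"element_type": "image_refer", "element_description": "a cool guy" }'
)
element = parse_element(example)
print(element.element_id)  # 307212335131305
```

Storing the ID as an integer is safe here: 307212335131305 is well within the range that JSON parsers in most languages represent exactly.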

Your request will cost $0.01 per run.

For $1 you can run this model approximately 100 times.

Kling Advanced Elements

Kling Advanced Elements creates custom AI elements from reference images or videos for consistent character and object appearance across Kling video generations. Define an element with a name, description, and reference material — the model returns a reusable element ID that can be referenced in any Kling generation to maintain identity across clips. Supports both image-based and video-based element creation, with optional voice binding for speaking characters.

Why Choose This?

  • Two reference modes: choose image_refer (a frontal image plus up to 4 additional reference images) or video_refer (a reference video), whichever best matches your source material.

  • Multi-image support: capture different angles, expressions, and styles with a frontal image plus up to 4 additional reference images for more accurate character consistency.

  • Video character elements: define a character's full appearance and motion style from a reference video for more dynamic identity capture.

  • Voice binding: optionally attach a voice ID to the element for talking-avatar and dialogue-driven video workflows.

  • Reusable across generations: reference a created element by its ID in any Kling video generation to keep the same character across as many clips as you need.

Parameters

Parameter | Required | Description
name | Yes | Element name. Max 20 characters.
description | Yes | Element description. Max 100 characters.
reference_type | Yes | Reference mode: image_refer (default) or video_refer.
frontal_image | Yes (if image_refer) | Front-facing reference image. Required when reference_type is image_refer.
refer_images | No | Additional reference images (2–4) from different angles or expressions.
element_video_list | Yes (if video_refer) | Reference video defining the character's appearance. Required when reference_type is video_refer.
voice_id | No | Voice ID to bind to the element for speaking characters.
tag_list | No | Custom tags for organizing and categorizing elements.
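The rules in the parameter table (and in the Notes further down) can be checked client-side before spending a request. This is a minimal sketch assuming the flat field names listed in the table; `validate_element_request` is a hypothetical helper, not part of any WaveSpeed SDK. It applies the table's 2–4 range to refer_images only when that list is non-empty.

```python
from typing import Optional, Sequence

VALID_REFERENCE_TYPES = {"image_refer", "video_refer"}


def validate_element_request(
    name: str,
    description: str,
    reference_type: str = "image_refer",
    frontal_image: Optional[str] = None,
    refer_images: Sequence[str] = (),
    element_video_list: Sequence[str] = (),
) -> list:
    """Return a list of rule violations; an empty list means the request looks valid."""
    errors = []
    # name and description are always required, with length caps from the table.
    if not name:
        errors.append("name is required")
    elif len(name) > 20:
        errors.append("name must be at most 20 characters")
    if not description:
        errors.append("description is required")
    elif len(description) > 100:
        errors.append("description must be at most 100 characters")
    # Each reference mode has its own required inputs.
    if reference_type not in VALID_REFERENCE_TYPES:
        errors.append(f"unknown reference_type: {reference_type!r}")
    elif reference_type == "image_refer":
        if not frontal_image:
            errors.append("frontal_image is required for image_refer")
        if refer_images and not 2 <= len(refer_images) <= 4:
            errors.append("refer_images, if given, must contain 2-4 images")
    else:  # video_refer
        if len(element_video_list) != 1:
            errors.append("video_refer requires exactly 1 reference video")
    return errors


print(validate_element_request("handsome man", "a cool guy", frontal_image="front.jpg"))  # []
```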

How to Use

  1. Enter a name — give your element a clear, identifiable name (max 20 characters).
  2. Write a description — describe the character's appearance, style, and key traits (max 100 characters).
  3. Select reference_type — choose image_refer for image-based creation or video_refer for video-based.
  4. If image_refer — upload a frontal_image (required) and optionally add 2–4 refer_images from different angles.
  5. If video_refer — upload one reference video in element_video_list.
  6. Add voice_id (optional) — attach a voice ID for speaking character workflows.
  7. Add tag_list (optional) — add custom tags to organize your element library.
  8. Submit — save the returned element ID for use in Kling video generations.
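Steps 1–8 amount to assembling one JSON request body. The sketch below builds that body with the standard library and prepares, but does not send, a POST request. Several details are assumptions rather than facts from this page: the endpoint URL pattern, the Bearer-token Authorization header, sending the reference video as a one-item list, and placing frontal_image and refer_images at the top level (the input hint mentions an element_image_list, so the real schema may nest them). `build_element_payload` and `build_request` are hypothetical helpers; check the API reference before relying on any of these details.

```python
import json
import urllib.request
from typing import Optional, Sequence

# ASSUMPTION: illustrative endpoint; confirm the real URL in the API reference.
API_URL = "https://api.wavespeed.ai/api/v3/kwaivgi/kling-elements-advanced"


def build_element_payload(
    name: str,
    description: str,
    reference_type: str = "image_refer",
    frontal_image: Optional[str] = None,
    refer_images: Sequence[str] = (),
    element_video_list: Sequence[str] = (),
    voice_id: Optional[str] = None,
    tag_list: Sequence[str] = (),
) -> dict:
    """Map steps 1-7 onto a JSON body, omitting optional fields that are unset."""
    payload = {"name": name, "description": description, "reference_type": reference_type}
    if reference_type == "image_refer":
        payload["frontal_image"] = frontal_image  # ASSUMPTION: top-level field
        if refer_images:
            payload["refer_images"] = list(refer_images)
    else:
        payload["element_video_list"] = list(element_video_list)  # ASSUMPTION: list of URLs
    if voice_id:
        payload["voice_id"] = voice_id
    if tag_list:
        payload["tag_list"] = list(tag_list)
    return payload


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request; the Bearer auth scheme is an assumption."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


payload = build_element_payload(
    "handsome man", "a cool guy", frontal_image="https://example.com/front.jpg", tag_list=["hero"]
)
req = build_request(payload, "YOUR_API_KEY")
```

To submit for real, pass the request to `urllib.request.urlopen`. The response format (including whether the job completes synchronously) is not documented on this page, so read the element ID from whatever the API actually returns, as in step 8.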

Pricing

Reference Type | Cost per Element
image_refer | $0.010
video_refer | $0.015
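With two flat prices, batch budgeting is simple arithmetic. The sketch uses `Decimal` instead of floats so that fractions of a cent such as $0.015 add up exactly; `batch_cost` and `elements_per_budget` are hypothetical helper names.

```python
from decimal import Decimal

# Per-element prices from the pricing table (USD).
PRICE_PER_ELEMENT = {
    "image_refer": Decimal("0.010"),
    "video_refer": Decimal("0.015"),
}


def batch_cost(counts: dict) -> Decimal:
    """Total cost of creating counts[reference_type] elements of each type."""
    return sum(
        (PRICE_PER_ELEMENT[ref_type] * n for ref_type, n in counts.items()),
        Decimal("0"),
    )


def elements_per_budget(reference_type: str, budget: Decimal) -> int:
    """How many elements of one type a budget covers, rounded down."""
    return int(budget // PRICE_PER_ELEMENT[reference_type])


print(batch_cost({"image_refer": 10, "video_refer": 4}))  # 0.160
print(elements_per_budget("image_refer", Decimal("1")))  # 100
print(elements_per_budget("video_refer", Decimal("1")))  # 66
```

For example, 10 image_refer and 4 video_refer elements cost 10 × $0.010 + 4 × $0.015 = $0.16, and $1 covers 100 image_refer or 66 video_refer elements, matching the 1.5× price ratio between the two modes.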

Best Use Cases

  • Consistent character series — Create a reusable character ID to maintain identity across multiple Kling video generations.
  • Fashion & wardrobe elements — Define clothing and styling elements for consistent use in fashion video content.
  • Brand assets — Build reusable brand mascots, logos, and product elements for marketing video workflows.
  • Talking avatar workflows — Combine element IDs with voice IDs for dialogue-driven character video generation.
  • E-commerce product elements — Define product elements for consistent product video content at scale.

Pro Tips

  • Use clear, well-lit frontal and profile images for the most accurate character identity capture.
  • For video_refer mode, use a short clip that clearly shows the character from multiple angles.
  • Give elements descriptive names and tags to keep your library organized as it grows.
  • Once an element is created, write its name naturally in your generation prompt and enter the element ID in the element_list field — no special characters required.
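On the generation side, the last tip amounts to pairing a prompt that mentions each element by name with the matching element IDs. The sketch below is hypothetical throughout: the shape of element_list (a list of objects with an element_id key) is an assumption, and the name-in-prompt check is a client-side sanity check, not a documented API requirement. Confirm the actual field format in the documentation of the Kling generation model you call.

```python
from typing import Sequence, Tuple


def build_generation_inputs(prompt: str, elements: Sequence[Tuple[str, int]]) -> dict:
    """Pair a prompt with element IDs after checking each element name appears in it.

    ASSUMPTION: element_list is a list of {"element_id": ...} objects.
    """
    lowered = prompt.lower()
    missing = [name for name, _ in elements if name.lower() not in lowered]
    if missing:
        raise ValueError(f"element names not mentioned in prompt: {missing}")
    return {
        "prompt": prompt,
        "element_list": [{"element_id": element_id} for _, element_id in elements],
    }


inputs = build_generation_inputs(
    "The handsome man walks down a neon-lit street at night",
    [("handsome man", 307212335131305)],
)
```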

Notes

  • name, description, and reference_type are always required.
  • image_refer mode requires at least a frontal_image; refer_images are optional (2–4 additional images).
  • video_refer mode requires exactly 1 reference video and costs 1.5× the image_refer price.
  • Voice binding is optional and available for both reference types.
  • Voice IDs can be obtained through the voice-related API — see the Voice Guide for details.
