
{ "element_id": 307212335131305, "element_name": "handsome man", "element_type": "image_refer", "element_description": "a cool guy" }
Your request will cost $0.01 per run.
With $1 you can run this model approximately 100 times.
Kling Advanced Elements creates custom AI elements from reference images or videos for consistent character and object appearance across Kling video generations. Define an element with a name, description, and reference material — the model returns a reusable element ID that can be referenced in any Kling generation to maintain identity across clips. Supports both image-based and video-based element creation, with optional voice binding for speaking characters.
- Two reference modes: Choose image_refer (frontal image + up to 4 additional reference images) or video_refer (reference video) to best match your source material.
- Multi-image support: Capture different angles, expressions, and styles with a frontal image plus up to 4 additional reference images for accurate character consistency.
- Video character elements: Define a character's full appearance and motion style from a reference video for more dynamic identity capture.
- Voice binding: Optionally attach a voice ID to the element for talking avatar and dialogue-driven video workflows.
- Reusable across generations: Created elements can be referenced by ID in any Kling video generation, so you can use the same character across unlimited clips.
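
Since a created element is reused by its ID, it can help to store that ID as soon as the response arrives. The following is a minimal sketch, assuming the response has the shape of the sample output shown at the top of this page; `register_element` and the `registry` dictionary are hypothetical helpers for illustration, not part of any Kling SDK.

```python
import json

# Sample response, copied from the example output on this page.
sample_response = """
{ "element_id": 307212335131305, "element_name": "handsome man",
  "element_type": "image_refer", "element_description": "a cool guy" }
"""

def register_element(raw: str, registry: dict[str, int]) -> int:
    """Parse an element-creation response and remember its ID by element name.

    Hypothetical helper: the field names are taken from the sample output only.
    """
    data = json.loads(raw)
    element_id = int(data["element_id"])
    registry[data["element_name"]] = element_id
    return element_id

registry: dict[str, int] = {}
element_id = register_element(sample_response, registry)
print(element_id)  # 307212335131305
```

How the stored ID is then passed to a video generation request depends on the specific Kling endpoint, which this page does not describe.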
| Parameter | Required | Description |
|---|---|---|
| name | Yes | Element name. Max 20 characters. |
| description | Yes | Element description. Max 100 characters. |
| reference_type | Yes | Reference mode: image_refer (default) or video_refer. |
| frontal_image | Yes (if image_refer) | Front-facing reference image. Required when reference_type is image_refer. |
| refer_images | No | Additional reference images (2–4) from different angles or expressions. |
| element_video_list | Yes (if video_refer) | Reference video defining the character's appearance. Required when reference_type is video_refer. |
| voice_id | No | Voice ID to bind to the element for speaking characters. |
| tag_list | No | Custom tags for organizing and categorizing elements. |
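
The constraints in the table above can be checked on the client before a request is sent. The following is a rough sketch rather than an official client: `build_element_request` is a hypothetical helper, and the value types (URL strings for images and videos, a list of strings for tags) are assumptions, since this page lists only the parameter names and limits.

```python
from typing import Optional

VALID_REFERENCE_TYPES = ("image_refer", "video_refer")

def build_element_request(
    name: str,
    description: str,
    reference_type: str = "image_refer",
    frontal_image: Optional[str] = None,             # assumed: image URL
    refer_images: Optional[list[str]] = None,        # assumed: list of image URLs
    element_video_list: Optional[list[str]] = None,  # assumed: list of video URLs
    voice_id: Optional[str] = None,
    tag_list: Optional[list[str]] = None,            # assumed: list of strings
) -> dict:
    """Validate inputs against the parameter table and return a request payload."""
    if not name or len(name) > 20:
        raise ValueError("name is required and limited to 20 characters")
    if not description or len(description) > 100:
        raise ValueError("description is required and limited to 100 characters")
    if reference_type not in VALID_REFERENCE_TYPES:
        raise ValueError(f"reference_type must be one of {VALID_REFERENCE_TYPES}")
    if reference_type == "image_refer" and not frontal_image:
        raise ValueError("frontal_image is required when reference_type is image_refer")
    if reference_type == "video_refer" and not element_video_list:
        raise ValueError("element_video_list is required when reference_type is video_refer")
    # Only the upper bound is enforced here; the page states "up to 4" additional images.
    if refer_images is not None and len(refer_images) > 4:
        raise ValueError("refer_images accepts at most 4 additional images")

    payload = {"name": name, "description": description, "reference_type": reference_type}
    optional_fields = {
        "frontal_image": frontal_image,
        "refer_images": refer_images,
        "element_video_list": element_video_list,
        "voice_id": voice_id,
        "tag_list": tag_list,
    }
    # Include only the optional fields that were actually provided.
    payload.update({key: value for key, value in optional_fields.items() if value is not None})
    return payload

request = build_element_request(
    name="handsome man",
    description="a cool guy",
    frontal_image="https://example.com/front.png",  # placeholder URL
)
```

Validating locally avoids spending a paid run on a request that would be rejected for a missing or oversized field.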

| Reference Type | Cost per Element |
|---|---|
| image_refer | $0.010 |
| video_refer | $0.015 |
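
For budgeting, the per-element prices above can be turned into a quick estimate. This is a small sketch using only the two prices in the table; the function names are illustrative, and `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

# Prices per created element, taken from the pricing table (USD).
COST_PER_ELEMENT = {
    "image_refer": Decimal("0.010"),
    "video_refer": Decimal("0.015"),
}

def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Total cost in USD for the given number of elements per reference type."""
    return sum(
        (COST_PER_ELEMENT[ref_type] * n for ref_type, n in counts.items()),
        Decimal("0"),
    )

def elements_per_budget(budget_usd: str, reference_type: str) -> int:
    """How many whole elements of one reference type fit in the budget."""
    return int(Decimal(budget_usd) // COST_PER_ELEMENT[reference_type])

print(estimate_cost({"image_refer": 10, "video_refer": 4}))  # 0.160
print(elements_per_budget("1", "image_refer"))               # 100
print(elements_per_budget("1", "video_refer"))               # 66
```

The second result matches the figure quoted earlier on this page: $1 covers roughly 100 image_refer runs.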