Kwaivgi Kling V2 AI Avatar Pro
Kling V2 AI Avatar Pro generates high-quality AI avatar videos with clean detail, stable motion, and strong identity consistency, which makes it well suited to profiles, intros, and social content. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
Kling-v2-ai-avatar-pro — Talking Avatar from Image + Audio
kling-v2-ai-avatar-pro turns a single portrait into a lip-synced talking-head video driven by your own audio. Upload a clear face image, provide a narration or dialogue track, and the model generates a vertical HD avatar clip that speaks and moves naturally on camera.
🌟 Highlights
- Audio-driven performance – Uses your uploaded audio as-is (no TTS), keeping timing, pauses and emotion.
- Photo-real talking avatar – Animates the face, eyes and head while preserving the identity from the reference image.
- One-shot setup – Just an image + audio; no need for video capture or motion recording.
- Portrait-ready output – Produces social-ready vertical video that fits Reels, TikTok, Shorts and story formats.
- Prompt-guided styling (optional) – Use prompt to hint at camera feel or mood (e.g. “soft studio lighting, subtle head movement, gentle smile”).
🔧 Parameters
- audio* – Required. The voice track that drives lip-sync and timing (URL or upload).
- image* – Required. A clear, front-facing portrait of the person to animate.
- prompt – Optional text describing style, expression or camera feel. If omitted, the model uses a neutral talking-head style.
Tip: Use a well-lit, unobstructed face (no heavy motion blur, minimal occlusion) for best identity preservation.
🚀 How to Use
- Upload audio
  - Clean mono/stereo track, with minimal background noise.
  - Make sure the final edited length matches what you want in the video.
- Upload image
  - Front or 3/4 view, eyes visible, face not cropped.
  - The avatar’s identity and pose come from this image.
- (Optional) Add a prompt
  - Guide expression or style, e.g.:
    - “confident presenter in a tech promo, subtle head nods”
    - “friendly customer service tone, warm expression”
- Run the model
  - The video length is automatically derived from the audio duration.
  - Download the generated talking-head clip and drop it into your editor or directly onto social platforms.
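Because the clip length follows the audio length, it can help to check the duration of the track before uploading. The following is a minimal sketch using Python's standard `wave` module. It assumes the audio is an uncompressed WAV file (other formats such as MP3 would need a third-party library), and the helper name `wav_duration_seconds` is illustrative rather than part of the WaveSpeedAI API.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()   # total number of audio frames
        rate = wav.getframerate()   # frames per second (sample rate)
    if rate <= 0:
        raise ValueError("WAV file reports a non-positive sample rate")
    return frames / rate
```

Calling `wav_duration_seconds("narration.wav")` on a local file gives a rough preview of how long the generated video is likely to be.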
💰 Pricing
Billing is based on audio duration, with a minimum of 5 seconds.
| Audio length (s) | Billed seconds | Price (USD) |
|---|---|---|
| 0–5 | 5 | 0.56 |
| 10 | 10 | 1.12 |
| 20 | 20 | 2.24 |
| 30 | 30 | 3.36 |
| 60 | 60 | 6.72 |
Any clip shorter than 5 seconds is still billed as 5 seconds.
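The pricing table works out to a flat 0.112 USD per billed second (0.56 USD / 5 s). A small sketch of that arithmetic follows. Two things here are assumptions rather than documented behaviour: that the rate is uniform for durations not listed in the table, and that fractional seconds are rounded up to the next whole second. The function name `estimate_cost` is hypothetical.

```python
import math
from decimal import Decimal

# Per-second rate implied by the pricing table (0.56 USD / 5 s).
# Assumption: the same rate applies to every duration.
RATE_PER_SECOND = Decimal("0.112")
MIN_BILLED_SECONDS = 5


def estimate_cost(audio_seconds: float) -> tuple[int, Decimal]:
    """Return (billed_seconds, price_usd) for a given audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    # Assumption: fractional seconds round up to the next whole second.
    billed = max(MIN_BILLED_SECONDS, math.ceil(audio_seconds))
    return billed, RATE_PER_SECOND * billed


for seconds in (3, 10, 20, 30, 60):
    billed, price = estimate_cost(seconds)
    print(f"{seconds:>2} s audio -> billed {billed:>2} s, {price:.2f} USD")
```

For the durations listed in the table, the estimate matches the listed prices; treat any other value as an approximation until confirmed against an actual invoice.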
🧠 Tips for Best Results
- Edit your audio first – Remove mistakes, long silences and background noise before upload.
- Match tone to use case – Calm, even delivery for corporate avatars; more expressive reads for ads or UGC.
- Keep framing consistent – Use images with similar head size and framing across a campaign for a unified look.
- Test a few portraits – Small changes in the reference image (lighting, angle) can noticeably change the avatar’s feel.
More Avatar Tools
See our Avatar Tools collection here!
- infinitetalk – WaveSpeedAI’s Infinitetalk generates lip-synced talking-head avatar videos from your scripts or audio, ideal for virtual presenters and explainer content.
- Infinitetalk-multi – WaveSpeedAI’s Infinitetalk-Multi extends the avatar pipeline to multi-speaker / multi-segment scenarios, making it easier to script dialogues, panel shots, or batch avatar content.
- Omni-Human – ByteDance’s Omni-Human 1.5 creates high-fidelity digital humans from images and audio, suitable for realistic virtual hosts, brand ambassadors, and training avatars.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2-ai-avatar-pro" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
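--data-raw '{
    "audio": "https://example.com/narration.mp3",
    "image": "https://example.com/portrait.jpg",
    "prompt": "soft studio lighting, subtle head movement"
}'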
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
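The same two calls can be sketched in Python with only the standard library. This is a rough outline, not an official client: it assumes the submit response carries the task ID in `data.id` and that the result endpoint reports `data.status`, as listed in the response parameters. The polling interval, the poll limit, and all function names are arbitrary choices for illustration. The `fetch` argument exists so the polling loop can be exercised without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "kwaivgi/kling-v2-ai-avatar-pro"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a task (mirrors the first curl call)."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that queries a task result (mirrors the second curl call)."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_result(api_key, request_id, fetch=send, interval=2.0, max_polls=150):
    """Poll the result endpoint until the status is 'completed' or 'failed'.

    The interval and poll limit are arbitrary values chosen for this sketch.
    """
    for _ in range(max_polls):
        data = fetch(build_result_request(api_key, request_id))["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError(f"task {request_id} did not finish after {max_polls} polls")
```

A typical flow would read the key from the `WAVESPEED_API_KEY` environment variable, call `send(build_submit_request(key, payload))`, take `data.id` from the response, and pass it to `wait_for_result`.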
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | The audio for generating the output. |
| image | string | Yes | - | - | The image for generating the output. |
| prompt | string | No | - | - | The positive prompt for the generation. |
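As an illustration of the table above, the request body can be assembled and checked client-side before it is sent. This is a hedged sketch: `build_payload` is a hypothetical helper, and the validation only mirrors the Required column (the API may apply further checks that are not documented here).

```python
from typing import Optional


def build_payload(audio: str, image: str, prompt: Optional[str] = None) -> dict:
    """Assemble the JSON body described in the request parameters table."""
    # audio and image are marked as required, so reject missing or blank values early.
    for name, value in (("audio", audio), ("image", image)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{name}' is required and must be a non-empty string")
    payload = {"audio": audio, "image": image}
    # prompt is optional; leave the key out entirely when nothing is provided.
    if prompt and prompt.strip():
        payload["prompt"] = prompt.strip()
    return payload
```

Omitting `prompt` rather than sending an empty string keeps the request aligned with the documented default behaviour for a missing prompt.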
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
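The fields above can be reduced to the handful a caller usually needs. The sketch below is illustrative only: `summarize_result` is a hypothetical helper, and it assumes that `data.outputs` and `data.has_nsfw_contents` are index-aligned, which the table suggests but does not state outright.

```python
def summarize_result(response: dict) -> dict:
    """Reduce a result response to the fields most callers need."""
    data = response.get("data") or {}
    status = data.get("status")
    if status == "failed":
        # data.error holds the failure message when the task did not succeed.
        raise RuntimeError(data.get("error") or "task failed without an error message")
    outputs = data.get("outputs") or []
    flags = data.get("has_nsfw_contents") or []
    return {
        "done": status == "completed",
        "status": status,
        "outputs": outputs,
        # Assumption: the NSFW flags line up one-to-one with the output URLs.
        "flagged": [url for url, flag in zip(outputs, flags) if flag],
        "inference_ms": (data.get("timings") or {}).get("inference"),
    }
```

While the status is `created` or `processing`, `outputs` is empty and `done` is false, so the caller can keep polling until `done` becomes true or an exception signals failure.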