Kwaivgi Kling V2 AI Avatar Pro


Kling V2 AI Avatar Pro generates high-quality AI avatar videos with clean detail, stable motion, and strong identity consistency, making it ideal for profiles, intros, and social content. It is available as a ready-to-use REST inference API offering high performance, no cold starts, and affordable pricing.

Features

Kling-v2-ai-avatar-pro — Talking Avatar from Image + Audio

kling-v2-ai-avatar-pro turns a single portrait into a lip-synced talking-head video driven by your own audio. Upload a clear face image, provide a narration or dialogue track, and the model generates a vertical HD avatar clip that speaks and moves naturally on camera.


🌟 Highlights

  • Audio-driven performance – Uses your uploaded audio as-is (no TTS), keeping timing, pauses and emotion.
  • Photo-real talking avatar – Animates the face, eyes and head while preserving the identity from the reference image.
  • One-shot setup – Just an image + audio; no need for video capture or motion recording.
  • Portrait-ready output – Produces social-ready vertical video that fits Reels, TikTok, Shorts and story formats.
  • Prompt-guided styling (optional) – Use the prompt to hint at camera feel or mood (e.g. “soft studio lighting, subtle head movement, gentle smile”).

🔧 Parameters

  • audio* – Required. The voice track that drives lip-sync and timing (URL or upload).
  • image* – Required. A clear, front-facing portrait of the person to animate.
  • prompt – Optional text describing style, expression or camera feel. If omitted, the model uses a neutral talking-head style.

Tip: Use a well-lit, unobstructed face (no heavy motion blur, minimal occlusion) for best identity preservation.
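The parameter rules above translate into a small JSON request body. The sketch below is a hypothetical helper (`build_avatar_payload` is not part of any WaveSpeedAI SDK) that enforces the two required fields and leaves `prompt` out entirely when it is empty, so the model falls back to its neutral style. It assumes `audio` and `image` are passed as URL strings; the example URLs are placeholders.

```python
import json
from typing import Optional


def build_avatar_payload(audio: str, image: str, prompt: Optional[str] = None) -> dict:
    """Assemble the JSON body for a kling-v2-ai-avatar-pro submission (hypothetical helper).

    audio and image are required; prompt is optional and is omitted from the
    body when empty or whitespace-only.
    """
    if not audio or not audio.strip():
        raise ValueError("audio is required (the voice track that drives lip-sync)")
    if not image or not image.strip():
        raise ValueError("image is required (a clear portrait of the person)")
    payload = {"audio": audio.strip(), "image": image.strip()}
    if prompt and prompt.strip():
        payload["prompt"] = prompt.strip()
    return payload


if __name__ == "__main__":
    body = build_avatar_payload(
        "https://example.com/narration.mp3",  # placeholder URL
        "https://example.com/portrait.jpg",   # placeholder URL
        "soft studio lighting, subtle head movement, gentle smile",
    )
    print(json.dumps(body, indent=2))
```

Dropping an empty prompt, rather than sending `""`, mirrors the "If omitted" wording in the parameter list.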


🚀 How to Use

  1. Upload audio

    • Use a clean mono or stereo track with minimal background noise.
    • Make sure the final edited length matches what you want in the video.
  2. Upload image

    • Front or 3/4 view, eyes visible, face not cropped.
    • The avatar’s identity and pose come from this image.
  3. (Optional) Add a prompt

    • Guide expression or style, e.g.:

      • “confident presenter in a tech promo, subtle head nods”
      • “friendly customer service tone, warm expression”
  4. Run the model

    • The video length is automatically derived from the audio duration.
    • Download the generated talking-head clip and drop it into your editor or directly onto social platforms.
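Because the clip length follows the audio length, it can help to measure the track before uploading. Below is a minimal sketch using Python's standard `wave` module. It only reads uncompressed PCM WAV files, and this page does not list which audio formats the endpoint accepts, so treat WAV as an assumption. `wav_duration_seconds` and `make_silent_wav` are hypothetical helpers.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Duration in seconds of a PCM WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (useful for trying the helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    duration = wav_duration_seconds(make_silent_wav(3))
    print(f"audio duration: {duration:.2f} s")
```

For a real track, pass a file path such as `wav_duration_seconds("narration.wav")`; other formats would need a different decoder.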

💰 Pricing

Billing is based on audio duration, with a minimum of 5 seconds.

Audio length (s) | Billed seconds | Price (USD)
---------------- | -------------- | -----------
0–5              | 5              | 0.56
10               | 10             | 1.12
20               | 20             | 2.24
30               | 30             | 3.36
60               | 60             | 6.72

Any clip shorter than 5 seconds is still billed as 5 seconds.
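Every row of the table works out to $0.112 per billed second ($0.56 / 5 s), with a 5-second floor. The sketch below reproduces that arithmetic. How fractional seconds are billed is not stated on this page, so the sketch assumes the raw duration is billed and the price is rounded to whole cents; `estimate_cost` is a hypothetical helper, not an API call.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple

# $0.56 for the 5-second minimum implies $0.112 per billed second; the other
# table rows (10 s -> $1.12, 60 s -> $6.72) match the same rate.
RATE_PER_SECOND = Decimal("0.112")
MIN_BILLED_SECONDS = Decimal(5)


def estimate_cost(audio_seconds: float) -> Tuple[Decimal, Decimal]:
    """Return (billed_seconds, price_usd) for an audio track of the given length.

    Assumption: fractional seconds are billed as-is and the price is rounded
    to whole cents; the pricing table only shows whole-second examples.
    """
    if audio_seconds < 0:
        raise ValueError("audio length cannot be negative")
    billed = max(Decimal(str(audio_seconds)), MIN_BILLED_SECONDS)
    price = (billed * RATE_PER_SECOND).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return billed, price


if __name__ == "__main__":
    for seconds in (3, 10, 20, 30, 60):
        billed, price = estimate_cost(seconds)
        print(f"{seconds:>3} s audio -> {billed} s billed -> ${price}")
```

`Decimal` is used instead of `float` so that values like 10 × 0.112 come out as exactly 1.12.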


🧠 Tips for Best Results

  • Edit your audio first – Remove mistakes, long silences and background noise before upload.
  • Match tone to use case – Calm, even delivery for corporate avatars; more expressive reads for ads or UGC.
  • Keep framing consistent – Use images with similar head size and framing across a campaign for a unified look.
  • Test a few portraits – Small changes in the reference image (lighting, angle) can noticeably change the avatar’s feel.

More Avatar Tools

See the full Avatar Tools collection on WaveSpeedAI.

  • infinitetalk – WaveSpeedAI’s Infinitetalk generates lip-synced talking-head avatar videos from your scripts or audio, ideal for virtual presenters and explainer content.

  • Infinitetalk-multi – WaveSpeedAI’s Infinitetalk-Multi extends the avatar pipeline to multi-speaker / multi-segment scenarios, making it easier to script dialogues, panel shots, or batch avatar content.

  • Omni-Human – ByteDance’s Omni-Human 1.5 creates high-fidelity digital humans from images and audio, suitable for realistic virtual hosts, brand ambassadors, and training avatars.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


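```bash
# Submit the task (the audio/image URLs below are placeholders; use your own)
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2-ai-avatar-pro" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "audio": "https://example.com/narration.mp3",
  "image": "https://example.com/portrait.jpg",
  "prompt": "soft studio lighting, subtle head movement, gentle smile"
}'

# Get the result (set requestId to the data.id returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```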

Parameters

Task Submission Parameters

Request Parameters

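Parameter | Type   | Required | Default | Range | Description
--------- | ------ | -------- | ------- | ----- | -----------
audio     | string | Yes      | -       | -     | The audio track that drives lip-sync and timing.
image     | string | Yes      | -       | -     | The portrait image of the person to animate.
prompt    | string | No       | -       | -     | Positive prompt describing style, expression, or camera feel.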

Response Parameters

Parameter              | Type    | Description
---------------------- | ------- | -----------
code                   | integer | HTTP status code (e.g., 200 for success)
message                | string  | Status message (e.g., “success”)
data.id                | string  | Unique identifier for the prediction (task ID)
data.model             | string  | Model ID used for the prediction
data.outputs           | array   | Array of URLs to the generated content (empty when status is not completed)
data.urls              | object  | Object containing related API endpoints
data.urls.get          | string  | URL to retrieve the prediction result
data.has_nsfw_contents | array   | Array of boolean values indicating NSFW detection for each output
data.status            | string  | Status of the task: created, processing, completed, or failed
data.created_at        | string  | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error             | string  | Error message (empty if no error occurred)
data.timings           | object  | Object containing timing details
data.timings.inference | integer | Inference time in milliseconds
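The response fields above can be read into a small typed record. This is a sketch: the `Prediction` class, `parse_prediction`, and the `safe_outputs` filter are hypothetical helpers, and pairing `outputs[i]` with `has_nsfw_contents[i]` relies on the table's statement that there is one NSFW flag per output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    """Subset of the response fields listed in the table above."""
    id: str
    model: str
    status: str  # created, processing, completed, or failed
    outputs: List[str]
    has_nsfw_contents: List[bool]
    error: str
    inference_ms: Optional[int]

    @property
    def safe_outputs(self) -> List[str]:
        """Output URLs whose NSFW flag (same index) is False.

        zip() stops at the shorter list, so an output with no flag is left out.
        """
        return [url for url, nsfw in zip(self.outputs, self.has_nsfw_contents) if not nsfw]


def parse_prediction(reply: dict) -> Prediction:
    """Convert a decoded JSON reply into a Prediction, rejecting non-200 codes."""
    if reply.get("code") != 200:
        raise RuntimeError(f"API error {reply.get('code')}: {reply.get('message')}")
    data = reply["data"]
    timings = data.get("timings") or {}
    return Prediction(
        id=data["id"],
        model=data.get("model", ""),
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        has_nsfw_contents=list(data.get("has_nsfw_contents") or []),
        error=data.get("error") or "",
        inference_ms=timings.get("inference"),
    )
```

Check `status` before reading `outputs`, since the table notes that `outputs` stays empty until the task is completed.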

Result Request Parameters

© 2025 WaveSpeedAI. All rights reserved.