
Runs start at $0.15, the price of the 5-second billing minimum.
At that minimum price, $10 covers approximately 66 runs.
Kling LipSync is an advanced audio-to-video model that drives natural, lifelike lip movements to match any input audio. Give it a clean voice or singing track plus a video, and it reanimates the mouth region so your character looks like they are truly speaking or singing those lines.
Natural, closely matched lip motion: Mouth shapes align with the phonemes in the audio while respecting each character's facial structure. The result is expressive, believable speech and singing rather than robotic mouth flapping.
Accurate facial muscle response: The lip animation also drives the cheeks, jawline, and surrounding muscles, so subtle stretches and contractions stay in sync with the audio, greatly improving realism and immersion.
Non-destructive background and body: Only the face region is re-rendered. Clothing, hands, environment, and lighting outside the face stay consistent with the original video, preserving continuity and avoiding unwanted artifacts.
Required inputs
audio: The target voice or singing track (recorded locally or generated). The duration of this audio determines the billed cost, and it should roughly match the length of the video.
video: The source video whose character(s) will be lip-synced to the audio.
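Because billing follows the audio duration and the audio should roughly match the video, it can help to compare the two lengths before submitting a run. The Python sketch below assumes an uncompressed PCM WAV file (the standard-library `wave` module cannot read MP3 or other compressed formats) and assumes the video length is measured with your own tooling. The function names and the 1-second tolerance are hypothetical choices for illustration, not part of the model's API.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file in seconds.

    `source` may be a file path or a binary file-like object; wave.open accepts both.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def durations_roughly_match(audio_seconds: float, video_seconds: float,
                            tolerance_seconds: float = 1.0) -> bool:
    """Hypothetical pre-flight check; the 1-second default tolerance is an assumption."""
    return abs(audio_seconds - video_seconds) <= tolerance_seconds


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono, 16-bit silent WAV for trying out the helpers above."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    audio_s = wav_duration_seconds(make_silent_wav(2.0))
    video_s = 2.5  # placeholder: replace with the real length of your source video
    print(f"audio {audio_s:.2f} s, video {video_s:.2f} s, "
          f"match: {durations_roughly_match(audio_s, video_s)}")
```

In practice you would pass a real file path to `wav_duration_seconds`; the synthetic WAV only demonstrates the check without needing a file on disk.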
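Output
video: The source video with the character's mouth region reanimated to match the audio; everything outside the face is preserved.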
Pricing
Billing is based on the audio duration at $0.03 per second, with a 5-second minimum and a 600-second cap.
Summary:
| Metric | Value |
|---|---|
| Price per second | $0.030 |
| Minimum billed duration | 5 seconds |
| Minimum total price | $0.15 |
| Maximum billed duration | 600 seconds |
| Maximum total price per run | $18.00 |
Example costs:
| Audio length | Billed seconds | Total price |
|---|---|---|
| 4 s | 5 s | $0.15 |
| 10 s | 10 s | $0.30 |
| 60 s | 60 s | $1.80 |
| 180 s | 180 s | $5.40 |
| 600 s | 600 s | $18.00 |
| 900 s | 600 s (capped) | $18.00 |
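The billing rule amounts to clamping the audio duration to the 5 to 600 second window and multiplying by the per-second price. The Python sketch below reproduces the example costs under two stated assumptions, since the page does not say how partial seconds or sub-cent amounts are handled: fractional seconds are billed proportionally, and the total is rounded to the nearest cent. The constant and function names are hypothetical, not an official calculator.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures taken from the pricing tables above.
PRICE_PER_SECOND = Decimal("0.03")  # USD per billed second of audio
MIN_BILLED_SECONDS = Decimal(5)     # shorter audio is billed as 5 s
MAX_BILLED_SECONDS = Decimal(600)   # longer audio is billed as 600 s


def billed_seconds(audio_seconds) -> Decimal:
    """Clamp the audio duration into the 5-600 s billing window.

    Assumption: fractional seconds are billed as-is (rounding is not specified).
    """
    duration = Decimal(str(audio_seconds))
    return min(max(duration, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)


def run_price(audio_seconds) -> Decimal:
    """Price of one run in USD, rounded to the nearest cent (rounding rule assumed)."""
    raw = billed_seconds(audio_seconds) * PRICE_PER_SECOND
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def runs_for_budget(budget_usd, audio_seconds) -> int:
    """Number of whole runs of a given audio length that a budget covers."""
    return int(Decimal(str(budget_usd)) // run_price(audio_seconds))


if __name__ == "__main__":
    # Reprint the example-cost rows.
    for seconds in (4, 10, 60, 180, 600, 900):
        print(f"{seconds:>4} s audio -> {billed_seconds(seconds)} s billed, "
              f"${run_price(seconds)}")
    # Whole runs that $10 covers at the $0.15 minimum price.
    print(runs_for_budget(10, 4))
```

Using `Decimal` instead of `float` keeps amounts such as $0.15 exact, so the budget division yields 66 whole runs rather than suffering from binary rounding error.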