
Your request will cost $0.08 per run.
For $1 you can run this model about 12 times.
Lipsync-2-pro is a zero-shot model for generating realistic lip movements that match spoken audio. It works out of the box, with no training or fine-tuning, and preserves the speaker’s natural style across languages, cameras, and video types. From live-action footage to animated or AI-generated faces, it brings broadcast-grade dubbing and dialogue editing into a simple API call.
- `video` (required): Input video to be re-synced (URL or upload). Works best with relatively stable talking-head or upper-body shots.
- `audio` (required): Target speech (URL or upload). The new lip motion follows this track.
- `sync_mode`: Controls how audio and video lengths are aligned.

Output: a new MP4 video with updated lip sync.
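As a rough illustration, the inputs listed above could be collected into a JSON request body before being sent to the API. This is a minimal Python sketch under stated assumptions: `build_lipsync_payload` is a hypothetical helper, the field names are simply the parameter names from this page, both inputs are given as URLs, and this page documents no endpoint, authentication scheme, or list of valid `sync_mode` values. For that reason `sync_mode` is passed through unchanged and no request is actually sent.

```python
import json
from typing import Optional


def build_lipsync_payload(video: str, audio: str, sync_mode: Optional[str] = None) -> dict:
    """Assemble request parameters for a lipsync job (hypothetical helper).

    `video` and `audio` are required; `sync_mode` is optional and passed
    through unchanged because its accepted values are not listed here.
    """
    # Enforce the two required inputs before building anything.
    if not video:
        raise ValueError("video is required")
    if not audio:
        raise ValueError("audio is required")
    payload = {"video": video, "audio": audio}
    # Only include sync_mode when the caller explicitly sets it.
    if sync_mode is not None:
        payload["sync_mode"] = sync_mode
    return payload


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    body = build_lipsync_payload(
        "https://example.com/talking_head.mp4",
        "https://example.com/dubbed_speech.wav",
    )
    print(json.dumps(body))
```

Leaving `sync_mode` out of the body when it is unset lets the service apply whatever default it uses, rather than guessing a value.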
Billing is based purely on audio length.
| Audio length (seconds) | Price (USD) |
|---|---|
| 5 | $0.40 |
| 10 | $0.80 |
| 15 | $1.20 |
| 30 | $2.40 |
| 60 | $4.80 |
You can estimate other costs by multiplying the audio duration (in seconds) by $0.08/s; charges scale linearly with length.
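Because charges scale linearly, a cost estimate is a single multiplication. The following is a minimal Python sketch that assumes a flat $0.08 per second of audio (the rate implied by the table) and rounds to whole cents. The rounding rule is an assumption, since the page does not say how fractional amounts are billed. `estimate_cost` is a hypothetical helper name. `Decimal` is used so that currency arithmetic avoids small binary floating-point rounding errors.

```python
from decimal import Decimal, ROUND_HALF_UP

# USD per second of audio, as implied by the pricing table (assumption: flat rate).
RATE_PER_SECOND = Decimal("0.08")


def estimate_cost(audio_seconds) -> Decimal:
    """Return the estimated price in USD for an audio track of the given length.

    Rounding to whole cents (half-up) is an assumption, not documented behavior.
    """
    # Convert via str so floats like 7.5 become exact decimals.
    seconds = Decimal(str(audio_seconds))
    if seconds < 0:
        raise ValueError("audio length cannot be negative")
    return (seconds * RATE_PER_SECOND).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Reproduce the rows of the pricing table.
    for s in (5, 10, 15, 30, 60):
        print(f"{s:>3} s -> ${estimate_cost(s)}")
```

Running the script prints the same five prices as the table ($0.40 through $4.80); other durations, such as a 7.5-second clip, follow the same linear rule.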
WaveSpeedAI / InfiniteTalk: WaveSpeedAI’s single-avatar talking-head model that turns one photo plus audio into smooth, lip-synced digital presenter videos for tutorials, marketing, and social content.
WaveSpeedAI / InfiniteTalk Multi: Multi-avatar version of InfiniteTalk that drives several characters in one scene from separate audio tracks, ideal for dialog-style explainers, interviews, and role-play videos.
Kwaivgi / Kling V2 AI Avatar Standard: Cost-effective Kling-based AI avatar model that generates natural talking-face videos from a single reference image and voice track, suitable for everyday content and customer support.
Kwaivgi / Kling V2 AI Avatar Pro: Higher-fidelity Kling V2 avatar model for premium digital humans, offering smoother motion, better lip sync, and more stable faces for commercials, brand spokespeople, and product demos.