Sync Lipsync 1.9.0 Beta

sync/lipsync-1.9.0-beta

Generate realistic lip-sync animations from audio using advanced algorithms for high-quality facial synchronization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


Your request will cost $0.025 per run.

For $1 you can run this model approximately 40 times.


README

sync/lipsync-1.9.0-beta — Audio-to-Video Lip Sync

sync/lipsync-1.9.0-beta takes an existing video and a separate audio track, then reanimates the speaker’s mouth so the lips match the new speech. It’s a zero-shot lipsync model from Sync Labs—no training or cloning step required.

🔍 Highlights

  • Zero-shot lipsync – Works on any person in any video; just upload video + audio.
  • Style-aware editing – Adjusts only the mouth region while keeping the person’s identity, lighting, and background intact.
  • Cross-domain support – Handles live-action footage, stylised CG, and AI-generated faces.
  • Flexible timing control – sync_mode lets you decide how to handle length mismatches between video and audio.

🧩 Parameters

  • video* Required. Input video to be edited (URL or upload). Use a shot with a clearly visible face for best results.

  • audio* Required. Target speech track (URL or upload, e.g. MP3/WAV). The model will align lip movements to this audio.

  • sync_mode Controls behavior when video and audio durations differ. Options:

    • loop
    • bounce
    • cut_off
    • silence
    • remap

    Choose how you want the shorter stream to be treated (looped, trimmed, padded with silence, or time-remapped).

Output: a new video where the speaker’s lips follow the uploaded audio.
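As a sketch, a request body for this model could be assembled as below. Only the documented fields (video, audio, sync_mode) are used; the URLs are placeholders and the helper function name is hypothetical, not part of the WaveSpeed API:

```python
# Minimal sketch of building a request payload for sync/lipsync-1.9.0-beta.
# The helper below is illustrative; the URLs are placeholders.

def build_lipsync_payload(video_url: str, audio_url: str, sync_mode: str = "cut_off") -> dict:
    """Assemble the JSON body for a lipsync run, validating sync_mode."""
    allowed_modes = {"loop", "bounce", "cut_off", "silence", "remap"}
    if sync_mode not in allowed_modes:
        raise ValueError(f"sync_mode must be one of {sorted(allowed_modes)}")
    return {
        "video": video_url,    # required: input video to be edited
        "audio": audio_url,    # required: target speech track (e.g. MP3/WAV)
        "sync_mode": sync_mode,
    }

payload = build_lipsync_payload(
    "https://example.com/clip.mp4",
    "https://example.com/speech.wav",
    sync_mode="remap",
)
print(payload)
```

Validating sync_mode client-side gives a clearer error than a failed API call when a mode is misspelled.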

💰 Pricing

Rate: $0.025 per second of processed video.

  Clip length (s)   Price (USD)
  5                 $0.13
  10                $0.25
  20                $0.50
  30                $0.75
  60                $1.50

You will only be charged for the actual duration of the input video after upload.
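At $0.025 per second, the table above can be reproduced in a few lines. Rounding up to the nearest cent matches the 5-second entry ($0.125 → $0.13); the exact rounding rule used for billing is an assumption here:

```python
import math

RATE_CENTS_PER_SECOND = 2.5  # $0.025/s expressed in cents (exactly representable in binary floats)

def price_usd(seconds: float) -> float:
    """Price for a clip of the given length, rounded up to the nearest cent."""
    return math.ceil(seconds * RATE_CENTS_PER_SECOND) / 100

for s in (5, 10, 20, 30, 60):
    print(f"{s:>2}s -> ${price_usd(s):.2f}")
# 5s -> $0.13, 10s -> $0.25, 20s -> $0.50, 30s -> $0.75, 60s -> $1.50
```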

🚀 How to Use

  1. Upload your video in the video field (face should be front-facing or ¾ view, with minimal occlusion).
  2. Upload your audio in the audio field (clean speech, minimal background noise).
  3. Pick a sync_mode depending on how you want to handle length mismatches.
  4. Click Run and wait for the processed clip.
  5. Review the result; if timing feels off, try a different sync_mode or tweak your source video/audio.
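Programmatically, the same flow is a single authenticated POST. The base URL and model path below are assumptions for illustration (check the WaveSpeed.ai API docs for the real endpoint); only stdlib urllib is used:

```python
import json
import urllib.request

# Assumed values for illustration only -- verify against the WaveSpeed.ai API docs.
API_BASE = "https://api.wavespeed.ai"
MODEL_PATH = "/sync/lipsync-1.9.0-beta"

def make_lipsync_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request carrying the JSON payload."""
    return urllib.request.Request(
        API_BASE + MODEL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = make_lipsync_request(
    {"video": "https://example.com/clip.mp4",
     "audio": "https://example.com/speech.wav",
     "sync_mode": "loop"},
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req) -- omitted here so the
# sketch runs without a network call or a real key.
```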

💡 Tips

  • Use clean, well-lit close-ups for the most convincing lipsync.
  • Avoid heavy head turns or faces partially out of frame.
  • For dubbed content, make sure the speech rhythm in your audio is reasonably close to the original—lipsync works best when phrasing and pauses roughly match the performance.

More Models to Try

  • WaveSpeedAI / InfiniteTalk WaveSpeedAI’s single-avatar talking-head model that turns one photo plus audio into smooth, lip-synced digital presenter videos for tutorials, marketing, and social content.

  • WaveSpeedAI / InfiniteTalk Multi Multi-avatar version of InfiniteTalk that drives several characters in one scene from separate audio tracks, ideal for dialog-style explainers, interviews, and role-play videos.

  • Kwaivgi / Kling V2 AI Avatar Standard Cost-effective Kling-based AI avatar model that generates natural talking-face videos from a single reference image and voice track, suitable for everyday content and customer support.

  • Kwaivgi / Kling V2 AI Avatar Pro Higher-fidelity Kling V2 avatar model for premium digital humans, offering smoother motion, better lip-sync, and more stable faces for commercials, brand spokespeople, and product demos.