WaveSpeed.ai
digital-human
InfiniteTalk Fast Video-to-Video Multi

wavespeed-ai/infinitetalk-fast/video-to-video-multi

InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API with high performance, no cold starts, and affordable pricing.

Input


Your request will cost $0.075 per run.

For $1 you can run this model about 13 times.

README

InfiniteTalk Fast Video-to-Video Multi

What is InfiniteTalk Fast Video-to-Video Multi?

InfiniteTalk Fast Video-to-Video Multi creates videos with accurate lip sync for multiple characters by combining an input video and two audio tracks (left and right). It uses fast inference for quicker results while maintaining quality lip synchronization, matching head, face, and body movements to each audio source.

Why it looks great

  • Accurate lip synchronization: aligns lip motion precisely with audio for both characters, preserving natural rhythm and pronunciation.
  • Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
  • Identity preservation: maintains consistent facial identity and visual style across frames.
  • Video-to-video capability: uses an existing video as the base, preserving the original scene and motion.
  • Mask control: optional mask images let you define which regions can move.
  • Instruction following: accepts text prompts to control scene, pose, or behavior while syncing to audio.
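The mask control mentioned above expects an image that covers only the regions allowed to move. As an illustrative sketch (the exact pixel convention is an assumption, and in practice you would export a standard PNG with an image library), here is a stdlib-only helper that builds a grayscale mask buffer and writes it as a binary PGM:

```python
def make_mask(width, height, boxes):
    """Return a flat grayscale buffer: 255 inside the given
    (left, top, right, bottom) boxes (animatable), 0 elsewhere (static).
    The white-means-animate convention is an assumption for illustration."""
    pixels = bytearray(width * height)  # all zero = static
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            row = y * width
            for x in range(left, right):
                pixels[row + x] = 255
    return pixels

def save_pgm(path, width, height, pixels):
    """Write the buffer as a binary PGM (P5) image file."""
    with open(path, "wb") as f:
        f.write(f"P5 {width} {height} 255\n".encode())
        f.write(bytes(pixels))

# Example: two side-by-side regions for a two-person shot
# (coordinates are placeholders, not taken from the model docs)
mask = make_mask(1280, 720, [(120, 80, 520, 640), (760, 80, 1160, 640)])
save_pgm("mask_image.pgm", 1280, 720, mask)
```

Keeping the mask tight around the two speakers avoids the fully-black output described in the Note section below.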

How to Use

  1. Upload the left and right audio files.
  2. Upload your video (it should clearly show two people).
  3. (Optional) Upload a mask image to control which regions can move.
  4. Select the speaking order (left to right, right to left, or meanwhile).
  5. Write a text prompt if needed.
  6. Submit the job and download the results once they're ready.
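The steps above map onto the REST inference API. The following is a minimal sketch only: the endpoint path, payload field names, and response shape are assumptions based on this model page, not confirmed API documentation, so check the official reference before use.

```python
# Hedged sketch of submitting a job to the REST API.
# Field names and the endpoint path are illustrative assumptions.
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"  # assumed base URL
MODEL = "wavespeed-ai/infinitetalk-fast/video-to-video-multi"

def build_payload(video_url, left_audio_url, right_audio_url,
                  prompt="", mask_image_url=None, order="left_right"):
    """Assemble the request body; key names are illustrative guesses."""
    payload = {
        "video": video_url,
        "left_audio": left_audio_url,
        "right_audio": right_audio_url,
        "prompt": prompt,
        "speaking_order": order,  # e.g. left-to-right / right-to-left
    }
    if mask_image_url:
        payload["mask_image"] = mask_image_url  # optional mask (step 3)
    return payload

def submit(api_key, payload):
    """POST the job and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/{MODEL}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

After submitting, you would poll the job until it completes and then download the result, as in step 6.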

Note

  • Max clip length per job: up to 10 minutes
  • Processing speed: approximately 10–30 seconds of wall time per 1 second of video (varies by queue load)
  • Mask safety tip: Do not upload the full image as mask_image. The mask should only cover the regions you want to animate—otherwise the result may render as fully black.
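The processing-speed figure above translates directly into a wall-time estimate. A quick sketch of the arithmetic:

```python
# Rough wall-time range from the stated figure: 10-30 s of processing
# per 1 s of video, varying with queue load.
def estimated_processing_range(video_seconds):
    return (video_seconds * 10, video_seconds * 30)

low, high = estimated_processing_range(60)  # a 1-minute clip
# => between 600 s (10 min) and 1800 s (30 min)
```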
