InfiniteTalk Fast Video-to-Video Multi
What is InfiniteTalk Fast Video-to-Video Multi?
InfiniteTalk Fast Video-to-Video Multi creates videos with accurate lip sync for multiple characters by combining an input video with two audio tracks (left and right). It uses fast inference to return results sooner while keeping lip synchronization accurate, and it matches head, face, and body movements to each audio source.
Why it looks great
- Accurate lip synchronization: aligns lip motion precisely with audio for both characters, preserving natural rhythm and pronunciation.
- Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
- Identity preservation: maintains consistent facial identity and visual style across frames.
- Video-to-video capability: uses an existing video as the base, preserving the original scene and motion.
- Mask control: optional mask images let you define which regions can move.
- Instruction following: accepts text prompts to control scene, pose, or behavior while syncing to audio.
How to Use
- Upload the left and right audio files.
- Upload your video (the video should clearly show two people).
- (Optional) Upload a mask image to control which regions can move.
- Select the speaking order: left to right, right to left, or meanwhile (both characters speak at the same time).
- (Optional) Write a prompt to control scene, pose, or behavior.
- Submit the job and download the results once they're ready.
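The steps above can be sketched as a small request builder. This is only an illustration of which inputs are required and which are optional, not the service's actual API: apart from mask_image, every field name here (left_audio, right_audio, video, order, prompt) and the order strings are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical identifiers for the three speaking orders described above.
ALLOWED_ORDERS = ("left_right", "right_left", "meanwhile")


def build_job_request(
    left_audio: str,
    right_audio: str,
    video: str,
    order: str = "left_right",
    mask_image: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Assemble a job request dict; field names other than mask_image are assumed."""
    # Both audio tracks and the base video are required inputs.
    for name, value in (
        ("left_audio", left_audio),
        ("right_audio", right_audio),
        ("video", video),
    ):
        if not value:
            raise ValueError(f"{name} is required")
    if order not in ALLOWED_ORDERS:
        raise ValueError(f"order must be one of {ALLOWED_ORDERS}")

    request = {
        "left_audio": left_audio,
        "right_audio": right_audio,
        "video": video,
        "order": order,
    }
    # The mask and the prompt are optional, so include them only when provided.
    if mask_image:
        request["mask_image"] = mask_image
    if prompt:
        request["prompt"] = prompt
    return request


if __name__ == "__main__":
    print(build_job_request("left.wav", "right.wav", "scene.mp4", order="meanwhile"))
```

Validating the inputs before submission catches a missing audio track or an unsupported speaking order locally, before any queue time is spent.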
Note
- Max clip length per job: 10 minutes
- Processing speed: approximately 10–30 seconds of wall time per 1 second of video (varies by queue load)
- Mask safety tip: Do not upload the full image as mask_image. The mask should cover only the regions you want to animate; otherwise the result may render as fully black.
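As a rough planning aid, the clip limit and processing speed noted above can be turned into a wait-time estimate. The sketch below simply multiplies the clip length by the 10 and 30 second bounds; the function and constant names are illustrative, and actual times will vary with queue load.

```python
from typing import Tuple

MAX_CLIP_SECONDS = 10 * 60        # max clip length per job: 10 minutes
MIN_WALL_PER_VIDEO_SECOND = 10    # lower bound: 10 s of wall time per 1 s of video
MAX_WALL_PER_VIDEO_SECOND = 30    # upper bound: 30 s of wall time per 1 s of video


def estimate_wall_time(video_seconds: float) -> Tuple[float, float]:
    """Return the (shortest, longest) expected processing time in seconds."""
    if video_seconds <= 0:
        raise ValueError("video length must be positive")
    if video_seconds > MAX_CLIP_SECONDS:
        raise ValueError("clip exceeds the 10-minute limit per job")
    return (
        video_seconds * MIN_WALL_PER_VIDEO_SECOND,
        video_seconds * MAX_WALL_PER_VIDEO_SECOND,
    )


if __name__ == "__main__":
    low, high = estimate_wall_time(60)
    print(f"60 s clip: about {low / 60:.0f} to {high / 60:.0f} minutes")
```

For example, a 60-second clip works out to roughly 10 to 30 minutes of processing, and a full 10-minute clip to roughly 100 to 300 minutes.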