InfiniteTalk Video-to-Video Multi
What is InfiniteTalk Video-to-Video Multi?
InfiniteTalk Video-to-Video Multi creates videos with accurate lip sync for multiple characters by combining an input video with two audio tracks (left and right). It preserves each character's identity across long videos (each job here is capped at 10 minutes) and matches head, face, and body movements to the corresponding audio source.
Why it looks great
- Accurate lip synchronization: aligns lip motion precisely with audio for both characters, preserving natural rhythm and pronunciation.
- Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
- Identity preservation: maintains consistent facial identity and visual style across frames.
- Video-to-video capability: uses an existing video as the base, preserving the original scene and motion.
- Mask control: optional mask images let you define which regions can move.
- Instruction following: accepts text prompts to control scene, pose, or behavior while syncing to audio.
Pricing
| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.15 | 10 minutes |
| 720p | $0.30 | 10 minutes |
Billing Rules
- Standard (480p) Rate: $0.03 per second
- HD (720p) Rate: $0.06 per second (double the standard rate)
- Minimum Charge: All videos are billed for a minimum of 5 seconds (at least $0.15 at 480p, or $0.30 at 720p).
- Billing Cap: To keep your costs predictable, billing is capped at 600 seconds (10 minutes).
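The billing rules above can be sketched as a small calculator. This is an illustrative estimate only: the function and constant names are made up for this example, and it assumes fractional seconds are billed pro rata, which the rules above do not state.

```python
# Hypothetical cost estimator based on the billing rules listed above.
# Assumption: durations between the minimum and the cap are billed per
# second without rounding up to whole seconds.

RATE_PER_SECOND = {"480p": 0.03, "720p": 0.06}  # USD per second
MIN_BILLED_SECONDS = 5      # minimum charge covers 5 seconds
MAX_BILLED_SECONDS = 600    # billing cap of 10 minutes


def estimate_cost(duration_seconds: float, resolution: str = "480p") -> float:
    """Return the estimated charge in USD for one job."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Clamp the billed duration between the minimum and the cap.
    billed = min(max(duration_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return round(billed * RATE_PER_SECOND[resolution], 2)


if __name__ == "__main__":
    print(estimate_cost(3, "480p"))    # short clip, billed as 5 seconds
    print(estimate_cost(60, "720p"))   # one minute at 720p
    print(estimate_cost(900, "480p"))  # longer than the cap, billed as 600 seconds
```

For instance, a 3-second clip at 480p is billed as 5 seconds, and anything longer than 10 minutes is billed as 600 seconds.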
How to Use
- Upload the left and right audio files.
- Upload your video, which should clearly show two people.
- (Optional) Upload a mask image to control which regions can move.
- Select the speaking order (left to right, right to left, or meanwhile, i.e. both speaking at the same time).
- Select the resolution (480p or 720p).
- Write the prompt if needed.
- Submit the job and download the results once they're ready.
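Before submitting, the inputs from the steps above can be sanity-checked locally. The sketch below is not the service's API and makes no network calls: `JobInputs`, `validate`, and every field name except `mask_image` (which appears in the notes on this page) are hypothetical, as are the option strings `left_right` and `right_left`.

```python
# Hypothetical pre-submission checklist; it only validates local values.
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
# Assumed spellings for the three speaking orders described above.
ALLOWED_ORDERS = {"left_right", "right_left", "meanwhile"}


@dataclass
class JobInputs:
    left_audio: str                   # path or URL of the left speaker's audio
    right_audio: str                  # path or URL of the right speaker's audio
    video: str                        # path or URL of the two-person video
    order: str = "left_right"
    resolution: str = "480p"
    mask_image: Optional[str] = None  # optional; should cover only movable regions
    prompt: str = ""                  # optional text prompt


def validate(inputs: JobInputs) -> List[str]:
    """Return a list of human-readable problems; empty means ready to submit."""
    problems = []
    for name in ("left_audio", "right_audio", "video"):
        if not getattr(inputs, name):
            problems.append(f"{name} is required")
    if inputs.order not in ALLOWED_ORDERS:
        problems.append(f"order must be one of {sorted(ALLOWED_ORDERS)}")
    if inputs.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    return problems


if __name__ == "__main__":
    job = JobInputs("left.wav", "right.wav", "scene.mp4", order="meanwhile")
    print(validate(job) or "ready to submit")
```

Returning a list of problems rather than raising on the first one lets all missing or invalid inputs be reported together.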
Note
- Max clip length per job: 10 minutes
- Processing speed: approximately 10–30 seconds of wall time per 1 second of video (varies by resolution and queue load)
- Mask safety tip: Do not upload the full image as mask_image. The mask should cover only the regions you want to animate; otherwise the result may render as fully black.
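As a rough planning aid, the processing-speed figure in the notes above translates into a wait-time range. The helper name below is invented for this example, and the result is only an estimate, since actual time varies with resolution and queue load.

```python
# Hypothetical helper: turns the "10 to 30 seconds of wall time per
# 1 second of video" guideline into a (low, high) range in seconds.

MIN_WALL_SECONDS_PER_VIDEO_SECOND = 10
MAX_WALL_SECONDS_PER_VIDEO_SECOND = 30
MAX_CLIP_SECONDS = 600  # 10-minute limit per job


def estimate_wall_time(video_seconds: float) -> tuple:
    """Return (low, high) estimated processing time in seconds."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if video_seconds > MAX_CLIP_SECONDS:
        raise ValueError("clip exceeds the 10-minute limit per job")
    low = video_seconds * MIN_WALL_SECONDS_PER_VIDEO_SECOND
    high = video_seconds * MAX_WALL_SECONDS_PER_VIDEO_SECOND
    return (low, high)


if __name__ == "__main__":
    low, high = estimate_wall_time(60)
    print(f"about {low / 60:.0f} to {high / 60:.0f} minutes")
```

By this guideline, a 60-second clip would take roughly 10 to 30 minutes to process.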