InfiniteTalk
What is InfiniteTalk?
InfiniteTalk produces talking videos with precise lip sync, aligning head, face, and body movements with the audio. It maintains identity across unlimited-length videos and also supports image-to-video generation, turning static photos into lively speaking or singing videos.
Why it looks great
- Accurate lip synchronization: aligns lip motion precisely with audio, preserving natural rhythm and pronunciation.
- Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
- Identity preservation: maintains consistent facial identity and visual style across frames.
- Image-to-video capability: turns static photos into realistic speaking or singing videos.
- Instruction following: accepts text prompts to control scene, pose, or behavior while syncing to audio.
Pricing
| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.15 | 10 minutes |
| 720p | $0.30 | 10 minutes |
Billing Rules
- Standard (480p) Rate: $0.03 per second
- HD (720p) Rate: $0.06 per second (double the Standard Rate)
- Minimum Charge: All videos are billed for a minimum of 5 seconds ($0.15 at 480p, $0.30 at 720p).
- Billing Cap: To keep your costs predictable, billing is capped at 600 seconds (10 minutes).
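The billing rules above can be sketched as a small cost estimator. This is only an illustration, not the service's billing code: the function name `estimate_cost_usd` is hypothetical, and because the rules do not say how fractional seconds are handled, the sketch assumes they round up to the next whole second.

```python
import math

# Per-second rates in cents, taken from the billing rules above.
RATE_CENTS_PER_SECOND = {"480p": 3, "720p": 6}
MIN_BILLED_SECONDS = 5     # minimum charge
MAX_BILLED_SECONDS = 600   # billing cap (10 minutes)


def estimate_cost_usd(duration_seconds: float, resolution: str = "480p") -> float:
    """Estimate the charge in USD for one video (hypothetical helper)."""
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    # Assumption: fractional seconds round up to the next whole second.
    seconds = math.ceil(duration_seconds)

    # Apply the 5-second minimum and the 600-second cap.
    billed = min(max(seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)

    # Work in integer cents to avoid floating-point drift, then convert.
    return billed * RATE_CENTS_PER_SECOND[resolution] / 100


if __name__ == "__main__":
    print(estimate_cost_usd(3, "480p"))    # below the minimum, billed as 5 s
    print(estimate_cost_usd(12, "720p"))   # 12 s at the HD rate
    print(estimate_cost_usd(900, "720p"))  # above the cap, billed as 600 s
```

Under these assumptions a 3-second clip at 480p costs the same as a 5-second clip, and any duration beyond 600 seconds is billed as 600 seconds.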
How to Use
- Upload the audio file.
- Upload the image (the person to animate).
- (Optional) Paint a mask_image to specify which regions can move.
- (Optional) Add a prompt to guide expression, style, or pose.
- Select the resolution (480p or 720p).
- Set the seed (use a fixed number for reproducible results).
- Submit the job and download the result once it's ready.
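The steps above can be collected into a single set of job inputs before submission. The sketch below only assembles and validates a dictionary; it does not call any API. The function name `build_job_inputs` and the keys `audio`, `image`, `prompt`, `resolution`, and `seed` are assumptions made for illustration; only `mask_image` is named on this page, so check the actual request schema before relying on any of them.

```python
from typing import Optional

ALLOWED_RESOLUTIONS = ("480p", "720p")


def build_job_inputs(
    audio: str,
    image: str,
    mask_image: Optional[str] = None,
    prompt: Optional[str] = None,
    resolution: str = "480p",
    seed: Optional[int] = None,
) -> dict:
    """Assemble job inputs (hypothetical field names, not a confirmed schema)."""
    if not audio or not image:
        raise ValueError("both an audio file and an image are required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")

    inputs = {"audio": audio, "image": image, "resolution": resolution}

    # Optional inputs are included only when provided.
    if mask_image is not None:
        inputs["mask_image"] = mask_image
    if prompt is not None:
        inputs["prompt"] = prompt
    if seed is not None:
        inputs["seed"] = seed  # a fixed seed aids reproducibility
    return inputs


if __name__ == "__main__":
    print(build_job_inputs("speech.wav", "portrait.png",
                           prompt="smiling, looking at the camera",
                           resolution="720p", seed=42))
```

Keeping validation separate from submission makes it possible to catch a missing file or an unsupported resolution before a job is queued and billed.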
Note
- Max clip length per job: up to 10 minutes
- Processing speed: approximately 10–30 seconds of wall time per 1 second of video (varies by resolution and queue load)
- Mask safety tip: Do not upload the full image as mask_image. The mask should cover only the regions you want to animate; otherwise the result may render as fully black.
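The processing-speed note above translates into a rough waiting-time range. The helper below is a sketch with a hypothetical name, `estimate_wall_time_seconds`; it simply multiplies the clip length by the quoted 10 and 30 seconds per second of video. Actual times depend on resolution and queue load.

```python
from typing import Tuple

MAX_CLIP_SECONDS = 600        # max clip length per job (10 minutes)
WALL_SECONDS_PER_SECOND = (10, 30)  # quoted lower and upper bounds


def estimate_wall_time_seconds(video_seconds: float) -> Tuple[float, float]:
    """Return a (low, high) estimate of processing time in seconds."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if video_seconds > MAX_CLIP_SECONDS:
        raise ValueError("clips are limited to 10 minutes per job")
    low, high = WALL_SECONDS_PER_SECOND
    return (video_seconds * low, video_seconds * high)


if __name__ == "__main__":
    low, high = estimate_wall_time_seconds(60)
    print(f"A 60 s clip may take roughly {low / 60:.0f} to {high / 60:.0f} minutes")
```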