
image-to-video
Your request will cost $0.20 per run.
For $10 you can run this model approximately 50 times.
LongCat Avatar is an audio-driven video generation model that produces highly realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts static photos into lively speaking or singing videos with precise lip sync, aligning head, face, and body movements to the audio.
Accurate lip synchronization: aligns lip motion precisely with audio, preserving natural rhythm and pronunciation.
Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
Identity preservation: maintains consistent facial identity and visual style across frames.
Image-to-video capability: turns static photos into realistic speaking or singing videos.
Natural dynamics: keeps color tone consistent without noticeable shifts and produces natural motion across multi-speaker scenarios.
| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.15 | 2 minutes |
| 720p | $0.30 | 2 minutes |
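
To make the pricing table easier to apply, here is a minimal Python sketch that estimates the cost of one video from its resolution and length. The function name `estimate_cost` is hypothetical, and the rounding rule (a partial 5-second block is billed as a full block) is an assumption, since the table does not say how partial blocks are charged.

```python
import math

# Prices per 5-second block, in cents, taken from the pricing table above.
PRICE_CENTS_PER_5S = {"480p": 15, "720p": 30}

# Maximum video length from the table: 2 minutes.
MAX_LENGTH_SECONDS = 120


def estimate_cost(resolution: str, duration_seconds: float) -> float:
    """Return an estimated cost in USD for a single generated video.

    Assumption (not stated in the table): partial 5-second blocks
    are rounded up and billed as full blocks.
    """
    if resolution not in PRICE_CENTS_PER_5S:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 0 < duration_seconds <= MAX_LENGTH_SECONDS:
        raise ValueError("duration must be between 0 and 120 seconds")

    blocks = math.ceil(duration_seconds / 5)
    return blocks * PRICE_CENTS_PER_5S[resolution] / 100


if __name__ == "__main__":
    for res, secs in [("480p", 5), ("480p", 12), ("720p", 120)]:
        print(f"{res}, {secs}s: ${estimate_cost(res, secs):.2f}")
```

Under this assumption, a full 2-minute video at 720p spans 24 blocks and would cost $7.20, while a 12-second clip at 480p rounds up to 3 blocks. If billing is instead prorated per second, replace the `math.ceil` step with plain division.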