
digital-human
This request costs $0.20 per run.
With $10, you can run this model about 50 times.
LongCat Avatar is an audio-driven video generation model that produces highly realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts static photos into lively speaking or singing videos with precise lip sync, aligning head, face, and body movements to the audio.
Accurate lip synchronization: aligns lip motion precisely with audio, preserving natural rhythm and pronunciation.
Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.
Identity preservation: maintains consistent facial identity and visual style across frames.
Image-to-video capability: turns static photos into realistic speaking or singing videos.
Natural dynamics: keeps color tone consistent, with no noticeable shifts, and produces natural motion across multi-speaker scenarios.
| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.20 | 2 minutes |
| 720p | $0.40 | 2 minutes |
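
To make the pricing table concrete, here is a minimal Python sketch that estimates the cost of a single run from the output duration and resolution. The helper names `estimate_cost` and `runs_for_budget` are hypothetical and not part of any official SDK. The sketch also assumes that a partial 5-second block is billed as a full block, which the table does not state.

```python
import math

# Rates taken from the pricing table, in cents per 5-second block of output.
RATE_CENTS_PER_BLOCK = {"480p": 20, "720p": 40}
BLOCK_SECONDS = 5
MAX_SECONDS = 120  # 2-minute maximum length from the table


def estimate_cost_cents(duration_seconds: float, resolution: str = "480p") -> int:
    """Estimate the cost of one run in cents (hypothetical helper)."""
    if resolution not in RATE_CENTS_PER_BLOCK:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 0 < duration_seconds <= MAX_SECONDS:
        raise ValueError("duration must be in (0, 120] seconds")
    # Assumption: partial blocks are rounded up to a full 5-second block.
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    return blocks * RATE_CENTS_PER_BLOCK[resolution]


def estimate_cost(duration_seconds: float, resolution: str = "480p") -> float:
    """Same estimate, expressed in US dollars."""
    return estimate_cost_cents(duration_seconds, resolution) / 100


def runs_for_budget(budget_usd: float, duration_seconds: float,
                    resolution: str = "480p") -> int:
    """How many runs of a given length fit in a budget (hypothetical helper)."""
    budget_cents = round(budget_usd * 100)
    return budget_cents // estimate_cost_cents(duration_seconds, resolution)


if __name__ == "__main__":
    print(estimate_cost(5, "480p"))         # one 5-second block at 480p
    print(estimate_cost(120, "720p"))       # full 2-minute video at 720p
    print(runs_for_budget(10.0, 5, "480p"))
```

Under these assumptions, a 5-second 480p run costs $0.20, which matches the per-run price and the "about 50 runs for $10" figure quoted above. A full 2-minute 720p video would come to 24 blocks at $0.40 each.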