image-to-video

LongCat Avatar

wavespeed-ai/longcat-avatar

LongCat Avatar generates highly realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts one photo plus an audio track into an audio-driven talking or singing avatar video (image-to-video) of up to 2 minutes; the 720p tier costs $0.30 per 5 seconds. Ready-to-use REST API, no cold starts, affordable pricing.

README

What is LongCat Avatar?

LongCat Avatar is an audio-driven video generation model that produces highly realistic, lip-synchronized long videos with natural dynamics and consistent identity. It turns a static photo into a lively speaking or singing video with precise lip sync, aligning head, face, and body movements to the audio.

Why it looks great

  • Accurate lip synchronization: aligns lip motion precisely with audio, preserving natural rhythm and pronunciation.

  • Full-body coherence: captures head movements, facial expressions, and posture changes beyond the lips.

  • Identity preservation: maintains consistent facial identity and visual style across frames.

  • Image-to-video capability: turns static photos into realistic speaking or singing videos.

  • Natural dynamics: keeps color tone consistent with no noticeable drift and maintains natural motion across multi-speaker scenarios.

Pricing

Output Resolution    Cost per 5 seconds    Max Length
480p                 $0.15                 2 minutes
720p                 $0.30                 2 minutes

Billing Rules

  • Standard (480p) Rate: $0.03 per second
  • HD (720p) Rate: $0.06 per second (double the Standard Rate)
  • Minimum Charge: Every video is billed for at least 5 seconds (at least $0.15 at 480p, $0.30 at 720p).
  • Billing Cap: To keep your costs predictable, billing is capped at 120 seconds (2 minutes).
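The billing rules above can be sketched as a small cost estimator. This is an illustrative calculation only, not the platform's billing code: the function name `estimate_cost_usd` is hypothetical, and it assumes fractional seconds are billed proportionally rather than rounded, which the rules do not specify.

```python
# Rates from the billing rules, stored in cents to avoid float drift.
RATE_CENTS_PER_SECOND = {"480p": 3, "720p": 6}
MIN_BILLED_SECONDS = 5    # minimum charge: 5 seconds
MAX_BILLED_SECONDS = 120  # billing cap: 2 minutes


def estimate_cost_usd(duration_seconds: float, resolution: str = "480p") -> float:
    """Estimate the cost in USD of one video (hypothetical helper).

    Assumption: fractional seconds are billed proportionally.
    """
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Clamp the billed duration to the [5, 120] second window.
    billed = min(max(duration_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return round(billed * RATE_CENTS_PER_SECOND[resolution] / 100, 2)


print(estimate_cost_usd(3, "480p"))   # 0.15 (minimum charge applies)
print(estimate_cost_usd(60, "720p"))  # 3.6
```

Under these rules, a full 2-minute clip would come to $3.60 at 480p and $7.20 at 720p.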

How to Use

  1. Upload the audio file.
  2. Upload the image (the person to animate).
  3. (Optional) Add a prompt to guide expression, style, or pose.
  4. Select the resolution (480p or 720p).
  5. Set the seed (use a fixed number for reproducible results).
  6. Submit the job and download the result once it's ready.
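For scripted use, steps 1 to 5 amount to assembling a set of job inputs. The sketch below only builds and validates such a set; it does not call the REST API. The function name `build_job_payload` and the field names (`audio`, `image`, `prompt`, `resolution`, `seed`) are assumptions made for illustration, so check the API reference for the actual request schema and endpoint.

```python
from typing import Optional

ALLOWED_RESOLUTIONS = ("480p", "720p")


def build_job_payload(
    audio_url: str,
    image_url: str,
    prompt: Optional[str] = None,
    resolution: str = "480p",
    seed: Optional[int] = None,
) -> dict:
    """Assemble job inputs (hypothetical field names, not the confirmed schema)."""
    if not audio_url or not image_url:
        raise ValueError("both an audio file and an image are required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    payload = {"audio": audio_url, "image": image_url, "resolution": resolution}
    if prompt:
        payload["prompt"] = prompt  # optional guidance for expression, style, or pose
    if seed is not None:
        payload["seed"] = seed  # fixed seed for reproducibility
    return payload


payload = build_job_payload(
    "https://example.com/voice.mp3",
    "https://example.com/portrait.png",
    resolution="720p",
    seed=42,
)
```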

Note

  • Max clip length per job: up to 2 minutes
  • Processing speed: approximately 10-30 seconds of wall time per 1 second of video (varies by resolution and queue load)
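As a rough planning aid, the processing-speed figure above translates into a wait-time range. The helper name `estimate_processing_seconds` is hypothetical, and the result is only as accurate as the 10-30x approximation, which varies with resolution and queue load.

```python
def estimate_processing_seconds(video_seconds: float) -> tuple:
    """Return (low, high) wall-time estimates in seconds, using the 10-30x figure."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return (10 * video_seconds, 30 * video_seconds)


low, high = estimate_processing_seconds(120)
print(low / 60, high / 60)  # 20.0 60.0 (minutes for a 2-minute clip)
```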
