video-to-video

wavespeed-ai/wan-2.1/mocha

MoCha is an end-to-end video character replacement system designed to swap a character in a source video with a new character (provided via reference images), without relying on explicit structural guidance (e.g., pose/depth maps) for every frame.

MoCha 🎭 — AI Video Character Replacement

MoCha is an end-to-end video character replacement system that seamlessly swaps the main character in a video with a new one provided via reference images. Unlike traditional methods, it requires no explicit per-frame structural guidance (such as pose or depth maps), while maintaining realistic motion, lighting, and facial expressions throughout the clip.

🌟 Key Features

  • 🧠 Structure-Free Replacement No need for pose or depth maps — MoCha automatically aligns motion, expression, and body posture.

  • 🎥 Motion Preservation Accurately transfers the source actor’s motion, emotion, and camera perspective to the target character.

  • 🎨 Identity Consistency Maintains the new character’s facial identity, lighting, and style across frames without flickering.

  • ⚙️ Easy Setup Works with a single image and a source video — no need for complex preprocessing or rigging.

  • 💡 High Realism, Low Effort Perfect for film, advertising, digital avatars, and creative character transformation.

💰 Pricing

Resolution   Price per 5 s   Price per second   Max length
480p         $0.20           $0.04 / s          120 s
720p         $0.40           $0.08 / s          120 s

Billing Rules

  • Minimum charge: 5 seconds. Any video shorter than 5 seconds is billed as 5 seconds.
  • Maximum billed duration: 120 seconds (2 minutes).
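
The pricing table and billing rules above can be turned into a quick cost estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` is hypothetical, and it assumes that durations between the minimum and maximum are billed proportionally per second, which the page does not state explicitly.

```python
# Per-second rates taken from the pricing table above (USD).
RATE_PER_SECOND = {"480p": 0.04, "720p": 0.08}
MIN_BILLED_SECONDS = 5    # shorter clips are billed as 5 seconds
MAX_BILLED_SECONDS = 120  # maximum billed duration


def estimate_cost(duration_seconds: float, resolution: str = "480p") -> float:
    """Estimate the charge in USD for one run.

    Assumption: billing is linear in seconds between the minimum and maximum.
    """
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Clamp the billed duration to the documented 5 s to 120 s window.
    billed = min(max(duration_seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return round(billed * RATE_PER_SECOND[resolution], 2)


print(estimate_cost(3, "480p"))   # 3 s clip, billed as 5 s
print(estimate_cost(10, "720p"))  # 10 s clip at 720p
```

Under these assumptions, a 3-second clip at 480p costs the same as a 5-second one, and a 10-second clip at 720p costs twice the 5-second price.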

⚙️ How to Use

  1. Upload image — A clear reference image of the new character (recommended formats: JPG / PNG, avoid WEBP).
  2. Upload video — The motion source; MoCha extracts pose and expression dynamics from this clip.
  3. Add prompt (optional) — Guide the output, e.g. “preserve outfit; natural expressions; no background changes.”
  4. Select resolution — Choose either 480p or 720p.
  5. Generate — Wait a moment while MoCha processes the replacement.
  6. Review & Iterate — Fix a seed to reproduce results, or vary it for A/B comparisons.
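
For scripted use, the inputs from the steps above can be collected into one structure before submission. This is only a sketch: the function `build_request` and the field names `image`, `video`, `resolution`, `prompt`, and `seed` are hypothetical and are not taken from the service's API reference, so check the official documentation for the real parameter names before relying on them.

```python
import warnings
from typing import Optional

ALLOWED_RESOLUTIONS = ("480p", "720p")


def build_request(image_url: str, video_url: str, resolution: str = "480p",
                  prompt: Optional[str] = None,
                  seed: Optional[int] = None) -> dict:
    """Assemble the inputs for one run (field names are hypothetical)."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    # The page recommends JPG / PNG and advises against WEBP.
    if image_url.lower().endswith(".webp"):
        warnings.warn("WEBP reference images are not recommended; use JPG or PNG")
    payload = {"image": image_url, "video": video_url, "resolution": resolution}
    if prompt:
        payload["prompt"] = prompt  # optional guidance text
    if seed is not None:
        payload["seed"] = seed      # fixed seed for reproducible output
    return payload


# Fix the seed to reproduce a result, then vary it for an A/B comparison.
base = build_request("hero.png", "clip.mp4", resolution="720p",
                     prompt="preserve outfit; natural expressions", seed=42)
variant = {**base, "seed": 43}
```

Keeping everything except the seed identical between `base` and `variant` makes it easier to attribute differences in the output to the seed alone.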

🧩 Tips for Best Results

  • Match Pose & Composition: Keep your reference image’s camera angle, body orientation, and framing close to those of the source video.
  • Keep Aspect Ratios Consistent: Use the same aspect ratio between your input image and video.
  • Limit Video Length: For best stability, keep clips under 60 seconds — longer clips may show slight quality degradation.
  • Lighting Consistency: Match lighting direction and tone between image and video to minimize blending artifacts.
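
The aspect-ratio tip above is easy to check before uploading. The helper below is an illustrative sketch: `aspect_ratios_match` is a hypothetical name, and the 2% tolerance is an arbitrary assumption rather than a documented threshold. It takes pixel dimensions as `(width, height)` pairs, which you can read from your files with any image or video tool.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def aspect_ratios_match(image_size: Size, video_size: Size,
                        tolerance: float = 0.02) -> bool:
    """Return True if the image and video aspect ratios differ by at most
    `tolerance` (relative difference; the 2% default is an assumption)."""
    image_w, image_h = image_size
    video_w, video_h = video_size
    if min(image_w, image_h, video_w, video_h) <= 0:
        raise ValueError("dimensions must be positive")
    image_ratio = image_w / image_h
    video_ratio = video_w / video_h
    return abs(image_ratio - video_ratio) / video_ratio <= tolerance


# A 1280x720 image and a 1920x1080 video are both 16:9.
print(aspect_ratios_match((1280, 720), (1920, 1080)))
# A square image does not match a 16:9 video.
print(aspect_ratios_match((1024, 1024), (1920, 1080)))
```

If the check fails, cropping the reference image to the video's aspect ratio is usually simpler than re-encoding the video.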