
video-to-video

wavespeed-ai/wan-2.1/ditto

Wan2.1-DITTO is a unified model for video-to-video generation with holistic movement and expression replication.


Your request will cost $0.20 per run.

For $10 you can run this model approximately 50 times.


README

Wan2.1-DITTO

Wan2.1-DITTO is an optimized video-to-video generation model that transforms existing footage into new visual styles guided by text or style prompts. With unified diffusion tuning, it delivers cinematic motion, smooth temporal consistency, and vivid artistic expression across multiple resolutions.

Why it looks great

  • Unified Diffusion Core – Enhances motion smoothness and temporal consistency across frames.
  • Style-flexible generation – Switch seamlessly between realism, anime, sketch, or cinematic tones.
  • Precision color mapping – Retains natural tones and contrast even in stylized conversions.
  • Resolution scalability – Available in both 480p and 720p, optimized for balance between speed and clarity.
  • Consistent motion fidelity – Avoids flicker and deformation during high-action sequences.

Pricing

| Output Resolution | Price per 5 seconds | Max Length |
| --- | --- | --- |
| 480p (Standard) | $0.20 | 120 s |
| 720p (HD) | $0.40 | 120 s |
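The pricing above can be sketched as a quick cost estimate. Note that the 5-second billing increment comes from the table, but the round-up-to-the-next-increment behavior is an assumption, not a documented billing rule:

```python
import math

# Price per 5-second billing increment, taken from the pricing table.
PRICE_PER_5S = {"480p": 0.20, "720p": 0.40}
MAX_LENGTH_S = 120  # maximum clip length per run

def estimate_cost(duration_s: float, resolution: str = "480p") -> float:
    """Estimate the cost of one run.

    Assumes billing rounds up to the next 5-second increment
    (an assumption for illustration, not confirmed by this page).
    """
    if duration_s > MAX_LENGTH_S:
        raise ValueError(f"Clips over {MAX_LENGTH_S} s must be split first")
    increments = math.ceil(duration_s / 5)
    return round(increments * PRICE_PER_5S[resolution], 2)

print(estimate_cost(5, "480p"))    # one standard 5 s run
print(estimate_cost(120, "720p"))  # a full-length HD run
```

Under this assumption, a full 120 s clip costs $4.80 at 480p and $9.60 at 720p.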

How to Use

  1. Enter prompt — Describe or select the desired style for your video.
  2. Choose resolution — 480p or 720p.
  3. Run generation — Wait for AI rendering and preview results.
  4. Review & iterate — Fix seed for reproducibility, change seed for variation.

Pro tips for best quality

  • Keep your source video stable and clear for best transformation results.

  • Higher resolution (720p) is ideal for professional output, while 480p suits faster drafts.

Note

  • Actual render time varies with resolution and server load.

  • Videos longer than 120 s should be split into multiple segments and merged after processing.
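The split-and-merge note above can be sketched as a segment planner. The 120 s cap comes from this page; the helper itself is an illustrative sketch (you would cut the source video at these timestamps with a tool such as ffmpeg, run each chunk separately, then concatenate the outputs):

```python
MAX_SEGMENT_S = 120  # per-run length cap from the note above

def plan_segments(duration_s: float) -> list[tuple[float, float]]:
    """Return (start, end) time ranges that split a long video into
    chunks no longer than MAX_SEGMENT_S, one chunk per run."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + MAX_SEGMENT_S, duration_s)
        segments.append((start, end))
        start = end
    return segments

print(plan_segments(300))  # a 5-minute video needs three runs
```

Keeping cut points on scene boundaries where possible will make the merged result less noticeable at the seams.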