Wan 2.5 Models

Wan 2.5 on DashScope converts text or images into lip-synced video at 480p, 720p, or 1080p in a single step. It is faster and more budget-friendly than Veo 3, which makes it a good fit for quick content with embedded audio. Clip duration can be set anywhere from 3 to 10 seconds.
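
The limits above (three resolutions, 3 to 10 seconds) can be checked on the client before a job is submitted. The following is a minimal sketch, not the DashScope API: the `VideoRequest` class, the `validate` and `to_payload` helpers, and the payload field names are all hypothetical, and it assumes any duration within the stated range is accepted.

```python
from dataclasses import dataclass
from typing import Dict, List

# Limits taken from the page text: 480p/720p/1080p, 3 to 10 seconds.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")
MIN_DURATION_S = 3
MAX_DURATION_S = 10


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one generation request."""
    prompt: str
    resolution: str = "720p"
    duration_s: float = 5
    model: str = "alibaba/wan-2.5/text-to-video"


def validate(req: VideoRequest) -> List[str]:
    """Return a list of human-readable problems; empty means the request looks valid."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt must not be empty")
    if req.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution must be one of {ALLOWED_RESOLUTIONS}, got {req.resolution!r}"
        )
    if not MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds, "
            f"got {req.duration_s}"
        )
    return problems


def to_payload(req: VideoRequest) -> Dict[str, object]:
    """Build a request body; the field names here are assumptions, not a documented schema."""
    problems = validate(req)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "model": req.model,
        "prompt": req.prompt,
        "resolution": req.resolution,
        "duration": req.duration_s,
    }
```

Calling `to_payload(VideoRequest(prompt="Dynamic city night shot", resolution="1080p", duration_s=10))` returns a dictionary ready to be serialized, while an out-of-range duration or an unsupported resolution raises `ValueError` before any request is sent. Check the provider's API reference for the real field names and for any further constraints.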

Model Lineup

  1. alibaba/wan-2.5/text-to-video
  2. alibaba/wan-2.5/image-to-video
  3. alibaba/wan-2.5/text-to-image
  4. alibaba/wan-2.5/image-edit
  5. alibaba/wan-2.5/video-extend

Why Wan 2.5?

  1. More affordable: lower overall cost than Veo 3, and efficient for batch production.
  2. One-pass A/V sync: generates video with voiceover and lip-sync in a single run, with no separate voiceover recording or manual alignment.
  3. Multilingual that works: reliable A/V sync for Chinese and less widely spoken languages (Veo 3 often shows “unknown language”).
  4. Longer and more flexible: up to 10 seconds (vs. ~8 seconds on Veo 3), with three aspect ratios for different platforms.
  5. Audio-driven control: use voice, SFX, or BGM as references to guide generation (Veo 3 doesn’t support audio references).

See Wan 2.5 vs. Veo 3

[Comparison media: Veo 3 vs. Wan 2.5 output]

Great for

  1. Shorts: 3–10s hooks for TikTok/Reels, e.g., “Dynamic city night shot, upbeat VO summarizing three tips.”
  2. Ads & E-commerce: product hero shots plus a CTA, e.g., “Rotate sneaker, macro textures, VO: ‘Lightweight, all-day comfort.’”
  3. Explainers/Tutorials: step-by-step walkthroughs with on-beat VO, e.g., “3-step setup, captions auto-timed to narration.”