Wan 2.5 on DashScope converts text or images into lip-synced HD videos (480p, 720p, or 1080p) in one step. It is faster and more budget-friendly than Veo 3, which makes it a good fit for quick content with embedded audio. Video durations range from 3 to 10 seconds, with flexible options for each selection.
Model Lineup
- wan-2.5/text-to-video
- wan-2.5/image-to-video
- wan-2.5/text-to-video-fast
- wan-2.5/image-to-video-fast
- wan-2.5/text-to-image
- wan-2.5/image-edit
- wan-2.5/video-extend
- wan-2.5/video-extend-fast
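As a rough illustration of how the lineup and limits above fit together, the following Python sketch checks a request against the documented resolutions (480p/720p/1080p) and the 3 to 10 second duration range before anything is sent. The function name `build_video_request`, the payload field names (`model`, `prompt`, `resolution`, `duration`, `image_url`), and the rule that image-to-video models require an image URL are all assumptions made for illustration. They are not taken from the actual API, so check the official reference for the real request schema. Only the four text-to-video and image-to-video models are covered here.

```python
# Model identifiers copied from the lineup above (text/image-to-video only).
VIDEO_MODELS = {
    "wan-2.5/text-to-video",
    "wan-2.5/image-to-video",
    "wan-2.5/text-to-video-fast",
    "wan-2.5/image-to-video-fast",
}
# Limits stated in the overview.
RESOLUTIONS = {"480p", "720p", "1080p"}
MIN_DURATION_S, MAX_DURATION_S = 3, 10


def build_video_request(model, prompt, resolution="720p", duration=5, image_url=None):
    """Validate inputs and return a request dict.

    The dict keys are hypothetical; the real API may use different names.
    """
    if model not in VIDEO_MODELS:
        raise ValueError(f"unsupported video model: {model}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )

    # Assumption: image-to-video models need a source image.
    needs_image = "image-to-video" in model
    if needs_image and not image_url:
        raise ValueError("image-to-video models need an image_url")

    payload = {
        "model": model,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
    }
    if needs_image:
        payload["image_url"] = image_url
    return payload
```

Validating on the client side like this catches an out-of-range duration or a misspelled model name before a generation job is submitted, which matters most in batch production.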
Why Wan 2.5?
- More affordable — Lower overall cost than Veo 3; efficient for batch production.
- One-pass A/V sync — Generate video with voiceover + lip-sync in a single run—no separate VO or manual alignment.
- Multilingual support that works — Reliable A/V sync for Chinese and less widely spoken languages (Veo 3 often shows “unknown language”).
- Longer, more flexible — Up to 10 seconds (vs. ~8 seconds on Veo 3) and three aspect ratios for different platforms.
- Audio-driven control — Use voice/SFX/BGM as references to guide generation (Veo 3 doesn’t support audio references).
See Wan 2.5 vs. Veo 3
Veo 3 vs. Wan 2.5: output comparison
Great for
- Shorts — 3–10s hooks for TikTok/Reels. Example: “Dynamic city night shot, upbeat VO summarizing three tips.”
- Ads & E-commerce — Product hero shots + CTA. Example: “Rotate sneaker, macro textures, VO: ‘Lightweight, all-day comfort.’”
- Explainers/Tutorials — Step-by-step with on-beat VO. Example: “3-step setup, captions auto-timed to narration.”