Wan 2.5: Make longer, steadier AI videos for half the cost.

Wan 2.5 is a state-of-the-art AI image and video model offered by WaveSpeedAI.

Key Features

One prompt, audio and video match all the way through

With Wan 2.5, you no longer need to record separate voiceovers or manually align lips for silent AI videos. Just give a clear, well-structured prompt to generate a complete video with audio/voiceover and lip-sync all at once. The process becomes faster and simpler.

Prompt: A young man sits still on a subway train, surrounded by blurred figures moving rapidly. [Close-up] His eyes, barely blinking, intensify the sense of loneliness.

More affordable

Although Google recently announced price cuts, Veo 3 remains costly overall. Wan 2.5 is leaner and more budget-friendly, giving creators more options while significantly reducing production costs.

Model     Resolution   Duration   Price
Wan 2.5   1080p        10s        $1.5
Wan 2.5   720p         10s        $1
Wan 2.5   480p         10s        $1
Veo 3     1080p        8s         $3.2
Veo 3     720p         8s         $3.2
Veo 3     480p         n/a        n/a
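
Because the two models are priced for different clip lengths (10s versus 8s), the listed prices are easier to compare per second of video. The following is a minimal sketch in Python that uses only the figures listed above; the names PRICES, cost_per_second, and price_ratio are illustrative and not part of any Wan 2.5 or Veo 3 API.

```python
# Listed prices from the comparison above: (model, resolution) -> (seconds, USD).
PRICES = {
    ("Wan 2.5", "1080p"): (10, 1.5),
    ("Wan 2.5", "720p"): (10, 1.0),
    ("Wan 2.5", "480p"): (10, 1.0),
    ("Veo 3", "1080p"): (8, 3.2),
    ("Veo 3", "720p"): (8, 3.2),
}


def cost_per_second(model: str, resolution: str) -> float:
    """Return the listed price divided by the listed clip length."""
    seconds, usd = PRICES[(model, resolution)]
    return usd / seconds


def price_ratio(resolution: str, a: str = "Wan 2.5", b: str = "Veo 3") -> float:
    """Return model a's per-second cost as a fraction of model b's."""
    return cost_per_second(a, resolution) / cost_per_second(b, resolution)


if __name__ == "__main__":
    # Only resolutions listed for both models can be compared.
    for res in ("1080p", "720p"):
        wan = cost_per_second("Wan 2.5", res)
        veo = cost_per_second("Veo 3", res)
        print(f"{res}: Wan 2.5 ${wan:.2f}/s, Veo 3 ${veo:.2f}/s, "
              f"ratio {price_ratio(res):.1%}")
```

On these listed figures, a 1080p clip works out to $0.15 per second on Wan 2.5 and $0.40 per second on Veo 3, so Wan 2.5 costs less than half as much per second of output.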

Smooth & Stable Motion

With a wide dynamic range, Wan 2.5 makes big movements as smooth as small ones and keeps motion stable and realistic.

Comparison: Wan 2.5 vs. Veo 3

Prompt: A man is surfing.

Multilingual and accent friendly

When prompts are in Chinese or less widely spoken languages, Wan 2.5 reliably produces A/V-synchronized videos. Veo 3, by contrast, often displays “unknown language” when the prompt includes Chinese or other languages.

Comparison: Wan 2.5 vs. Veo 3

Prompt: A confident woman in her 40s stands on a stage with a microphone. The background shows a large LED screen with abstract visuals. She smiles and begins speaking to the audience in cockney: “Good evening everyone. Can I have a bottle of water” Her lip movements match her voice, and she uses expressive hand gestures while speaking.

Voice-driven reference and original-sound video

Veo 3 does not support audio reference, limiting creators to silent clips or system-generated sound. In contrast, Wan 2.5 allows direct input of voice, sound effects, and background music, driving the video generation with precise audio cues.

Use cases

3D Animation: Create a short 3D animated scene in a cheerful cartoon style. A cute creature, with fur like a snow leopard, large expressive eyes, and a round, friendly physique, frolics through a whimsical winter forest. The scene should feature rounded snow-covered trees, gently falling snowflakes, and warm sunlight filtering through the branches. The creature's lively movements and beaming smile should convey pure joy. Adopt a cheerful and heartwarming tone, with bright, playful colors and fun animation.

2D Animation: A cute magical girl with pink twin-tails is undergoing a brilliant transformation sequence. She is surrounded by shimmering starlight and floating ribbons as her clothes magically dissolve into a detailed battle dress. A close-up shot focuses on her determined, large blue eyes. The background is a fantastical starry sky. Japanese anime style, vibrant colors, magical particle effects, dynamic motion, a mix of Studio Ghibli and Makoto Shinkai art styles.

ASMR Videos: A keyboard whose keys are made of different types of candy. Typing makes sweet, crunchy sounds. Audio: Crunchy, sugary typing sounds, delighted giggles.

Movie Opening: A cinematic opening sequence of a sci-fi movie: a spaceship travels across the galaxy, and the movie title "ギャラクティック・オデッセイ" emerges in golden 3D letters, with flawless kerning and no distortion, floating stably in space as the camera rotates.

Sports shots: A man is surfing.

Speech: A confident woman in her 40s stands on a stage with a microphone. The background shows a large LED screen with abstract visuals. She smiles and begins speaking to the audience: “Good evening everyone. Tonight, I want to share three powerful lessons about leadership and innovation.” Her lip movements match her voice, and she uses expressive hand gestures while speaking.

Q & A

Can I animate an existing silent video?
Yes. Video-to-video maps lip-sync and expressions onto a silent clip while preserving identity and scene context.
What’s the maximum duration?
Up to 10 seconds per generation.
How well do you handle multiple languages and dialects?
Multiple languages and various dialects are supported and can be mixed within one clip. Heads-up: Rapid switching within the same clip can reduce alignment stability.
Do you support uploading audio?
Yes. Wan 2.5 supports uploading a voice track to drive lip-sync and pacing.