
Wan 2.5 - Cinematic AI Video Generation by Alibaba
Alibaba's cinematic video generation model — multi-shot coherence, native audio sync, and high-fidelity motion dynamics for image-to-video and text-to-video.
Cinematic Video from Any Input
Wan 2.5 generates high-fidelity video from images or text with natural motion dynamics, audio sync, and multi-shot coherence.
High-Fidelity Motion
Wan 2.5 produces smooth, natural motion with accurate physics simulation. From flowing fabric to complex camera movements, the output stays temporally consistent from frame to frame and keeps its visual quality.

Multi-Shot Coherence
Generate multi-shot video sequences that maintain visual continuity — consistent characters, environments, and lighting across scenes for professional storytelling.

Native Audio Sync
Built-in audio-video synchronization times sound effects and ambient audio to match the visual action.

Wan 2.5 vs. Traditional Video Generation
See why teams switch from self-hosted GPU clusters to WaveSpeed's managed platform.
Enterprise-Grade Performance by Default
WaveSpeed handles millions of AI video generations per day — for solo developers and professional content teams alike.
Examples

A drone shot slowly revealing a misty mountain valley at sunrise, golden light breaking through clouds.

Close-up of a woman turning to face the camera, wind blowing through her hair, warm afternoon light.

Timelapse of clouds rolling over a vast desert landscape, shifting shadows across sand dunes.

A surfer catching a massive wave, slow motion water spray, dramatic ocean backdrop.
Integrate in Minutes
Production-ready SDKs for Python and JavaScript. REST API with full OpenAPI spec. Webhook support for async jobs.
- Image-to-video and text-to-video endpoints
- Multiple resolution and duration options
- Python & JavaScript SDKs + REST API
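
As a rough illustration of the two endpoints listed above, the sketch below builds a request for either mode without sending it. The base URL, the endpoint paths, the field names (`prompt`, `resolution`, `image_url`, `webhook_url`), and the `VideoJob` and `build_request` helpers are all hypothetical and chosen only for this example; the official API reference has the real ones.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

# Hypothetical values for illustration only. Check the official API
# reference for the real base URL, endpoint paths, and parameter names.
BASE_URL = "https://api.example.com/v1"
ALLOWED_RESOLUTIONS = {"720p", "1080p"}  # the two resolutions named in the FAQ


@dataclass
class VideoJob:
    prompt: str
    resolution: str = "720p"
    image_url: Optional[str] = None    # set this for image-to-video
    webhook_url: Optional[str] = None  # set this to be notified when an async job ends


def build_request(job: VideoJob) -> Tuple[str, str]:
    """Return (url, json_body) for a text-to-video or image-to-video job."""
    if not job.prompt.strip():
        raise ValueError("prompt must not be empty")
    if job.resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {job.resolution}")
    # Pick the endpoint based on whether a reference image was supplied.
    mode = "image-to-video" if job.image_url else "text-to-video"
    # Drop unset optional fields so the body carries only what the caller set.
    body = {key: value for key, value in asdict(job).items() if value is not None}
    return f"{BASE_URL}/wan-2.5/{mode}", json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    url, body = build_request(
        VideoJob(
            prompt="A surfer catching a massive wave, slow motion water spray",
            image_url="https://example.com/surfer.png",
        )
    )
    print(url)
    print(body)
```

Validating the prompt and resolution before sending means obvious mistakes fail locally instead of costing a generation. The resulting URL and body can be posted with any HTTP client or adapted to the SDK calls.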
Get Any Tool You Want
1000+ models across image, video, audio, and 3D — all through one API.
FAQ
What is Wan 2.5?
Wan 2.5 is Alibaba's cinematic video generation model supporting both image-to-video and text-to-video workflows. On WaveSpeed, it delivers high-fidelity motion with multi-shot coherence.
What resolutions does Wan 2.5 support?
Wan 2.5 supports multiple resolutions, including 720p and 1080p, with various aspect ratios suitable for social media, presentations, and professional content.
How long can generated videos be?
Video duration depends on the endpoint and settings. Standard generation produces clips of several seconds, suitable for social content and professional edits.
Can Wan 2.5 generate video from an existing image?
Yes. The image-to-video endpoint takes a reference image and animates it according to your text prompt, maintaining the visual style and subject of the input.
How is Wan 2.5 priced?
Wan 2.5 uses WaveSpeed's pay-per-generation pricing. Visit the pricing page for current rates and volume tiers.

