Wan 2.1 VACE | All-In-One Image-To-Video And Video Editing

Wan2.1 VACE: Three Core Capabilities Analysis

Multi-modal Information Input - Making Video Generation More Controllable

In traditional video generation workflows, details such as character poses, actions, and scene transitions are difficult to adjust once a video has been generated. Wan2.1 VACE adds fine-grained control: generation can be conditioned on human pose, motion flow, structure preservation, spatial movement, and camera angle, as well as on subject and background references.

The core technology behind this is Wan VACE's multi-modal input mechanism. Unlike traditional models that rely solely on text prompts, Wan VACE (Wan2.1 VACE) uses a unified input system that integrates text, images, videos, masks, and control signals.

For image input, Wan VACE (Wan2.1 VACE) accepts object reference images or video frames. For video input, users can regenerate content through operations such as erasing and local expansion. For local regions, users specify the area to edit with a binary (0/1) mask. For control signals, Wan VACE supports depth maps, optical flow, layouts, grayscale, line drawings, and pose estimation.
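To make the binary 0/1 region signal concrete, here is a minimal sketch of a per-pixel mask for one frame. It is illustrative only and is not the Wan VACE API: the function name `make_region_mask`, the `(top, left, bottom, right)` box layout, and the convention that 1 marks pixels to regenerate and 0 marks pixels to keep are all assumptions made for this example.

```python
from typing import List, Tuple

Mask = List[List[int]]


def make_region_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Build a height x width binary mask for a single frame.

    box is (top, left, bottom, right) with bottom and right exclusive.
    Assumed convention for this sketch: 1 = region to edit, 0 = region to keep.
    """
    top, left, bottom, right = box
    if not (0 <= top <= bottom <= height and 0 <= left <= right <= width):
        raise ValueError("box must lie inside the frame")
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


# Mark a 2 x 3 pixel rectangle inside a 4 x 6 frame as editable.
mask = make_region_mask(4, 6, (1, 2, 3, 5))
for row in mask:
    print("".join(str(v) for v in row))
```

In practice such a mask would be produced at the video's resolution, one per frame, and is often stored as a black-and-white image or video rather than a nested list.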

Unified Single Model - One-Stop Solution for Multiple Tasks

Wan VACE (Wan2.1 VACE) can replace, add, or delete content in specified areas of a video. Along the time dimension, it can extend a video at its beginning or end. Along the spatial dimension, it supports progressive generation of backgrounds or specific regions. Background replacement is one example: the main subject is preserved while the background environment changes according to the prompt.
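Extension along the time dimension can be pictured as the same 0/1 idea applied per frame instead of per pixel. The sketch below is an assumption-laden illustration, not the model's actual interface: `extension_frame_mask` is a hypothetical helper, and the convention that 0 means "keep this source frame" and 1 means "generate this frame" is chosen only for the example.

```python
from typing import List


def extension_frame_mask(num_source_frames: int, prepend: int = 0, append: int = 0) -> List[int]:
    """Per-frame mask for extending a clip at its beginning and/or end.

    Assumed convention for this sketch: 0 = keep the source frame, 1 = generate a new frame.
    """
    if num_source_frames < 1:
        raise ValueError("num_source_frames must be at least 1")
    if prepend < 0 or append < 0:
        raise ValueError("prepend and append must be non-negative")
    return [1] * prepend + [0] * num_source_frames + [1] * append


# Extend a 4-frame clip by 2 frames at the start and 3 frames at the end.
frame_mask = extension_frame_mask(num_source_frames=4, prepend=2, append=3)
print(frame_mask)
```

Combining a per-frame mask like this with a per-pixel mask gives one way to describe both "when" and "where" new content should be generated.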

Free Combination of Multiple Tasks - Unleashing AI Creative Boundaries

Wan VACE (Wan2.1 VACE) also supports free combination of its single-task capabilities, breaking through the limitations of traditional expert models that work in isolation. As a unified model, it naturally integrates capabilities such as video generation, pose control, background replacement, and local region editing, so there is no need to train a separate model for each single-function task.

Wan 2.1 VACE is an all-in-one video model supporting Reference-to-Video (Image-to-Video), video-to-video (V2V), masked V2V, and Move/Swap/Animate capabilities. It is available through a ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.
