Wan 2.1 14B VACE

Playground

Try it on WavespeedAI!

WAN 2.1 VACE is an all-in-one video model supporting Reference-to-Video (image-to-video), video-to-video (V2V), masked V2V, and move/swap/animate capabilities. It is available through a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.

Features

Wan 2.1 14B VACE — wavespeed-ai/wan-2.1-14b-vace

Wan 2.1 14B VACE is a versatile, production-oriented video generation and editing model that supports multi-input workflows. You can provide a text prompt plus up to 5 reference images, and optionally add a source video, masks, or start/end frames to guide motion, structure, and edits. It also includes multiple task modes (e.g., depth) for more controlled video understanding and generation.

Key capabilities

  • Prompt-driven video generation with multi-modal controls
  • Up to 5 reference images to guide identity, style, wardrobe, or scene details
  • Optional video input for video-to-video transformation workflows
  • Mask support (mask_video / mask_image) for region-based edits
  • First/last frame guidance (first_image / last_image) for better continuity
  • Task modes (e.g., depth) for structured control and more predictable results

Use cases

  • Reference-guided video generation (character/style consistency across shots)
  • Video editing with masks (replace background, remove objects, localized changes)
  • Start-to-end guided storytelling using first_image + last_image
  • Video-to-video restyling (apply a new look while keeping motion)
  • Controlled motion and composition using task settings (e.g., depth)

Pricing

| Mode | Size | Price per 5s video |
| --- | --- | --- |
| Standard | 832×480 | $0.30 |
| Fast Mode | 832×480 | $0.15 |
| Standard | 1280×720 / 720×1280 | $0.40 |
| Fast Mode | 1280×720 / 720×1280 | $0.25 |

Longer videos are billed in steps based on total duration.
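The step billing above can be sketched as a small cost estimator. This is an illustration only: the exact rounding rule for partial steps and the rates for the 480×832 portrait size are assumptions, so treat your invoice as authoritative.

```python
import math

# Rates from the pricing table (USD per 5-second step).
# The 480*832 entries are an ASSUMPTION (billed like 832*480);
# 720*1280 is documented at the same rate as 1280*720.
PRICE_PER_5S = {
    ("standard", "832*480"): 0.30,
    ("fast", "832*480"): 0.15,
    ("standard", "480*832"): 0.30,   # assumption
    ("fast", "480*832"): 0.15,       # assumption
    ("standard", "1280*720"): 0.40,
    ("fast", "1280*720"): 0.25,
    ("standard", "720*1280"): 0.40,
    ("fast", "720*1280"): 0.25,
}

def estimate_cost(mode: str, size: str, duration_s: int) -> float:
    """Estimate cost assuming each started 5-second step is billed in full."""
    steps = math.ceil(duration_s / 5)
    return round(steps * PRICE_PER_5S[(mode, size)], 2)
```

For example, a 10-second fast-mode 720p clip would come to two steps at $0.25 each under these assumptions.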

Inputs

  • prompt (required): what should happen in the video
  • images (optional): up to 5 reference images
  • video (optional): source video for video-to-video workflows
  • mask_video (optional): video mask for localized video edits
  • mask_image (optional): image mask for localized edits
  • first_image (optional): starting frame guidance
  • last_image (optional): ending frame guidance
  • negative_prompt (optional): what to avoid
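Putting the inputs above together, a masked video-to-video request body might look like the following sketch. All URLs are placeholders, and the mask convention (which region is editable) is an assumption; only `prompt` is required.

```python
import json

# Placeholder asset URLs; substitute your own hosted files.
payload = {
    "prompt": "Replace only the masked background with a modern boutique "
              "interior, keep the subject unchanged, match lighting and shadows.",
    "images": ["https://example.com/ref-identity.png"],  # up to 5 reference images
    "video": "https://example.com/source.mp4",           # source for V2V
    "mask_video": "https://example.com/mask.mp4",        # region mask (convention assumed)
    "negative_prompt": "blurry, distorted, flickering",
}

body = json.dumps(payload)  # JSON string to POST to the endpoint
```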

Parameters

  • task: control mode selector (e.g., depth)
  • duration: video length (e.g., 5s)
  • size: output resolution (e.g., 832×480, 1280×720)
  • num_inference_steps: sampling steps
  • guidance_scale: prompt adherence strength
  • flow_shift: motion/flow behavior tuning
  • context_scale: reference/context strength tuning
  • seed: random seed (-1 for random; fixed for reproducibility)
  • enable_fast_mode: speed-optimized mode (if available in your UI)

Prompting guide (multi-reference + optional masks)

A reliable structure:

  1. Define the main subject and action
  2. Specify environment and camera beats
  3. Assign roles to references (identity/style/outfit/background)
  4. If using masks, clearly state what changes inside vs. outside the mask
  5. If using first/last frames, describe how the motion should transition between them

Template: Use image 1 for identity, image 2 for outfit, image 3 for style. Generate a 5-second clip where [action]. Keep identity consistent. If mask is provided, change only the masked region to [edit], keep everything else unchanged.
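The template above can be filled programmatically. This helper is purely illustrative (the role names and wording are not an API contract); it just assembles the prompt string in the recommended structure.

```python
def build_prompt(action: str, roles: list, masked_edit: str = "") -> str:
    """Fill the multi-reference template: one role per reference image,
    then the action, then an optional masked-edit instruction."""
    parts = [f"Use image {i} for {role}." for i, role in enumerate(roles, 1)]
    parts.append(f"Generate a 5-second clip where {action}. "
                 "Keep identity consistent.")
    if masked_edit:
        parts.append(f"If mask is provided, change only the masked region to "
                     f"{masked_edit}, keep everything else unchanged.")
    return " ".join(parts)
```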

Example prompts

  • An elegant lady carefully selects bags in a boutique. Soft natural lighting, shallow depth of field, subtle camera push-in, gentle hand movements, realistic fabric and leather textures.
  • Use the reference images for the same character and outfit. Walk through a luxury store aisle, turn to examine a handbag, warm highlights on leather, calm cinematic pacing.
  • If mask is provided: Replace only the masked background with a modern boutique interior, keep the subject unchanged, match lighting and shadows.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.1-14b-vace" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "An elegant lady carefully selects bags in a boutique, soft natural lighting, subtle camera push-in",
    "task": "depth",
    "duration": 5,
    "size": "832*480",
    "num_inference_steps": 30,
    "guidance_scale": 5,
    "flow_shift": 16,
    "context_scale": 1,
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
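The submit-then-poll flow shown by the two curl calls can also be sketched in Python using only the standard library. The endpoints and response shape come from this page; the polling interval and timeout are arbitrary choices.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL = "wavespeed-ai/wan-2.1-14b-vace"

def result_url(request_id: str) -> str:
    """Build the result endpoint for a given task ID."""
    return f"{API_BASE}/predictions/{request_id}/result"

def submit_and_wait(api_key: str, payload: dict,
                    interval: float = 3.0, timeout: float = 600.0) -> list:
    """Submit a generation task, then poll until it completes or fails."""
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    req = urllib.request.Request(f"{API_BASE}/{MODEL}",
                                 data=json.dumps(payload).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        request_id = json.load(resp)["data"]["id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        poll = urllib.request.Request(
            result_url(request_id),
            headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(poll) as resp:
            data = json.load(resp)["data"]
        if data["status"] == "completed":
            return data["outputs"]      # list of output URLs
        if data["status"] == "failed":
            raise RuntimeError(data["error"])
        time.sleep(interval)            # still created/processing
    raise TimeoutError("prediction did not finish in time")
```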

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | Yes | - | - | The text prompt for the generation. |
| images | array | No | [] | - | URLs of reference images to use while generating the video (up to 5). |
| video | string | No | - | - | URL of the source video for generating the output. |
| task | string | No | depth | depth, pose, face, inpainting, none | Extract control information from the provided video to guide video generation. |
| negative_prompt | string | No | - | - | The negative prompt for the generation. |
| mask_video | string | No | - | - | URL of the mask video. |
| mask_image | string | No | - | - | URL of the mask image. |
| first_image | string | No | - | - | URL of the first image. |
| last_image | string | No | - | - | URL of the last image. |
| duration | integer | No | 5 | 5 ~ 10 | The duration of the generated media in seconds. |
| size | string | No | 832*480 | 832*480, 480*832, 1280*720, 720*1280 | The size of the generated media in pixels (width*height). |
| num_inference_steps | integer | No | 30 | 1 ~ 40 | The number of inference steps to perform. |
| guidance_scale | number | No | 5 | 0.0 ~ 20.0 | The guidance scale to use for the generation. |
| flow_shift | number | No | 16 | 0.0 ~ 30.0 | The shift value for the timestep schedule for flow matching. |
| context_scale | number | No | 1 | 0.0 ~ 2.0 | Controls how closely the model sticks to the reference context. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation; -1 means a random seed will be used. |
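The ranges in the table lend themselves to client-side validation before submitting, which gives faster feedback than a server-side rejection. This is a sketch mirroring the documented constraints; the server remains authoritative.

```python
# Ranges and enums copied from the request-parameter table.
RANGES = {
    "duration": (5, 10),
    "num_inference_steps": (1, 40),
    "guidance_scale": (0.0, 20.0),
    "flow_shift": (0.0, 30.0),
    "context_scale": (0.0, 2.0),
    "seed": (-1, 2147483647),
}
SIZES = {"832*480", "480*832", "1280*720", "720*1280"}
TASKS = {"depth", "pose", "face", "inpainting", "none"}

def validate(payload: dict) -> list:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if not payload.get("prompt"):
        errors.append("prompt is required")
    if "size" in payload and payload["size"] not in SIZES:
        errors.append(f"invalid size: {payload['size']}")
    if "task" in payload and payload["task"] not in TASKS:
        errors.append(f"invalid task: {payload['task']}")
    for key, (lo, hi) in RANGES.items():
        if key in payload and not lo <= payload[key] <= hi:
            errors.append(f"{key} out of range [{lo}, {hi}]")
    return errors
```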

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
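A client typically branches on `data.status` as described above. This small helper sketches that envelope handling (field names taken from the table; error-handling policy is a design choice, not part of the API).

```python
def extract_outputs(resp: dict):
    """Return output URLs when completed, None while pending; raise on failure."""
    data = resp["data"]
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if status in ("created", "processing"):
        return None  # caller should poll again
    return data["outputs"]
```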

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction being retrieved |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
© 2025 WaveSpeedAI. All rights reserved.