Wan 2.1 14B VACE

Playground

Try it on WavespeedAI!

WAN 2.1 VACE is an all-in-one video model supporting Reference-to-Video (image-to-video), video-to-video (V2V), masked V2V, and move/swap/animate capabilities. It is available through a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.

Features

Wan 2.1 14B VACE — wavespeed-ai/wan-2.1-14b-vace

Wan 2.1 14B VACE is a versatile, production-oriented video generation and editing model that supports multi-input workflows. You can provide a text prompt plus up to 5 reference images, and optionally add a source video, masks, or start/end frames to guide motion, structure, and edits. It also includes multiple task modes (e.g., depth) for more controlled video understanding and generation.

Key capabilities

  • Prompt-driven video generation with multi-modal controls
  • Up to 5 reference images to guide identity, style, wardrobe, or scene details
  • Optional video input for video-to-video transformation workflows
  • Mask support (mask_video / mask_image) for region-based edits
  • First/last frame guidance (first_image / last_image) for better continuity
  • Task modes (e.g., depth) for structured control and more predictable results

Use cases

  • Reference-guided video generation (character/style consistency across shots)
  • Video editing with masks (replace background, remove objects, localized changes)
  • Start-to-end guided storytelling using first_image + last_image
  • Video-to-video restyling (apply a new look while keeping motion)
  • Controlled motion and composition using task settings (e.g., depth)

Pricing

| Mode | Size | Price per 5s video |
| --- | --- | --- |
| Standard | 832×480 | $0.30 |
| Fast Mode | 832×480 | $0.15 |
| Standard | 1280×720 / 720×1280 | $0.40 |
| Fast Mode | 1280×720 / 720×1280 | $0.25 |

Longer videos are billed in steps based on total duration.
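The step billing above can be sketched as a small cost estimator. This is an illustration only: the exact rounding rule for partial steps and the rates for the 480×832 portrait size are assumptions, so treat your invoice as authoritative.

```python
import math

# Rates from the pricing table (USD per 5-second step).
# The 480*832 entries are an ASSUMPTION (billed like 832*480);
# 720*1280 is documented at the same rate as 1280*720.
PRICE_PER_5S = {
    ("standard", "832*480"): 0.30,
    ("fast", "832*480"): 0.15,
    ("standard", "480*832"): 0.30,   # assumption
    ("fast", "480*832"): 0.15,       # assumption
    ("standard", "1280*720"): 0.40,
    ("fast", "1280*720"): 0.25,
    ("standard", "720*1280"): 0.40,
    ("fast", "720*1280"): 0.25,
}

def estimate_cost(mode: str, size: str, duration_s: int) -> float:
    """Estimate cost assuming each started 5-second step is billed in full."""
    steps = math.ceil(duration_s / 5)
    return round(steps * PRICE_PER_5S[(mode, size)], 2)
```

For example, a 10-second fast-mode 720p clip would come to two steps at $0.25 each under these assumptions.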

Inputs

  • prompt (required): what should happen in the video
  • images (optional): up to 5 reference images
  • video (optional): source video for video-to-video workflows
  • mask_video (optional): video mask for localized video edits
  • mask_image (optional): image mask for localized edits
  • first_image (optional): starting frame guidance
  • last_image (optional): ending frame guidance
  • negative_prompt (optional): what to avoid
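Putting the inputs above together, a masked video-to-video request body might look like the following sketch. All URLs are placeholders, and the mask convention (which region is editable) is an assumption; only `prompt` is required.

```python
import json

# Placeholder asset URLs; substitute your own hosted files.
payload = {
    "prompt": "Replace only the masked background with a modern boutique "
              "interior, keep the subject unchanged, match lighting and shadows.",
    "images": ["https://example.com/ref-identity.png"],  # up to 5 reference images
    "video": "https://example.com/source.mp4",           # source for V2V
    "mask_video": "https://example.com/mask.mp4",        # region mask (convention assumed)
    "negative_prompt": "blurry, distorted, flickering",
}

body = json.dumps(payload)  # JSON string to POST to the endpoint
```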

Parameters

  • task: control mode selector (e.g., depth)
  • duration: video length (e.g., 5s)
  • size: output resolution (e.g., 832×480, 1280×720)
  • num_inference_steps: sampling steps
  • guidance_scale: prompt adherence strength
  • flow_shift: motion/flow behavior tuning
  • context_scale: reference/context strength tuning
  • seed: random seed (-1 for random; fixed for reproducibility)
  • enable_fast_mode: speed-optimized mode (if available in your UI)

Prompting guide (multi-reference + optional masks)

A reliable structure:

  1. Define the main subject and action
  2. Specify environment and camera beats
  3. Assign roles to references (identity/style/outfit/background)
  4. If using masks, clearly state what changes inside vs. outside the mask
  5. If using first/last frames, describe how the motion should transition between them

Template: Use image 1 for identity, image 2 for outfit, image 3 for style. Generate a 5-second clip where [action]. Keep identity consistent. If mask is provided, change only the masked region to [edit], keep everything else unchanged.
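The template above can be filled programmatically. This helper is purely illustrative (the role names and wording are not an API contract); it just assembles the prompt string in the recommended structure.

```python
def build_prompt(action: str, roles: list, masked_edit: str = "") -> str:
    """Fill the multi-reference template: one role per reference image,
    then the action, then an optional masked-edit instruction."""
    parts = [f"Use image {i} for {role}." for i, role in enumerate(roles, 1)]
    parts.append(f"Generate a 5-second clip where {action}. "
                 "Keep identity consistent.")
    if masked_edit:
        parts.append(f"If mask is provided, change only the masked region to "
                     f"{masked_edit}, keep everything else unchanged.")
    return " ".join(parts)
```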

Example prompts

  • An elegant lady carefully selects bags in a boutique. Soft natural lighting, shallow depth of field, subtle camera push-in, gentle hand movements, realistic fabric and leather textures.
  • Use the reference images for the same character and outfit. Walk through a luxury store aisle, turn to examine a handbag, warm highlights on leather, calm cinematic pacing.
  • If mask is provided: Replace only the masked background with a modern boutique interior, keep the subject unchanged, match lighting and shadows.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.1-14b-vace" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "An elegant lady carefully selects bags in a boutique, soft natural lighting, subtle camera push-in",
    "task": "depth",
    "duration": 5,
    "size": "832*480",
    "num_inference_steps": 30,
    "guidance_scale": 5,
    "flow_shift": 16,
    "context_scale": 1,
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
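The submit-then-poll flow shown by the two curl calls can also be sketched in Python using only the standard library. The endpoints and response shape come from this page; the polling interval and timeout are arbitrary choices.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL = "wavespeed-ai/wan-2.1-14b-vace"

def result_url(request_id: str) -> str:
    """Build the result endpoint for a given task ID."""
    return f"{API_BASE}/predictions/{request_id}/result"

def submit_and_wait(api_key: str, payload: dict,
                    interval: float = 3.0, timeout: float = 600.0) -> list:
    """Submit a generation task, then poll until it completes or fails."""
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    req = urllib.request.Request(f"{API_BASE}/{MODEL}",
                                 data=json.dumps(payload).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        request_id = json.load(resp)["data"]["id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        poll = urllib.request.Request(
            result_url(request_id),
            headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(poll) as resp:
            data = json.load(resp)["data"]
        if data["status"] == "completed":
            return data["outputs"]      # list of output URLs
        if data["status"] == "failed":
            raise RuntimeError(data["error"])
        time.sleep(interval)            # still created/processing
    raise TimeoutError("prediction did not finish in time")
```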

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | Yes | - | - | The text prompt for the generation. |
| images | array | No | [] | - | URLs of reference images to use while generating the video (up to 5). |
| video | string | No | - | - | URL of the source video for generating the output. |
| task | string | No | depth | depth, pose, face, inpainting, none | Extract control information from the provided video to guide video generation. |
| negative_prompt | string | No | - | - | The negative prompt for the generation. |
| mask_video | string | No | - | - | URL of the mask video. |
| mask_image | string | No | - | - | URL of the mask image. |
| first_image | string | No | - | - | URL of the first image. |
| last_image | string | No | - | - | URL of the last image. |
| duration | integer | No | 5 | 5 ~ 10 | The duration of the generated media in seconds. |
| size | string | No | 832*480 | 832*480, 480*832, 1280*720, 720*1280 | The size of the generated media in pixels (width*height). |
| num_inference_steps | integer | No | 30 | 1 ~ 40 | The number of inference steps to perform. |
| guidance_scale | number | No | 5 | 0.0 ~ 20.0 | The guidance scale to use for the generation. |
| flow_shift | number | No | 16 | 0.0 ~ 30.0 | The shift value for the timestep schedule for flow matching. |
| context_scale | number | No | 1 | 0.0 ~ 2.0 | Controls how closely the model sticks to the reference context. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation; -1 means a random seed will be used. |
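The ranges in the table lend themselves to client-side validation before submitting, which gives faster feedback than a server-side rejection. This is a sketch mirroring the documented constraints; the server remains authoritative.

```python
# Ranges and enums copied from the request-parameter table.
RANGES = {
    "duration": (5, 10),
    "num_inference_steps": (1, 40),
    "guidance_scale": (0.0, 20.0),
    "flow_shift": (0.0, 30.0),
    "context_scale": (0.0, 2.0),
    "seed": (-1, 2147483647),
}
SIZES = {"832*480", "480*832", "1280*720", "720*1280"}
TASKS = {"depth", "pose", "face", "inpainting", "none"}

def validate(payload: dict) -> list:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if not payload.get("prompt"):
        errors.append("prompt is required")
    if "size" in payload and payload["size"] not in SIZES:
        errors.append(f"invalid size: {payload['size']}")
    if "task" in payload and payload["task"] not in TASKS:
        errors.append(f"invalid task: {payload['task']}")
    for key, (lo, hi) in RANGES.items():
        if key in payload and not lo <= payload[key] <= hi:
            errors.append(f"{key} out of range [{lo}, {hi}]")
    return errors
```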

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
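A client typically branches on `data.status` as described above. This small helper sketches that envelope handling (field names taken from the table; error-handling policy is a design choice, not part of the API).

```python
def extract_outputs(resp: dict):
    """Return output URLs when completed, None while pending; raise on failure."""
    data = resp["data"]
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if status in ("created", "processing"):
        return None  # caller should poll again
    return data["outputs"]
```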

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction being retrieved |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
© 2025 WaveSpeedAI. All rights reserved.