WaveSpeed.ai

Wan 2.1 VACE

wavespeed-ai/wan-2.1-14b-vace

Wan 2.1 VACE is an all-in-one video model supporting reference-to-video (image-to-video), video-to-video (V2V), masked V2V, and move/swap/animate capabilities. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Your request will cost $0.30 per run.

For $10 you can run this model approximately 33 times.

README

Wan 2.1 14B VACE — wavespeed-ai/wan-2.1-14b-vace

Wan 2.1 14B VACE is a versatile, production-oriented video generation and editing model that supports multi-input workflows. You can provide a text prompt plus up to 5 reference images, and optionally add a source video, masks, or start/end frames to guide motion, structure, and edits. It also includes multiple task modes (e.g., depth) for more controlled video understanding and generation.

Key capabilities

  • Prompt-driven video generation with multi-modal controls
  • Up to 5 reference images to guide identity, style, wardrobe, or scene details
  • Optional video input for video-to-video transformation workflows
  • Mask support (mask_video / mask_image) for region-based edits
  • First/last frame guidance (first_image / last_image) for better continuity
  • Task modes (e.g., depth) for structured control and more predictable results

Use cases

  • Reference-guided video generation (character/style consistency across shots)
  • Video editing with masks (replace background, remove objects, localized changes)
  • Start-to-end guided storytelling using first_image + last_image
  • Video-to-video restyling (apply a new look while keeping motion)
  • Controlled motion and composition using task settings (e.g., depth)

Pricing

Mode        Size                   Price per 5s video
Standard    832×480                $0.30
Fast Mode   832×480                $0.15
Standard    1280×720 / 720×1280    $0.40
Fast Mode   1280×720 / 720×1280    $0.25

Videos longer than 5s cost more; the price increases in steps with duration.
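
For budgeting, the listed prices can be turned into a small lookup. This is a minimal sketch, not an official calculator: `PRICE_CENTS_PER_5S`, `price_cents`, and `runs_for_budget` are hypothetical names, the values are copied from the pricing table, and only 5-second clips are covered because the exact billing steps for longer durations are not stated here.

```python
# Per-5s prices in cents, copied from the pricing table.
# Keys are ((width, height), fast_mode). Integer cents avoid float rounding.
PRICE_CENTS_PER_5S = {
    ((832, 480), False): 30,
    ((832, 480), True): 15,
    ((1280, 720), False): 40,
    ((1280, 720), True): 25,
    ((720, 1280), False): 40,
    ((720, 1280), True): 25,
}


def price_cents(size, fast_mode=False):
    """Return the listed price in cents for one 5-second video."""
    key = (tuple(size), bool(fast_mode))
    if key not in PRICE_CENTS_PER_5S:
        raise ValueError(f"no listed price for size={size}, fast_mode={fast_mode}")
    return PRICE_CENTS_PER_5S[key]


def runs_for_budget(budget_dollars, size, fast_mode=False):
    """Return how many whole 5-second runs fit in the given budget."""
    budget_cents = round(budget_dollars * 100)
    return budget_cents // price_cents(size, fast_mode)


print(runs_for_budget(10, (832, 480)))                   # 33
print(runs_for_budget(10, (1280, 720), fast_mode=True))  # 40
```

The first call matches the "approximately 33 times for $10" figure for a standard 832×480 run.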

Inputs

  • prompt (required): what should happen in the video
  • images (optional): up to 5 reference images
  • video (optional): source video for video-to-video workflows
  • mask_video (optional): video mask for localized video edits
  • mask_image (optional): image mask for localized edits
  • first_image (optional): starting frame guidance
  • last_image (optional): ending frame guidance
  • negative_prompt (optional): what to avoid
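
The constraints in this list (a required prompt, at most 5 reference images, everything else optional) can be checked before a request is submitted. The sketch below is illustrative only: `build_inputs` is a hypothetical helper, and it assumes that media inputs are passed as URL strings, which this page does not state.

```python
MAX_REFERENCE_IMAGES = 5  # limit stated in the inputs list


def build_inputs(prompt, images=None, video=None, mask_video=None,
                 mask_image=None, first_image=None, last_image=None,
                 negative_prompt=None):
    """Return a dict of input fields, omitting optional ones left unset.

    ASSUMPTION: media inputs are URL strings; the real API may differ.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    images = list(images or [])
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"at most {MAX_REFERENCE_IMAGES} reference images, got {len(images)}"
        )
    candidates = {
        "prompt": prompt,
        "images": images,
        "video": video,
        "mask_video": mask_video,
        "mask_image": mask_image,
        "first_image": first_image,
        "last_image": last_image,
        "negative_prompt": negative_prompt,
    }
    # Drop unset optional fields (None or an empty list).
    return {name: value for name, value in candidates.items() if value}


inputs = build_inputs(
    "Walk through a luxury store aisle, turn to examine a handbag.",
    images=["https://example.com/character.png", "https://example.com/outfit.png"],
    negative_prompt="blurry, distorted hands",
)
print(sorted(inputs))  # ['images', 'negative_prompt', 'prompt']
```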

Parameters

  • task: control mode selector (e.g., depth)
  • duration: video length (e.g., 5s)
  • size: output resolution (e.g., 832×480, 1280×720)
  • num_inference_steps: sampling steps
  • guidance_scale: prompt adherence strength
  • flow_shift: motion/flow behavior tuning
  • context_scale: reference/context strength tuning
  • seed: random seed (-1 for random; fixed for reproducibility)
  • enable_fast_mode: speed-optimized mode (if available in your UI)
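
As a rough illustration of how these parameters might travel in a REST call, the sketch below builds a JSON POST request without sending it. Several things here are assumptions rather than documented behavior: `API_URL` is a placeholder (the real endpoint is not given on this page), bearer-token authentication is a guess, `build_request` is a hypothetical helper, and the sample values are illustrative, since valid ranges and defaults are not listed above.

```python
import json
import urllib.request

# ASSUMPTION: placeholder URL. The real endpoint is not given in this document.
API_URL = "https://api.example.com/wavespeed-ai/wan-2.1-14b-vace"

# Parameter names taken from the parameter list.
KNOWN_PARAMETERS = {
    "task", "duration", "size", "num_inference_steps", "guidance_scale",
    "flow_shift", "context_scale", "seed", "enable_fast_mode",
}


def build_request(api_key, prompt, **parameters):
    """Build (but do not send) a POST request with the prompt and parameters."""
    unknown = set(parameters) - KNOWN_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {"prompt": prompt, **parameters}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # ASSUMPTION: bearer-token auth; check the API documentation.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Illustrative values only.
request = build_request(
    "YOUR_API_KEY",
    "An elegant lady carefully selects bags in a boutique.",
    task="depth",
    duration=5,
    seed=42,  # fixed seed for reproducibility; -1 would ask for a random one
    enable_fast_mode=False,
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
# Sending it would be: urllib.request.urlopen(request)
```

Rejecting unknown parameter names locally catches typos such as `steps` instead of `num_inference_steps` before any request is made.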

Prompting guide (multi-reference + optional masks)

A reliable structure:

  1. Define the main subject and action
  2. Specify environment and camera beats
  3. Assign roles to references (identity/style/outfit/background)
  4. If using masks, clearly state what changes inside vs. outside the mask
  5. If using first/last frames, describe how the motion should transition between them

Template: Use image 1 for identity, image 2 for outfit, image 3 for style. Generate a 5-second clip where [action]. Keep identity consistent. If a mask is provided, change only the masked region to [edit], keep everything else unchanged.
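
The template can also be filled in programmatically when prompts are generated in bulk. This is a minimal sketch; `build_prompt` is a hypothetical helper, and the wording it emits simply mirrors the template rather than any phrasing the model is known to require.

```python
def build_prompt(action, roles, duration_s=5, mask_edit=None):
    """Fill the prompt template.

    roles: role names in reference-image order, e.g. ["identity", "outfit"].
    mask_edit: description of the masked-region change, or None if no mask.
    """
    roles = list(roles)
    if len(roles) > 5:
        raise ValueError("at most 5 reference images are supported")
    parts = []
    if roles:
        assignments = ", ".join(
            f"image {index} for {role}" for index, role in enumerate(roles, start=1)
        )
        parts.append(f"Use {assignments}.")
    parts.append(f"Generate a {duration_s}-second clip where {action}.")
    if "identity" in roles:
        parts.append("Keep identity consistent.")
    if mask_edit:
        parts.append(
            f"Change only the masked region to {mask_edit}, "
            "keep everything else unchanged."
        )
    return " ".join(parts)


print(build_prompt("she examines a handbag", ["identity", "outfit", "style"]))
# Use image 1 for identity, image 2 for outfit, image 3 for style.
# Generate a 5-second clip where she examines a handbag. Keep identity consistent.
# (printed as a single line)
```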

Example prompts

  • An elegant lady carefully selects bags in a boutique. Soft natural lighting, shallow depth of field, subtle camera push-in, gentle hand movements, realistic fabric and leather textures.
  • Use the reference images for the same character and outfit. Walk through a luxury store aisle, turn to examine a handbag, warm highlights on leather, calm cinematic pacing.
  • If a mask is provided: Replace only the masked background with a modern boutique interior, keep the subject unchanged, match lighting and shadows.