alibaba/wan-2.6/reference-to-video

Alibaba WAN 2.6 Reference-to-Video turns character, prop, or scene references (single or multi-view) into new video shots that preserve identity, style, and layout with smooth, coherent motion. Ready-to-use REST inference API with consistent performance, no cold starts, and affordable pricing.


README

Alibaba / WAN 2.6 — Reference-to-Video (wan2.6-ref2v)

WAN 2.6 Reference-to-Video is Alibaba’s WanXiang 2.6 model for turning reference videos and a text prompt into new shots. Provide up to two reference clips; the model learns their style, motion, and framing, then generates a new 5–10 s video at up to 1080p.

🚀 Highlights

  • Reference-driven motion & style – Mimic camera moves, pacing, and composition from your reference videos while following your prompt.
  • Up to two reference videos – Blend style from one clip and motion from another, or use different angles of the same scene.
  • Cinematic resolutions – Choose 720p or 1080p, in portrait or landscape.
  • Story-aware generation – Works with prompt expansion and multi-shot mode (shot_type: multi) to build richer, multi-shot sequences.
  • Audio-ready pipeline – Optional audio field for workflows that need motion aligned to external sound.

Output format: MP4 video at the selected size and duration.

🧩 Parameters

  • prompt* Text description of the new scene: characters, actions, environment, camera motion, mood, style, etc.

  • videos* 1–2 reference clips (URLs or uploads). These guide style, camera work, pacing, and motion structure.

  • negative_prompt Things to avoid, e.g. watermark, text, distortion, extra limbs.

  • audio (optional) External audio track for advanced pipelines where timing should loosely follow a given soundtrack. For most use cases you can leave this empty.

  • size One of the following resolution presets:

    • 1280×720 or 720×1280 → 720p
    • 1920×1080 or 1080×1920 → 1080p
  • duration Video length: 5 s or 10 s.

  • shot_type

    • single – Single-shot clip.
    • multi – When combined with enable_prompt_expansion, WAN 2.6 can break your idea into multiple shots of the same scene.
  • enable_prompt_expansion If enabled, Alibaba’s prompt optimizer expands short prompts into a richer internal script before generation.

  • seed Random seed. Set -1 for a new random result each time, or fix to a specific integer for reproducible layout and motion.
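
Putting the parameters together, a request body might look like the sketch below. This is a minimal illustration only: the endpoint URL, auth header, and the exact size string format are assumptions, so check the WaveSpeedAI API documentation for the real values.

```python
import requests

# Placeholder endpoint and key -- substitute the real values from the
# WaveSpeedAI API documentation / dashboard.
API_URL = "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/reference-to-video"
API_KEY = "YOUR_API_KEY"

payload = {
    # Required: describe the new scene, not just the references.
    "prompt": "Hero walking toward camera in a rainy neon alley, slow dolly-in",
    # Required: 1-2 reference clips that guide style, camera work, and pacing.
    "videos": ["https://example.com/reference-clip.mp4"],
    "negative_prompt": "watermark, text, distortion, extra limbs",
    "size": "1280*720",       # preset string; exact format is an assumption
    "duration": 5,            # 5 or 10 seconds
    "shot_type": "single",    # "single" or "multi"
    "enable_prompt_expansion": True,
    "seed": -1,               # -1 = new random result each run
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # typically returns a job/prediction id to poll
```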

💰 Pricing

| Resolution | Sizes (W×H) | 5 s | 10 s |
| --- | --- | --- | --- |
| 720p | 1280×720 / 720×1280 | $1.00 | $1.50 |
| 1080p | 1920×1080 / 1080×1920 | $1.50 | $2.25 |
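
If you need to budget runs programmatically, a simple lookup that mirrors the table above is enough (prices hard-coded from the table; update them if pricing changes):

```python
# Per-run prices in USD, copied from the pricing table above.
PRICES = {
    ("720p", 5): 1.00,
    ("720p", 10): 1.50,
    ("1080p", 5): 1.50,
    ("1080p", 10): 2.25,
}

def estimate_cost(resolution: str, duration_s: int, runs: int = 1) -> float:
    """Estimated total cost in USD for a number of runs."""
    return PRICES[(resolution, duration_s)] * runs

print(estimate_cost("1080p", 10, runs=4))  # -> 9.0
```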

✅ How to Use

  1. Prepare 1–2 reference videos

    • Clean motion, stable framing, and clear style work best.
    • You can use two angles of the same scene, or two stylistically similar clips.
  2. Write your prompt

    • Describe what should happen in the new video, not just what’s in the references.
    • Example: “Cyberpunk alley at night, hero walking toward camera, slow dolly-in, neon reflections on wet ground, cinematic color grading.”
  3. (Optional) Add a negative_prompt

    • Keep it short and focused: watermark, text, logo, extra limbs, low resolution.
  4. Choose size and duration

    • 720p/1080p according to your platform (Reels, TikTok, YouTube, etc.).
    • 5 s for quick shots, 10 s for more complex actions.
  5. Configure shot type & prompt expansion

    • Turn on enable_prompt_expansion for shorter prompts.
    • Set shot_type to multi if you want WAN 2.6 to create a multi-shot sequence.
  6. Set seed (optional)

    • Use a fixed seed to iterate while keeping composition similar.
  7. Run the model and download the generated clip.
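
End to end, the workflow above maps onto a submit-and-poll loop. The sketch below assumes an asynchronous API where you submit a job and poll for a result URL; the result endpoint path and response field names (`data`, `id`, `status`, `outputs`) are assumptions, not the documented contract.

```python
import time
import requests

API_BASE = "https://api.wavespeed.ai/api/v3"   # placeholder base URL
MODEL = "alibaba/wan-2.6/reference-to-video"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Steps 1-4: submit references, prompt, size, and duration.
job = requests.post(
    f"{API_BASE}/{MODEL}",
    headers=HEADERS,
    json={
        "prompt": "Cyberpunk alley at night, hero walking toward camera, "
                  "slow dolly-in, neon reflections on wet ground",
        "videos": ["https://example.com/ref-a.mp4"],
        "size": "1920*1080",
        "duration": 10,
        "enable_prompt_expansion": True,
        "seed": 42,  # fixed seed keeps composition similar while iterating
    },
    timeout=60,
).json()
job_id = job["data"]["id"]   # field names are assumptions

# Steps 5-7: poll until the job finishes, then download the MP4.
while True:
    result = requests.get(
        f"{API_BASE}/predictions/{job_id}/result", headers=HEADERS, timeout=60
    ).json()
    status = result["data"]["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)

if status == "completed":
    video_url = result["data"]["outputs"][0]
    with open("wan26_ref2v.mp4", "wb") as f:
        f.write(requests.get(video_url, timeout=120).content)
```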

💡 Prompt & Reference Tips

  • Keep reference content and prompt aligned – if references show a city night scene, avoid asking for a sunny beach.

  • Use two references when you want to mix:

    • video A’s camera & motion + video B’s lighting/style.
  • Mention where you want the model to follow reference closely, e.g.: “Follow reference camera speed and angles, but change character outfit to futuristic armor.”

  • For portrait/vertical social content, select 720×1280 or 1080×1920; for YouTube-style landscape, use the corresponding wide resolutions (1280×720 or 1920×1080).
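
For the two-reference mixing described above, pass both clips and spell out in the prompt which aspects come from which reference. A hedged payload sketch (URLs are illustrative; how strongly each clip influences the result depends on the model):

```python
payload = {
    "prompt": (
        "Follow the camera speed and angles of the first reference, "
        "borrow the neon lighting and color grade of the second, "
        "and change the character outfit to futuristic armor."
    ),
    # Clip A contributes camera & motion; clip B contributes lighting/style.
    "videos": [
        "https://example.com/clip-a-camera-motion.mp4",
        "https://example.com/clip-b-lighting-style.mp4",
    ],
    "size": "1080*1920",  # vertical preset for Reels/TikTok-style output
    "duration": 5,
}
```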

More Models to Try

  • vidu/reference-to-video-q2 Vidu’s Q2 reference-to-video model for turning style and motion from example clips into new shots, ideal for anime-style edits, trailers, and storyboards.

  • google/veo3.1/reference-to-video Google Veo 3.1 reference-conditioned video generator, designed for high-fidelity cinematic motion that closely follows your reference footage.

  • kwaivgi/kling-video-o1/reference-to-video Kwaivgi’s Kling Video O1 reference-to-video model, great for copying camera language and pacing from a sample clip while changing characters or scenes.

  • bytedance/seedance-v1-lite/reference-to-video ByteDance SeeDance v1 Lite, a lightweight reference-to-video model for fast, style-consistent generations based on short example videos.