
Introducing Alibaba Wan 2.7 Reference-to-Video on WaveSpeedAI


Wan 2.7 Reference-to-Video: Create Character-Consistent AI Video from Multiple References

Maintaining character identity across AI-generated video clips has been one of the hardest problems in generative video — until now. Wan 2.7 Reference-to-Video from Alibaba’s Tongyi Lab solves this by letting you feed in multiple reference videos and images, then generating new scenes where characters, props, and visual styles stay perfectly consistent. Available now on WaveSpeedAI with no cold starts and affordable pay-per-use pricing, this model unlocks production-quality multi-character video generation through a simple REST API.

Whether you’re a filmmaker pre-visualizing complex scenes, a brand creating spokesperson campaigns, or a content creator building multi-shot narratives, Wan 2.7 Reference-to-Video eliminates the inconsistency problem that has plagued AI video workflows.

How Wan 2.7 Reference-to-Video Works

Wan 2.7 Reference-to-Video is built on Alibaba’s Diffusion Transformer (DiT) architecture with a Full Attention mechanism that processes spatial and temporal relationships across the entire video sequence simultaneously. This is why character identity stays stable across the full clip duration — the model doesn’t just generate frame by frame, it understands the entire sequence at once.

The workflow is straightforward:

  1. Upload reference videos — provide one or more source videos containing the characters or visual elements you want to preserve.
  2. Add an optional reference image — supplement with a still image for additional visual guidance.
  3. Write your prompt — describe the new scene using natural language, referencing characters by position (e.g., “The character in Video 1 walks through a garden while Video 2 watches from a bench”).
  4. Generate — the model produces a new video that places your referenced characters into the described scene with preserved identity, style, and coherent motion.

The model supports up to 5 combined reference inputs (videos and images together), output at 720p or 1080p resolution, aspect ratios including 16:9, and clip durations of 5, 10, or 15 seconds. A unique prompt indexing system lets you precisely control which reference appears where — videos are numbered first (Video 1, Video 2), then images continue the sequence (Image 3, Image 4).
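The indexing rule above can be sketched in a few lines. This is a minimal illustration, not part of the official SDK: references are labeled in upload order, videos first, with images continuing the same sequence.

```python
# Sketch of the prompt-indexing rule: videos are numbered first,
# then images continue the sequence (Video 1, Video 2, Image 3, ...).
def reference_labels(num_videos: int, num_images: int) -> list[str]:
    if num_videos + num_images > 5:
        raise ValueError("at most 5 combined reference inputs are supported")
    labels = [f"Video {i}" for i in range(1, num_videos + 1)]
    labels += [
        f"Image {i}"
        for i in range(num_videos + 1, num_videos + num_images + 1)
    ]
    return labels

# Two reference videos and two reference images:
print(reference_labels(2, 2))  # ['Video 1', 'Video 2', 'Image 3', 'Image 4']
```

These labels are what you use in the prompt itself, e.g. "The character in Video 1 hands the prop from Image 3 to the character in Video 2."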

Key Features of Wan 2.7 Reference-to-Video

  • Multi-video reference support — Combine characters, objects, or visual elements from multiple source videos into a single coherent scene. No other model in this class handles multi-source video references this cleanly.

  • Identity-locked character consistency — The Full Attention architecture preserves facial features, clothing, body proportions, and stylistic details across the generated clip without the identity drift common in older diffusion-based video models.

  • Prompt indexing for precise control — Reference specific characters using “Video 1,” “Video 2,” “Image 3” syntax in your prompt. This gives you director-level control over who does what in the generated scene.

  • Negative prompt support — Specify elements to exclude from the output, preventing unintended visual blending between reference sources.

  • Automatic prompt expansion — Enable prompt expansion to let the model enrich shorter prompts with additional detail, producing richer output without manual prompt engineering.

  • 1080p output — Generate at full HD resolution for production-ready results, or use 720p for faster iteration during the creative process.

  • Up to 15 seconds per clip — Generate longer scenes that give characters time to move, interact, and emote — enough for social media shorts and commercial cuts.

Best Use Cases for Wan 2.7 Reference-to-Video

Multi-Character Storytelling and Short Films

Place characters from separate reference videos into shared scenes they never actually filmed together. A filmmaker can shoot actors individually, then use Wan 2.7 R2V to generate interaction scenes — characters sitting together, walking side by side, or having a conversation in a new environment. This dramatically reduces production costs for indie projects and pre-visualization.

Brand Spokesperson Video Campaigns

Marketing teams can generate dozens of on-brand video variations featuring a consistent brand spokesperson or mascot. Upload a reference video of your brand character once, then generate them in different settings — in a kitchen, at an office, outdoors — while maintaining perfect visual identity throughout the campaign. No reshoots required.

Social Media Content at Scale

Content creators can produce character-consistent short-form video at volume. Take a reference video of a recurring character or persona, describe new scenarios, and generate fresh content daily. The identity preservation ensures your audience recognizes the character across every post, building brand consistency without the production overhead.

Product Demos and Explainer Videos

Combine a reference video of a presenter with product imagery to generate polished demo videos. The presenter maintains their appearance and style while interacting with products in new contexts — perfect for e-commerce listings, product launches, and tutorial content.

Creative Concepting and Storyboarding

Directors and creative teams can rapidly prototype multi-character scenes before committing to full production. Generate 10 variations of a scene with different staging, lighting, or character interactions in minutes. Use 720p for fast iteration, then render the winning concept at 1080p.

Fan Content and Character Crossovers

Combine visual elements from different sources into a single coherent scene. Characters from different reference videos can interact naturally, opening up creative possibilities for fan art, mashups, and experimental visual storytelling.

Training and Educational Content

Generate consistent instructor-led video content across multiple lessons. Upload a reference of the instructor once, then produce them in different educational settings — at a whiteboard, in a lab, in the field — maintaining visual continuity across an entire course series.

Wan 2.7 Reference-to-Video Pricing and API Access

WaveSpeedAI offers Wan 2.7 Reference-to-Video with straightforward per-generation pricing:

Duration      720p     1080p
5 seconds     $1.00    $1.60
10 seconds    $1.50    $2.40
15 seconds    $2.00    $3.20

1080p renders cost 1.6× the 720p rate. Pricing includes a fixed overhead for reference video processing.
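The rate card works out to a flat $0.50 overhead plus $0.10 per second at 720p, with 1080p at 1.6× that total. A quick sketch for estimating cost before you render (the formula is inferred from the published table, not an official pricing API):

```python
def clip_price(duration_s: int, resolution: str = "720p") -> float:
    """Estimate the per-clip price from the published rate card."""
    if duration_s not in (5, 10, 15):
        raise ValueError("supported durations are 5, 10, or 15 seconds")
    base = 0.50 + 0.10 * duration_s       # 720p: fixed overhead + per-second rate
    multiplier = 1.6 if resolution == "1080p" else 1.0
    return round(base * multiplier, 2)

print(clip_price(5))             # 1.0
print(clip_price(15, "1080p"))   # 3.2
```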

Getting started takes minutes. Install the WaveSpeed SDK and make your first API call:

import wavespeed

# Submit a generation request and wait for the result.
# "videos" takes one or more reference video URLs; the prompt refers
# to them by index ("Video 1", "Video 2", ...).
output = wavespeed.run(
    "alibaba/wan-2.7/reference-to-video",
    {
        "prompt": "The character in Video 1 walks through a sunlit garden, smiling and looking at the flowers",
        "videos": ["https://example.com/reference-video.mp4"],
        "resolution": "720p",
        "duration": 5,
    },
)

# URL of the generated video
print(output["outputs"][0])

WaveSpeedAI runs Wan 2.7 Reference-to-Video with no cold starts — your first request is as fast as your hundredth. No GPU provisioning delays, no idle compute charges. You pay only for what you generate.

Try Wan 2.7 Reference-to-Video now →

Tips for Best Results with Wan 2.7 Reference-to-Video

  • Use clear, distinct reference videos. The more visually distinct each reference video is, the better the model preserves each character’s identity in the output. Avoid references with similar-looking subjects.

  • Reference characters by index in your prompt. Always use “Video 1,” “Video 2,” etc. to specify which character does what. The numbering follows upload order for videos, then continues for reference images.

  • Start with 720p for iteration. Test your scene composition, prompt phrasing, and character positioning at 720p before committing to a 1080p final render. This saves both time and cost.

  • Use negative prompts to prevent blending. If you notice visual styles bleeding between reference sources, add a negative prompt to exclude specific unwanted elements.

  • Enable prompt expansion for short prompts. If your prompt is brief or lacks scene detail, turning on prompt expansion lets the model fill in cinematic details automatically.

  • Keep reference videos short and focused. Reference clips that clearly feature the subject you want to preserve will produce better identity consistency than long, varied footage.
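Two of these tips — negative prompts and prompt expansion — map to request parameters. The sketch below shows one way to apply them automatically; the parameter names `negative_prompt` and `enable_prompt_expansion` are assumptions for illustration, so check the model page for the exact request schema.

```python
def tuned_request(prompt: str, videos: list[str]) -> dict:
    """Build a request payload that applies two of the tips above.

    Parameter names are hypothetical; verify against the model's API docs.
    """
    req = {
        "prompt": prompt,
        "videos": videos,
        "resolution": "720p",  # iterate cheaply, re-render winners at 1080p
        "duration": 5,
    }
    # Tip: enable prompt expansion when the prompt is brief.
    if len(prompt.split()) < 12:
        req["enable_prompt_expansion"] = True
    # Tip: use a negative prompt to prevent blending between references.
    req["negative_prompt"] = "blended faces, merged characters, style bleeding"
    return req

req = tuned_request("The character in Video 1 waves", ["https://example.com/a.mp4"])
```

A short prompt like the one above gets expansion enabled automatically, while a fully written-out scene description would be passed through untouched.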

Frequently Asked Questions About Wan 2.7 Reference-to-Video

What is Wan 2.7 Reference-to-Video?

Wan 2.7 Reference-to-Video is an AI video generation model from Alibaba that creates new video scenes while preserving the identity, appearance, and style of characters from your reference videos and images.

How much does Wan 2.7 Reference-to-Video cost?

Pricing starts at $1.00 per 5-second clip at 720p, scaling up to $3.20 for a 15-second 1080p video. There are no subscription fees — you pay per generation on WaveSpeedAI.

Can I use Wan 2.7 Reference-to-Video via API?

Yes. Wan 2.7 Reference-to-Video is available as a REST API on WaveSpeedAI with no cold starts, pay-per-use pricing, and the WaveSpeed Python SDK for easy integration.

How many reference videos can I use at once?

You can provide up to 5 combined reference inputs (videos and images together). Each reference is numbered sequentially in your prompt for precise control over which character appears where.

How is Wan 2.7 Reference-to-Video different from Wan 2.7 Image-to-Video?

Wan 2.7 Image-to-Video animates a single reference image into video. Reference-to-Video accepts multiple video references, preserving identity across sources and enabling multi-character scenes with consistent identity — a fundamentally different capability for production workflows.

Start Creating Character-Consistent Video with Wan 2.7

Wan 2.7 Reference-to-Video brings a capability that was previously impossible in AI video generation: reliable multi-character identity preservation from video references. Combined with WaveSpeedAI’s instant inference and simple API, it’s ready for production workflows today.

Explore the full Wan 2.7 suite on WaveSpeedAI — including Text-to-Video, Image-to-Video, Video Edit, and Video Extend.

Try Wan 2.7 Reference-to-Video on WaveSpeedAI →