Alibaba WAN 2.6 Reference-to-Video is Now Available on WaveSpeedAI
The AI video generation landscape just reached a new milestone. WaveSpeedAI is thrilled to announce the availability of Alibaba WAN 2.6 Reference-to-Video, a groundbreaking model that transforms how creators work with character identity, style consistency, and cinematic storytelling. Unveiled by Alibaba on December 16, 2025, this model represents a significant leap forward in reference-driven video generation.
What is WAN 2.6 Reference-to-Video?
WAN 2.6 Reference-to-Video (R2V) is the variant of Alibaba’s Wanxiang 2.6 model built specifically for turning example videos and text prompts into new, professionally crafted video shots. You provide up to two reference clips, from which the model learns style, motion patterns, camera work, and framing—then it generates entirely new 5-10 second videos at resolutions up to 1080p.
What makes this model truly revolutionary is its ability to preserve identity across generations. Whether you’re working with characters, props, or entire scenes, WAN 2.6 R2V maintains visual consistency while enabling creative transformation. This is China’s first reference-to-video generation model with multimodal reference capabilities, making it possible to insert subjects into AI-generated scenes with consistent visuals and audio.
Key Features
- Reference-Driven Generation: Upload 1-2 reference clips and the model captures their essence—camera movements, pacing, composition, and visual style—while following your creative direction through text prompts
- Identity Preservation: Maintain consistent character appearance, voice characteristics, and visual identity across generated shots, solving one of AI video’s most persistent challenges
- Cinematic Resolutions: Generate content at 720p (1280×720 or 720×1280) or 1080p (1920×1080 or 1080×1920), suitable for YouTube, TikTok, Instagram Reels, and professional productions
- Multi-Shot Storytelling: Enable intelligent storyboarding with the multi-shot mode, allowing the model to break your prompt into multiple coherent shots with smooth transitions
- Audio-Ready Pipeline: Optional audio field supports workflows where motion should align with external soundtracks, enabling synchronized audio-visual experiences
- Prompt Expansion: Alibaba’s built-in prompt optimizer transforms brief descriptions into rich internal scripts, enhancing generation quality without requiring expert-level prompt engineering
- Flexible Duration Control: Choose between 5-second quick shots or 10-second extended sequences for more complex actions and narratives (a request sketch follows this list)
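In practice, these options map to a small set of request parameters. The sketch below shows what a reference-to-video request payload might look like; the field names (prompt, reference_videos, resolution, duration, enable_multi_shot, enable_prompt_expansion, seed) are illustrative assumptions rather than the confirmed schema, so check the model page on WaveSpeedAI for the exact parameter names.

```python
# Hypothetical request payload for WAN 2.6 Reference-to-Video.
# Field names are illustrative assumptions -- consult the model page at
# https://wavespeed.ai/models/alibaba/wan-2.6/reference-to-video for the schema.
payload = {
    "prompt": (
        "A chef plates a dessert in a sunlit kitchen, slow dolly-in, "
        "warm cinematic lighting"
    ),
    # Up to two reference clips: e.g. one for camera work and motion,
    # one for lighting and visual style.
    "reference_videos": [
        "https://example.com/refs/camera_motion.mp4",
        "https://example.com/refs/visual_style.mp4",
    ],
    "resolution": "1080p",            # "720p" or "1080p"
    "duration": 5,                    # 5 or 10 seconds
    "enable_multi_shot": False,       # intelligent storyboarding on/off
    "enable_prompt_expansion": True,  # let the built-in optimizer enrich the prompt
    "seed": 42,                       # fix the seed to iterate on one composition
}
```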
Real-World Use Cases
Film and Video Production
Rapidly generate storyboards, pre-visualization sequences, or production-quality VFX shots. Transfer the camera language and pacing from your reference footage while introducing new characters or transforming scenes entirely.
Content Creation and Social Media
Create narrative videos with speaking characters, drastically reducing shooting costs. Generate product videos, unboxing sequences, and brand commercials that would be impossible or expensive to shoot traditionally.
Marketing and Advertising
Produce photorealistic product demos and creative prototypes. Maintain brand consistency across multiple generated assets while exploring creative variations.
Education and Training
Generate virtual instructors and interactive learning content with consistent character presence, enabling engaging educational materials at scale.
Style Transfer and Creative Exploration
Use one reference for camera work and motion, another for lighting and visual style. Experiment with mixing stylistic elements across different source materials to create unique visual signatures.
How WAN 2.6 Compares
In recent industry comparisons, WAN 2.6 has demonstrated particular strength in character consistency and lip synchronization—keeping identity stable across frames while matching mouth movements precisely to speech. While competitors like Sora 2 excel in environmental consistency and physics modeling, WAN 2.6 prioritizes the actors and their performance, making it an intuitive creative partner for character-focused content.
The model supports both English and Chinese prompts with strong language understanding, accurately parsing complex scripts to render detail-rich scenes and performances. Its native multi-modal architecture understands storyboard instructions at a deep level, enabling “AI Director” capabilities that put professional-grade production within reach.
Getting Started on WaveSpeedAI
Using WAN 2.6 Reference-to-Video on WaveSpeedAI is straightforward:
1. Prepare Your References: Upload 1-2 reference videos with clean motion, stable framing, and clear visual style. Multiple angles of the same scene or stylistically similar clips work best.
2. Craft Your Prompt: Describe what should happen in the new video—characters, actions, environment, camera motion, mood, and style. Focus on the new scene, not just what’s in your references.
3. Configure Settings: Select your resolution (720p or 1080p), duration (5s or 10s), and enable multi-shot mode or prompt expansion as needed.
4. Generate: Submit your request and receive your video. Use fixed seeds to iterate on composition while maintaining consistent results. A minimal submission sketch follows these steps.
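For developers, the same workflow runs over the REST API. Below is a minimal submission sketch in Python using the requests library; the endpoint path, authentication header, and response shape are assumptions based on common submit-then-poll inference APIs, so verify them against the WaveSpeedAI API documentation before use.

```python
import os

import requests

API_KEY = os.environ["WAVESPEED_API_KEY"]  # your WaveSpeedAI API key

# Assumed endpoint path -- verify against the WaveSpeedAI API docs.
ENDPOINT = "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/reference-to-video"

payload = {
    "prompt": "A violinist performs on a rooftop at dusk, slow orbital shot",
    "reference_videos": ["https://example.com/refs/performance.mp4"],
    "resolution": "720p",
    "duration": 5,
    "seed": 7,  # fixed seed so re-runs keep the same composition
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
request_id = resp.json()["data"]["id"]  # assumed response shape
print(f"Submitted generation request: {request_id}")
```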
Pricing
| Resolution | 5 seconds | 10 seconds |
|---|---|---|
| 720p | $1.00 | $1.50 |
| 1080p | $1.50 | $2.25 |
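As a worked example, a six-shot storyboard rendered as 10-second 1080p clips would cost 6 × $2.25 = $13.50, while drafting the same shots at 720p would cost 6 × $1.50 = $9.00.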
Access the model directly at: https://wavespeed.ai/models/alibaba/wan-2.6/reference-to-video
Why WaveSpeedAI?
WaveSpeedAI provides the infrastructure to run WAN 2.6 Reference-to-Video with optimal performance:
- No Cold Starts: Your requests begin processing immediately without waiting for model initialization
- Fast Inference: Optimized infrastructure delivers results quickly, enabling rapid iteration on creative projects
- Affordable Pricing: Access cutting-edge AI video generation at competitive rates, making professional-quality content accessible to creators of all sizes
- Simple REST API: Integrate reference-to-video generation directly into your workflows and applications (a minimal polling sketch follows this list)
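Because generation runs asynchronously, a typical integration submits a request and then polls until the finished video is ready. The helper below assumes a results endpoint keyed by request id; the URL path, status values, and response fields are illustrative assumptions, so check the WaveSpeedAI documentation for the actual contract.

```python
import time

import requests

def wait_for_video(request_id: str, api_key: str, poll_seconds: float = 2.0) -> str:
    """Poll until generation finishes and return the output video URL.

    The results path, status values, and response fields below are
    assumptions modeled on common submit-then-poll APIs -- verify them
    against the WaveSpeedAI documentation.
    """
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        data = resp.json()["data"]
        if data["status"] == "completed":
            return data["outputs"][0]  # URL of the generated video
        if data["status"] == "failed":
            raise RuntimeError(data.get("error", "generation failed"))
        time.sleep(poll_seconds)  # still queued or processing
```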
Start Creating Today
Alibaba WAN 2.6 Reference-to-Video represents a fundamental shift in AI video generation—from isolated frame creation to coherent, identity-preserving storytelling. Whether you’re a filmmaker pre-visualizing scenes, a content creator building your personal brand, or a marketing team producing campaign assets, this model provides the creative control and consistency that professional work demands.
The future of video creation is here. Visit WaveSpeedAI to start generating reference-driven videos with preserved identity, style, and cinematic quality.