Introducing Alibaba WAN 2.6 Reference-to-Video Flash on WaveSpeedAI
Alibaba WAN 2.6 Reference-to-Video Flash is Now Available on WaveSpeedAI
Speed meets consistency. WaveSpeedAI is excited to announce the launch of Alibaba WAN 2.6 Reference-to-Video Flash, the fast, distilled variant of Alibaba’s identity-preserving video generation model. If you’ve been working with reference-to-video workflows and wished the results came back faster, this model is built for you — delivering the same character consistency and multi-shot storytelling in a fraction of the generation time.
What is WAN 2.6 Reference-to-Video Flash?
WAN 2.6 Reference-to-Video Flash is the speed-optimized counterpart to the standard WAN 2.6 Reference-to-Video model. Distilled from the full-size model, it retains the core capability that makes the WAN 2.6 R2V family unique: you upload reference images of characters, props, or scenes, write a text prompt describing the video you want, and the model generates new video shots that faithfully preserve the identity and appearance of your reference subjects.
The Flash version achieves significantly faster inference — generating videos in seconds rather than minutes — while maintaining the visual quality, motion coherence, and identity preservation that define the WAN 2.6 series. It supports up to 5 reference images, 720p and 1080p output, durations of 5 or 10 seconds, and optional synchronized audio generation.
Key Features
- Multi-Reference Input: Upload up to 5 reference images to guide the generation. Multiple angles and viewpoints of the same subject yield better identity preservation — a substantial upgrade over typical single-reference workflows
- Identity Preservation at Speed: The Flash model maintains facial features, clothing, body proportions, and distinctive characteristics of your reference subjects across every generated frame, now with dramatically reduced wait times
- Multi-Shot Composition: Choose between a single continuous shot or an automatic multi-shot composition that breaks your prompt into multiple coherent shots with smooth transitions — cinematic storytelling from a single API call
- Built-In Audio Generation: Enable optional synchronized audio, including background music, ambient sounds, and Foley effects, matched to the generated video content. No post-production dubbing required
- Resolution Flexibility: Generate in 720p (1280×720 or 720×1280) or 1080p (1920×1080 or 1080×1920) to match your output requirements — landscape or portrait
- Prompt Expansion: A built-in prompt enhancer can automatically refine your descriptions into richer, more detailed prompts, improving generation quality without requiring expert prompt engineering. Each of these features maps directly to a request parameter, as sketched below
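To make that mapping concrete, here is a minimal sketch of a single request that exercises all of these features, using the same `wavespeed.run` call and parameter names shown in the Getting Started and Configuration Options sections below. The image URLs and prompt are placeholders, and the 1080p size string is an assumption that follows the same width*height format as the 720p example later in this post.

```python
import wavespeed

# Illustrative sketch: each Key Feature above corresponds to one request parameter.
output = wavespeed.run(
    "alibaba/wan-2.6/reference-to-video-flash",
    {
        # Multi-Reference Input: up to 5 images of the same subject from different angles
        "reference_urls": [
            "https://example.com/mascot-front.jpg",
            "https://example.com/mascot-side.jpg",
            "https://example.com/mascot-back.jpg",
        ],
        "prompt": "The mascot waves at the camera, then strolls across a sunlit plaza",
        "size": "1920*1080",              # Resolution Flexibility (assumed width*height format)
        "duration": 10,                   # 5 or 10 seconds
        "shot_type": "multi",             # Multi-Shot Composition
        "enable_audio": True,             # Built-In Audio Generation
        "enable_prompt_expansion": True,  # Prompt Expansion
    },
)
print(output["outputs"][0])  # first output, as in the Getting Started example below
```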
Real-World Use Cases
Character-Driven Social Media Content
Create TikToks, Reels, and YouTube Shorts featuring consistent characters across multiple videos. Upload a few photos of your character or brand mascot, describe the scene, and generate on-brand content at scale. The Flash speed makes rapid iteration practical — test dozens of variations in the time the standard model produces a handful.
Marketing and Advertising Prototyping
Generate product demos, brand commercials, and campaign concepts featuring specific people or characters with consistent identity across all shots. Use the multi-shot mode to produce structured ad sequences complete with synchronized audio, cutting days of pre-production down to minutes.
Narrative Storytelling and Animation
Build short narrative sequences where characters maintain their appearance across scene changes. The multi-reference capability lets you establish multiple characters in a single generation, while multi-shot mode handles transitions and pacing automatically. Writers and storyboard artists can visualize scenes almost as fast as they can describe them.
Rapid Pre-Visualization for Film
Directors and cinematographers can pre-visualize shots and sequences using reference photos of actors and locations. The Flash model’s speed enables a live creative feedback loop — adjust the prompt, regenerate, and see the result in seconds rather than waiting through lengthy render queues.
E-Commerce and Product Videos
Transform static product photos into dynamic product videos with consistent branding. Upload product images as references, describe the desired motion and environment, and generate polished video content ready for listings and ads.
Getting Started on WaveSpeedAI
Using WAN 2.6 Reference-to-Video Flash through the WaveSpeedAI API is straightforward:
```python
import wavespeed

output = wavespeed.run(
    "alibaba/wan-2.6/reference-to-video-flash",
    {
        "reference_urls": [
            "https://example.com/character-front.jpg",
            "https://example.com/character-side.jpg"
        ],
        "prompt": "A woman walks through a sunlit garden, turning to smile at the camera",
        "size": "1280*720",
        "duration": 5,
        "shot_type": "multi"
    },
)
print(output["outputs"][0])
```
Configuration Options
| Parameter | Description |
|---|---|
| reference_urls | 1-5 reference images for character and scene guidance |
| prompt | Text description of the video scene and motion |
| size | Output resolution: 720p or 1080p, landscape or portrait |
| duration | Video length: 5 or 10 seconds |
| shot_type | single for one continuous shot, multi for varied compositions |
| enable_audio | Generate synchronized audio (enabled by default) |
| enable_prompt_expansion | Auto-enhance your prompt (disabled by default) |
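If you build these requests programmatically, a small client-side check against the documented ranges can catch mistakes before you spend a generation. The helper below is not part of any SDK; it is a sketch that assumes the size strings follow the same width*height format as the example above.

```python
# Hypothetical client-side validation against the documented parameter ranges.
ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}  # assumed format

def validate_request(params: dict) -> None:
    refs = params.get("reference_urls", [])
    if not 1 <= len(refs) <= 5:
        raise ValueError("reference_urls must contain 1-5 images")
    if not params.get("prompt"):
        raise ValueError("prompt is required")
    if params.get("size") not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if params.get("duration") not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if params.get("shot_type") not in ("single", "multi"):
        raise ValueError("shot_type must be 'single' or 'multi'")
```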
Pricing
| Resolution | Duration | Audio Off | Audio On |
|---|---|---|---|
| 720p | 5s | $0.25 | $0.50 |
| 720p | 10s | $0.375 | $0.75 |
| 1080p | 5s | $0.40 | $0.80 |
| 1080p | 10s | $0.60 | $1.20 |
Starting at just $0.25 per video — a fraction of what comparable models charge for identity-consistent generation.
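When budgeting a batch of generations, the pricing table is easy to encode directly: enabling audio doubles the per-video price at every resolution and duration. The sketch below simply mirrors the published prices and is not an official billing calculation.

```python
# Per-video prices from the pricing table above (USD).
PRICES = {
    ("720p", 5): 0.25,
    ("720p", 10): 0.375,
    ("1080p", 5): 0.40,
    ("1080p", 10): 0.60,
}

def estimate_cost(resolution: str, duration: int, audio: bool, videos: int = 1) -> float:
    base = PRICES[(resolution, duration)]
    per_video = base * 2 if audio else base  # enabling audio doubles the price
    return round(per_video * videos, 2)

# Example: 20 draft clips at 720p/5s without audio, plus one 1080p/10s final with audio.
print(estimate_cost("720p", 5, audio=False, videos=20))  # 5.0
print(estimate_cost("1080p", 10, audio=True))            # 1.2
```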
Pro Tips
- Use multiple reference images from different angles for the most accurate identity preservation
- Select the multi shot type for cinematic, dynamic compositions with automatic transitions
- Disable audio when you don’t need it — processing is faster and costs half as much
- Use 720p for rapid prototyping and drafts, then switch to 1080p for final production renders (see the sketch after these tips)
- Add a negative prompt like "blurry, distorted, deformed" to sharpen output quality
- If your generated video lacks sound, add phrasing like “with background ambience” to your prompt
Why WaveSpeedAI?
WaveSpeedAI provides the ideal infrastructure for WAN 2.6 Reference-to-Video Flash:
- No Cold Starts: Every request begins processing immediately — no waiting for model initialization
- Fast Inference: Optimized infrastructure paired with the Flash model’s distilled architecture means you get results in seconds
- Affordable Pricing: Identity-consistent video generation starting at $0.25, with transparent per-generation billing
- Simple REST API: Drop reference-to-video generation into any application or workflow with a single API call
Start Generating Today
Alibaba WAN 2.6 Reference-to-Video Flash brings identity-preserving video generation into real-time creative workflows. It’s the same multi-reference input, the same character consistency, and the same multi-shot storytelling — delivered at the speed your projects demand.
Whether you’re iterating on ad concepts, building a library of character-driven content, or pre-visualizing scenes for production, this model removes the wait and lets you focus on the creative work.
Try it now at wavespeed.ai/models/alibaba/wan-2.6/reference-to-video-flash.


