Wan 2.6 Reference-to-Video Flash
Wan 2.6 Reference-to-Video Flash is Alibaba's fast reference-driven video generation model. Upload up to 5 reference images and describe the scene — the model generates high-quality video that preserves character identity and appearance, with optional audio generation and multi-shot support.
Why Choose This?
- Multi-reference input: Upload up to 5 reference images for precise character and scene guidance.
- Identity preservation: Maintains character appearance and identity across generated video frames.
- Audio generation: Optional synchronized audio for complete video output.
- Shot type control: Choose between a single continuous shot or a multi-shot composition.
- Multiple resolutions: Support for 720p and 1080p in both landscape and portrait orientations.
- Prompt Enhancer: Built-in tool to automatically improve your video descriptions.
Parameters
| Parameter | Required | Description |
|---|---|---|
| reference_urls | Yes | Reference images (1-5, click "+ Add Item" for multiple) |
| prompt | Yes | Text description of the video scene and motion |
| audio | No | Custom audio track (URL or upload) |
| negative_prompt | No | Elements to exclude from generation |
| size | No | Output size: `1280*720`, `720*1280`, `1920*1080`, `1080*1920` |
| duration | No | Video length: 5 or 10 seconds (default: 5) |
| shot_type | No | Shot composition: single, multi (default: multi) |
| enable_audio | No | Generate synchronized audio (default: enabled) |
| enable_prompt_expansion | No | Enable prompt optimizer (default: disabled) |
| seed | No | Random seed for reproducibility (-1 for random) |
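The constraints in the table above can be checked before a request is sent. The following is a minimal sketch, not the service's client: `build_payload` is a hypothetical helper, the field names are taken from the table, and the `width*height` size format assumes the separator is an asterisk. No endpoint or authentication is shown because the source does not specify one.

```python
from typing import Any, Dict, List, Optional

# Allowed values as listed in the parameter table.
# The "*" separator in the size strings is an assumption.
ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}
ALLOWED_DURATIONS = {5, 10}
ALLOWED_SHOT_TYPES = {"single", "multi"}


def build_payload(
    reference_urls: List[str],
    prompt: str,
    size: Optional[str] = None,
    duration: int = 5,
    shot_type: str = "multi",
    enable_audio: bool = True,
    enable_prompt_expansion: bool = False,
    negative_prompt: Optional[str] = None,
    audio: Optional[str] = None,
    seed: int = -1,
) -> Dict[str, Any]:
    """Validate inputs against the documented limits and return a request body."""
    if not 1 <= len(reference_urls) <= 5:
        raise ValueError("reference_urls must contain 1 to 5 image URLs")
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError("shot_type must be 'single' or 'multi'")
    if size is not None and size not in ALLOWED_SIZES:
        raise ValueError("unsupported size: " + size)

    payload: Dict[str, Any] = {
        "reference_urls": list(reference_urls),
        "prompt": prompt,
        "duration": duration,
        "shot_type": shot_type,
        "enable_audio": enable_audio,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,  # -1 asks the service to pick a random seed
    }
    # Optional fields are only sent when the caller sets them.
    if size is not None:
        payload["size"] = size
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if audio:
        payload["audio"] = audio
    return payload
```

Calling `build_payload(["https://example.com/ref.png"], "A chef plates a dish", size="1280*720")` returns a dictionary that uses the documented defaults for every field left unset; the URL here is a placeholder.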
How to Use
- Upload reference images — add 1-5 character or scene references.
- Write your prompt — describe the scene, motion, and camera work.
- Upload audio (optional) — provide a custom audio track.
- Set size — choose resolution and orientation.
- Set duration — 5 or 10 seconds.
- Choose shot type — single for one continuous shot, multi for varied compositions.
- Configure audio — enable/disable audio generation.
- Run — submit and download your video.
Pricing
Pricing depends on resolution, duration, and audio settings.
| Size | Duration | Audio Off | Audio On |
|---|---|---|---|
| 720p | 5s | $0.25 | $0.50 |
| 720p | 10s | $0.375 | $0.75 |
| 1080p | 5s | $0.40 | $0.80 |
| 1080p | 10s | $0.60 | $1.20 |
Billing Rules
- Resolution multiplier: 720p (`1280*720` / `720*1280`) = 1×, 1080p (`1920*1080` / `1080*1920`) = 1.6×
- Audio multiplier: disabled = 1×, enabled = 2×
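The pricing table can be reproduced from a base price and the multipliers. The sketch below assumes a base of $0.25 (720p, 5 seconds, audio off) and a 1.5× multiplier for 10-second clips; both are inferred from the table rather than stated in the billing rules, and `estimate_price` is a hypothetical helper, not part of any official client.

```python
BASE_PRICE_USD = 0.25  # 720p, 5 s, audio off (read from the pricing table)

RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 1.6}  # stated in the billing rules
AUDIO_MULTIPLIER = {False: 1.0, True: 2.0}           # stated in the billing rules
DURATION_MULTIPLIER = {5: 1.0, 10: 1.5}              # assumption: inferred from the table


def estimate_price(resolution: str, duration: int, enable_audio: bool) -> float:
    """Estimate the cost of one generation in USD."""
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError("resolution must be '720p' or '1080p'")
    if duration not in DURATION_MULTIPLIER:
        raise ValueError("duration must be 5 or 10 seconds")
    price = (
        BASE_PRICE_USD
        * RESOLUTION_MULTIPLIER[resolution]
        * DURATION_MULTIPLIER[duration]
        * AUDIO_MULTIPLIER[enable_audio]
    )
    # Round away floating-point noise; table prices have at most 3 decimals.
    return round(price, 3)
```

For example, `estimate_price("1080p", 10, True)` multiplies 0.25 × 1.6 × 1.5 × 2, which matches the $1.20 entry in the table.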
Best Use Cases
- Character Animation — Generate videos that preserve character identity from reference photos.
- Social Media Content — Create engaging videos featuring consistent characters.
- Storytelling — Produce narrative scenes with identity-consistent characters.
- Marketing & Ads — Generate promotional videos featuring specific people or characters.
- Multi-shot Production — Create videos with varied camera angles and compositions.
Pro Tips
- Use multiple reference images from different angles for better identity preservation.
- Use "multi" shot type for more dynamic, cinematic compositions.
- Disable enable_audio for faster processing when audio is not needed.
- Add negative prompts to avoid common issues (e.g., "blurry, distorted").
- Enable prompt expansion for automatic prompt optimization.
- Use 720p for drafts and testing, 1080p for final production.
Notes
- Both reference_urls and prompt are required fields.
- Maximum 5 reference images per generation.
- Duration options are 5 or 10 seconds only.
- Ensure uploaded image and audio URLs are publicly accessible.
- Seed value -1 generates a random seed each time.
- If the result has no sound, add an instruction such as "Add background sound" to your prompt.
More Models to Try
- vidu/reference-to-video-q2 - Vidu's Q2 reference-to-video model.
- google/veo3.1/reference-to-video - Google Veo 3.1 reference-conditioned video generator.
- kwaivgi/kling-video-o1/reference-to-video - Kwaivgi's Kling Video O1 reference-to-video model.