Alibaba Wan 2.6 Reference To Video Flash
Alibaba WAN 2.6 Reference-to-Video Flash turns character, prop, or scene references from images or videos into new video shots that preserve identity, style, and layout, with smooth, coherent motion. The Flash version is optimized for faster generation. Ready-to-use REST inference API: best performance, no cold starts, affordable pricing.
Features
Wan 2.6 Reference-to-Video Flash
Wan 2.6 Reference-to-Video Flash is Alibaba’s fast reference-driven video generation model. Upload up to 5 reference images and describe the scene — the model generates high-quality video that preserves character identity and appearance, with optional audio generation and multi-shot support.
Why Choose This?
- Multi-reference input: Upload up to 5 reference images for precise character and scene guidance.
- Identity preservation: Maintains character appearance and identity across generated video frames.
- Audio generation: Optional synchronized audio for complete video output.
- Shot type control: Choose between a single continuous shot or a multi-shot composition.
- Multiple resolutions: Support for 720p and 1080p in both landscape and portrait orientations.
- Prompt Enhancer: Built-in tool to automatically improve your video descriptions.
Parameters
| Parameter | Required | Description |
|---|---|---|
| reference_urls | Yes | Reference images or videos (1-5; click “+ Add Item” to add more) |
| prompt | Yes | Text description of the video scene and motion |
| audio | No | Custom audio track (URL or upload) |
| negative_prompt | No | Elements to exclude from generation |
| size | No | Output size (width*height): 1280*720, 720*1280, 1920*1080, 1080*1920 |
| duration | No | Video length: 5 or 10 seconds (default: 5) |
| shot_type | No | Shot composition: single, multi (default: single) |
| enable_audio | No | Generate synchronized audio (default: enabled) |
| enable_prompt_expansion | No | Enable prompt optimizer (default: disabled) |
| seed | No | Random seed for reproducibility (-1 for random) |
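The allowed values in the table above can be checked on the client before a request is sent. The following is a minimal bash sketch, not part of any WaveSpeedAI SDK: the function name `validate_wan_flash_params` is hypothetical, and it only encodes the size, duration, shot type, and reference-count limits listed here.

```bash
# Hypothetical client-side check mirroring the parameter table above.
# Usage: validate_wan_flash_params SIZE DURATION SHOT_TYPE NUM_REFERENCES
# Returns 0 if every value is allowed, otherwise prints a message to stderr and returns 1.
validate_wan_flash_params() {
  local size="$1" duration="$2" shot_type="$3" num_refs="$4"

  # Quoted patterns, so "*" is matched literally rather than as a glob.
  case "$size" in
    "1280*720"|"720*1280"|"1920*1080"|"1080*1920") ;;
    *) echo "invalid size: $size" >&2; return 1 ;;
  esac

  case "$duration" in
    5|10) ;;
    *) echo "invalid duration: $duration (must be 5 or 10)" >&2; return 1 ;;
  esac

  case "$shot_type" in
    single|multi) ;;
    *) echo "invalid shot_type: $shot_type (must be single or multi)" >&2; return 1 ;;
  esac

  # Between 1 and 5 references are required.
  if [[ ! "$num_refs" =~ ^[1-5]$ ]]; then
    echo "invalid reference count: $num_refs (must be 1-5)" >&2
    return 1
  fi
  return 0
}
```

For example, `validate_wan_flash_params "1920*1080" 10 multi 3` succeeds, while a size written as `1920x1080` is rejected.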
How to Use
- Upload reference images — add 1-5 character or scene references.
- Write your prompt — describe the scene, motion, and camera work.
- Upload audio (optional) — provide a custom audio track.
- Set size — choose resolution and orientation.
- Set duration — 5 or 10 seconds.
- Choose shot type — single for one continuous shot, multi for varied compositions.
- Configure audio — enable/disable audio generation.
- Run — submit and download your video.
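Each of the steps above corresponds to a field in the API request body described under API Endpoints. As a hedged sketch, the bash function below assembles that JSON body from the same choices. The names `json_escape` and `build_wan_flash_payload` are hypothetical; the escaping covers only backslashes and double quotes (not newlines or other control characters), so a JSON tool such as `jq` is the more robust choice in practice.

```bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string value.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build the request body for the submit call.
# Usage: build_wan_flash_payload PROMPT SIZE DURATION SHOT_TYPE ENABLE_AUDIO REF_URL...
build_wan_flash_payload() {
  local prompt="$1" size="$2" duration="$3" shot_type="$4" enable_audio="$5"
  shift 5
  local refs="" url
  for url in "$@"; do
    # Join the reference URLs as "a","b",... for the JSON array.
    if [ -n "$refs" ]; then refs+=","; fi
    refs+="\"$(json_escape "$url")\""
  done
  printf '{"reference_urls":[%s],"prompt":"%s","size":"%s","duration":%s,"shot_type":"%s","enable_audio":%s}\n' \
    "$refs" "$(json_escape "$prompt")" "$size" "$duration" "$shot_type" "$enable_audio"
}
```

The resulting string can be passed directly to `curl --data-raw`. Optional fields such as `negative_prompt`, `audio`, and `seed` are left out of this sketch.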
Pricing
Pricing depends on resolution, duration, and audio settings.
| Size | Duration | Audio Off | Audio On |
|---|---|---|---|
| 720p | 5s | $0.25 | $0.50 |
| 720p | 10s | $0.375 | $0.75 |
| 1080p | 5s | $0.40 | $0.80 |
| 1080p | 10s | $0.60 | $1.20 |
Billing Rules
- Base price (720p, audio off): $0.25 for 5 s, $0.375 for 10 s
- Resolution multiplier: 720p (1280*720 / 720*1280) = 1×, 1080p (1920*1080 / 1080*1920) = 1.6×
- Audio multiplier: disabled = 1×, enabled = 2×
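The resolution and audio multipliers combine multiplicatively, and the pricing table shows that a 10 s clip costs 1.5× the 5 s price at the same settings. The bash sketch below reproduces the table from those factors. The function name `wan_flash_price` is hypothetical, and its output is a planning estimate, not a substitute for the amount actually billed.

```bash
# Hypothetical cost estimator based on the pricing table and billing rules above.
# Usage: wan_flash_price RESOLUTION DURATION ENABLE_AUDIO
#   RESOLUTION: 720p or 1080p; DURATION: 5 or 10; ENABLE_AUDIO: true or false
# Prints the estimated price in USD with three decimals.
wan_flash_price() {
  local res="$1" duration="$2" audio="$3" base res_mult audio_mult

  # Base price at 720p with audio off, read from the pricing table.
  case "$duration" in
    5) base=0.25 ;;
    10) base=0.375 ;;
    *) echo "unsupported duration: $duration" >&2; return 1 ;;
  esac

  case "$res" in
    720p) res_mult=1 ;;
    1080p) res_mult=1.6 ;;
    *) echo "unsupported resolution: $res" >&2; return 1 ;;
  esac

  case "$audio" in
    true) audio_mult=2 ;;
    false) audio_mult=1 ;;
    *) echo "enable_audio must be true or false" >&2; return 1 ;;
  esac

  # Shell arithmetic is integer-only, so awk does the decimal multiplication.
  awk -v b="$base" -v r="$res_mult" -v a="$audio_mult" 'BEGIN { printf "%.3f\n", b * r * a }'
}
```

For instance, `wan_flash_price 1080p 10 true` prints `1.200`, matching the $1.20 entry in the table.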
Best Use Cases
- Character Animation — Generate videos that preserve character identity from reference photos.
- Social Media Content — Create engaging videos featuring consistent characters.
- Storytelling — Produce narrative scenes with identity-consistent characters.
- Marketing & Ads — Generate promotional videos featuring specific people or characters.
- Multi-shot Production — Create videos with varied camera angles and compositions.
Pro Tips
- Use multiple reference images from different angles for better identity preservation.
- Use “multi” shot type for more dynamic, cinematic compositions.
- Set enable_audio to false for faster processing when audio is not needed; this also halves the cost.
- Add negative prompts to avoid common issues (e.g., “blurry, distorted”).
- Enable prompt expansion for automatic prompt optimization.
- Use 720p for drafts and testing, 1080p for final production.
Notes
- Both reference_urls and prompt are required fields.
- Maximum 5 references per generation (up to 5 images and up to 3 videos).
- Duration options are 5 or 10 seconds only.
- Ensure uploaded image and audio URLs are publicly accessible.
- Seed value -1 generates a random seed each time.
- If your result has no sound, try adding a phrase such as “Add background sound” to your prompt.
More Models to Try
- vidu/reference-to-video-q2 - Vidu’s Q2 reference-to-video model.
- google/veo3.1/reference-to-video - Google Veo 3.1 reference-conditioned video generator.
- kwaivgi/kling-video-o1/reference-to-video - Kwaivgi’s Kling Video O1 reference-to-video model.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
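```bash
# Submit the task (reference_urls and prompt are required; the values below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/reference-to-video-flash" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"reference_urls": ["https://example.com/character.png"],
"prompt": "The character walks through a sunlit market, camera tracking alongside",
"size": "1280*720",
"duration": 5,
"shot_type": "single",
"enable_audio": true,
"enable_prompt_expansion": false,
"seed": -1
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```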
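The two calls above are usually combined into a submit-then-poll loop. The following bash sketch assumes, per the response tables below, that the submit response carries the task ID in `data.id` and the result response carries `data.status`. The function names, the 2-second polling interval, the attempt limit, and the `sed`-based field extraction are illustrative choices, not part of the documented API; a real client should parse JSON with a proper tool such as `jq`.

```bash
# Hypothetical submit-and-poll helpers for the two endpoints above.
# Assumes WAVESPEED_API_KEY is exported; names and limits are illustrative.
BASE_URL="https://api.wavespeed.ai/api/v3"

# Rough extraction of a "key":"string" value from single-line JSON.
# Not a JSON parser: if the key occurs more than once, the last match wins.
json_string_field() {
  local key="$1" json="$2"
  printf '%s' "$json" | sed -n "s/.*\"$key\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# POST the request body ($1) and print the raw JSON response.
submit_task() {
  curl --silent --location --request POST "$BASE_URL/alibaba/wan-2.6/reference-to-video-flash" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1"
}

# Poll the result endpoint for task ID $1 every 2 seconds, up to $2 attempts
# (default 150). Prints the final response; returns 1 on failure or timeout.
wait_for_result() {
  local id="$1" max_attempts="${2:-150}" attempt response status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    response="$(curl --silent --location "$BASE_URL/predictions/$id/result" \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}")"
    status="$(json_string_field status "$response")"
    case "$status" in
      completed) printf '%s\n' "$response"; return 0 ;;
      failed) printf '%s\n' "$response" >&2; return 1 ;;
      *) sleep 2 ;;  # created, processing, or no parsable status yet
    esac
  done
  echo "timed out waiting for task $id" >&2
  return 1
}

# Example (placeholder values):
#   response="$(submit_task '{"reference_urls":["https://example.com/character.png"],"prompt":"A character waves"}')"
#   wait_for_result "$(json_string_field id "$response")"
```

Polling with a fixed interval and an attempt cap keeps the loop simple and guarantees it terminates even if the status never resolves.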
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| reference_urls | array | Yes | - | 1 ~ 5 items | Array of URLs to reference images or videos. Images: 0-5, Videos: 0-3, Total: ≤5. |
| prompt | string | Yes | - | - | The positive prompt for the generation. |
| audio | string | No | - | - | Audio URL to guide generation (optional). |
| negative_prompt | string | No | - | - | The negative prompt for the generation. |
| size | string | No | 1280*720 | 1280*720, 720*1280, 1920*1080, 1080*1920 | The size of the generated media in pixels (width*height). |
| duration | integer | No | 5 | 5, 10 | The duration of the generated media in seconds. |
| shot_type | string | No | single | single, multi | The type of shots to generate. |
| enable_audio | boolean | No | true | - | Whether to generate audio for the video. Set to false to generate video without audio. |
| enable_prompt_expansion | boolean | No | false | - | If set to true, the prompt optimizer will be enabled. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
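The Range column adds limits beyond the earlier summary: references may mix images (0-5) and videos (0-3) with 1 to 5 in total, and `seed` must lie between -1 and 2147483647. A minimal bash sketch of those checks follows; the function names are hypothetical, and how the service itself responds to out-of-range values is not specified here.

```bash
# Hypothetical checks for the reference_urls and seed ranges in the table above.

# Usage: check_reference_counts NUM_IMAGES NUM_VIDEOS
check_reference_counts() {
  local images="$1" videos="$2"
  local total=$(( images + videos ))
  if (( images < 0 || images > 5 )); then echo "image count out of range: $images (0-5)" >&2; return 1; fi
  if (( videos < 0 || videos > 3 )); then echo "video count out of range: $videos (0-3)" >&2; return 1; fi
  if (( total < 1 || total > 5 )); then echo "need 1-5 references in total, got $total" >&2; return 1; fi
  return 0
}

# Usage: check_seed SEED   (-1 requests a random seed)
check_seed() {
  local seed="$1"
  # Accept -1, 0, or a positive integer without leading zeros.
  if [[ ! "$seed" =~ ^(-1|0|[1-9][0-9]*)$ ]]; then
    echo "seed must be -1 or a non-negative integer: $seed" >&2
    return 1
  fi
  # The length test avoids integer overflow on very long inputs.
  if (( ${#seed} > 10 || seed > 2147483647 )); then
    echo "seed out of range: $seed (max 2147483647)" >&2
    return 1
  fi
  return 0
}
```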
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
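Given the response fields above, a client typically branches on `data.status` and reads the generated video from `data.outputs` once the task is completed. This bash sketch assumes a single-line JSON response and uses `sed` pattern matching rather than a JSON parser, so it is only a rough illustration; the function names are hypothetical.

```bash
# Hypothetical helpers for reading a prediction response.

# Map a data.status value to a coarse client action: wait, done, error, or unknown.
status_action() {
  case "$1" in
    created|processing) echo "wait" ;;
    completed) echo "done" ;;
    failed) echo "error" ;;
    *) echo "unknown" ;;
  esac
}

# Print the first URL in the "outputs" array, or nothing if the array is empty.
first_output_url() {
  printf '%s' "$1" | sed -n 's/.*"outputs"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Because `data.outputs` is empty until the task completes, an empty result from `first_output_url` simply means there is nothing to download yet.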
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the ID of the prediction being retrieved) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |