Alibaba Wan 2.6 Reference To Video Flash

Alibaba WAN 2.6 Reference-to-Video Flash turns character, prop, or scene references from images or videos into new video shots that preserve identity, style, and layout while keeping motion smooth and coherent. The Flash version generates faster than the standard model. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Wan 2.6 Reference-to-Video Flash

Wan 2.6 Reference-to-Video Flash is Alibaba’s fast reference-driven video generation model. Upload up to 5 reference images and describe the scene — the model generates high-quality video that preserves character identity and appearance, with optional audio generation and multi-shot support.

Why Choose This?

Multi-reference input: Upload up to 5 reference images for precise character and scene guidance.
Identity preservation: Maintains character appearance and identity across generated video frames.
Audio generation: Optional synchronized audio for complete video output.
Shot type control: Choose between a single continuous shot and a multi-shot composition.
Multiple resolutions: Support for 720p and 1080p in both landscape and portrait orientations.
Prompt Enhancer: Built-in tool to automatically improve your video descriptions.

Parameters

Parameter	Required	Description
reference_urls	Yes	Reference images or videos (1-5 items in total)
prompt	Yes	Text description of the video scene and motion
audio	No	Custom audio track (URL or upload)
negative_prompt	No	Elements to exclude from generation
size	No	Output size: 1280*720, 720*1280, 1920*1080, 1080*1920
duration	No	Video length: 5 or 10 seconds (default: 5)
shot_type	No	Shot composition: single, multi (default: multi)
enable_audio	No	Generate synchronized audio (default: enabled)
enable_prompt_expansion	No	Enable prompt optimizer (default: disabled)
seed	No	Random seed for reproducibility (-1 for random)

How to Use

Upload reference images — add 1-5 character or scene references.
Write your prompt — describe the scene, motion, and camera work.
Upload audio (optional) — provide a custom audio track.
Set size — choose resolution and orientation.
Set duration — 5 or 10 seconds.
Choose shot type — single for one continuous shot, multi for varied compositions.
Configure audio — enable/disable audio generation.
Run — submit and download your video.
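
For API users, the steps above map roughly onto a JSON request body. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, the field names come from the parameter table, and the default values mirror the API request parameters table further down this page.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    reference_urls: List[str],
    prompt: str,
    *,
    audio: Optional[str] = None,
    negative_prompt: Optional[str] = None,
    size: str = "1280*720",
    duration: int = 5,
    shot_type: str = "single",
    enable_audio: bool = True,
    enable_prompt_expansion: bool = False,
    seed: int = -1,
) -> Dict[str, Any]:
    """Assemble a request body from the playground steps (hypothetical helper)."""
    payload: Dict[str, Any] = {
        "reference_urls": list(reference_urls),  # step 1: 1-5 references
        "prompt": prompt,                        # step 2: scene, motion, camera
        "size": size,                            # step 4: resolution and orientation
        "duration": duration,                    # step 5: 5 or 10 seconds
        "shot_type": shot_type,                  # step 6: single or multi
        "enable_audio": enable_audio,            # step 7: audio generation on/off
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,                            # -1 requests a random seed
    }
    # Optional fields are only sent when the caller provides them.
    if audio is not None:
        payload["audio"] = audio                 # step 3: custom audio track URL
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    return payload
```

Calling `build_payload(["https://example.com/hero.png"], "The hero walks into frame", shot_type="multi")` would produce a body with the multi-shot option set and every other field at its default; the URL here is a placeholder.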

Pricing

Pricing depends on resolution, duration, and audio settings.

Size	Duration	Audio Off	Audio On
720p	5s	$0.25	$0.50
720p	10s	$0.375	$0.75
1080p	5s	$0.40	$0.80
1080p	10s	$0.60	$1.20

Billing Rules

Resolution multiplier: 720p (1280*720 / 720*1280) = 1×, 1080p (1920*1080 / 1080*1920) = 1.6×
Audio multiplier: disabled = 1×, enabled = 2×
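
The pricing table can be reproduced from these multipliers. The sketch below is an estimate only: `estimate_price` is a hypothetical helper, the base price is the 720p / 5s / audio-off cell of the table, and the 1.5× factor for 10-second videos is inferred from the table (0.375 / 0.25) rather than stated in the billing rules.

```python
BASE_PRICE_USD = 0.25  # 720p, 5 s, audio off (from the pricing table)

RESOLUTION_MULTIPLIER = {
    "1280*720": 1.0,
    "720*1280": 1.0,
    "1920*1080": 1.6,
    "1080*1920": 1.6,
}

# Assumption: inferred from the table rows, not listed under Billing Rules.
DURATION_MULTIPLIER = {5: 1.0, 10: 1.5}


def estimate_price(size: str, duration: int, enable_audio: bool) -> float:
    """Estimate the price in USD for one generation (hypothetical helper)."""
    audio_multiplier = 2.0 if enable_audio else 1.0
    price = (
        BASE_PRICE_USD
        * RESOLUTION_MULTIPLIER[size]
        * DURATION_MULTIPLIER[duration]
        * audio_multiplier
    )
    # Round to tenths of a cent to hide floating-point noise.
    return round(price, 3)
```

Under these assumptions, the function matches all eight cells of the table, for example 1080p at 10 seconds with audio: 0.25 × 1.6 × 1.5 × 2.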

Best Use Cases

Character Animation — Generate videos that preserve character identity from reference photos.
Social Media Content — Create engaging videos featuring consistent characters.
Storytelling — Produce narrative scenes with identity-consistent characters.
Marketing & Ads — Generate promotional videos featuring specific people or characters.
Multi-shot Production — Create videos with varied camera angles and compositions.

Pro Tips

Use multiple reference images from different angles for better identity preservation.
Use “multi” shot type for more dynamic, cinematic compositions.
Disable enable_audio for faster processing when audio is not needed.
Add negative prompts to avoid common issues (e.g., “blurry, distorted”).
Enable prompt expansion for automatic prompt optimization.
Use 720p for drafts and testing, 1080p for final production.

Notes

Both reference_urls and prompt are required fields.
Maximum 5 reference images per generation.
Duration options are 5 or 10 seconds only.
Ensure uploaded image and audio URLs are publicly accessible.
Seed value -1 generates a random seed each time.
If the result has no sound, add an instruction such as “Add background sound” to the prompt.

More Models to Try

vidu/reference-to-video-q2 - Vidu’s Q2 reference-to-video model.
google/veo3.1/reference-to-video - Google Veo 3.1 reference-conditioned video generator.
kwaivgi/kling-video-o1/reference-to-video - Kwaivgi’s Kling Video O1 reference-to-video model.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


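# Submit the task (reference_urls and prompt are required; the values shown are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/reference-to-video-flash" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "reference_urls": ["https://example.com/reference.png"],
    "prompt": "The character from the reference image walks through a sunlit market",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_audio": true,
    "enable_prompt_expansion": false,
    "seed": -1
}'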

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
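
The same submission can be sketched in Python using only the standard library. This is an illustrative sketch, not an official SDK: `build_submit_request` and `submit_task` are hypothetical names, the endpoint URL and headers are copied from the curl command above, and the 30-second network timeout is an arbitrary choice.

```python
import json
import urllib.request

SUBMIT_URL = "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/reference-to-video-flash"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def submit_task(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_submit_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a real network call, so it is left commented out):
# import os
# result = submit_task(
#     os.environ["WAVESPEED_API_KEY"],
#     {"reference_urls": ["https://example.com/reference.png"],  # placeholder URL
#      "prompt": "The character walks through a sunlit market"},
# )
```

Splitting request construction from sending keeps the first function free of network access, which makes it easy to inspect the headers and body before anything is submitted.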

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
reference_urls	array	Yes	-	1 ~ 5 items	Array of URLs to reference images or videos. Images: 0-5, Videos: 0-3, Total: ≤5.
prompt	string	Yes		-	The positive prompt for the generation.
audio	string	No	-	-	Audio URL to guide generation (optional).
negative_prompt	string	No		-	The negative prompt for the generation.
size	string	No	1280*720	1280*720, 720*1280, 1920*1080, 1080*1920	The size of the generated media in pixels (width*height).
duration	integer	No	5	5, 10	The duration of the generated media in seconds.
shot_type	string	No	single	single, multi	The type of shots to generate.
enable_audio	boolean	No	true	-	Whether to generate audio for the video. Set to false to generate video without audio.
enable_prompt_expansion	boolean	No	false	-	If set to true, the prompt optimizer will be enabled.
seed	integer	No	-1	-1 ~ 2147483647	The random seed to use for the generation. -1 means a random seed will be used.
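
The ranges in this table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of the API: `validate_request` is a hypothetical helper, and it does not enforce the separate limit of 3 reference videos because the request body carries no explicit image/video marker.

```python
ALLOWED_SIZES = ("1280*720", "720*1280", "1920*1080", "1080*1920")
ALLOWED_DURATIONS = (5, 10)
ALLOWED_SHOT_TYPES = ("single", "multi")
MAX_SEED = 2147483647


def validate_request(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []

    refs = payload.get("reference_urls")
    if not isinstance(refs, list) or not 1 <= len(refs) <= 5:
        errors.append("reference_urls must be a list of 1 to 5 URLs")

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")

    # Optional fields fall back to the documented defaults when absent.
    if payload.get("size", "1280*720") not in ALLOWED_SIZES:
        errors.append("size must be one of " + ", ".join(ALLOWED_SIZES))
    if payload.get("duration", 5) not in ALLOWED_DURATIONS:
        errors.append("duration must be 5 or 10")
    if payload.get("shot_type", "single") not in ALLOWED_SHOT_TYPES:
        errors.append("shot_type must be single or multi")

    seed = payload.get("seed", -1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int) or not -1 <= seed <= MAX_SEED:
        errors.append("seed must be an integer between -1 and 2147483647")

    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller show the full list of fixes needed in a single pass.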

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
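
As an illustration of how these fields fit together, the sketch below pulls the task ID, polling URL, and status out of a submission response. `extract_task` is a hypothetical helper, and the sample response is invented for the example (including the ID); only the field names come from the table above.

```python
def extract_task(response: dict) -> tuple:
    """Return (task_id, poll_url, status) from a submission response."""
    if response.get("code") != 200:
        # Surface the API's own message when the call did not succeed.
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response["data"]
    return data["id"], data["urls"]["get"], data["status"]


# Hypothetical response, shaped after the response parameters table.
sample_response = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "alibaba/wan-2.6/reference-to-video-flash",
        "outputs": [],  # empty until the status is "completed"
        "urls": {
            "get": "https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"
        },
        "has_nsfw_contents": [],
        "status": "created",
        "created_at": "2023-04-01T12:34:56.789Z",
        "error": "",
        "timings": {"inference": 0},
    },
}

task_id, poll_url, status = extract_task(sample_response)
```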

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID passed in the request)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
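
Because `data.outputs` stays empty until the status becomes `completed`, a client typically polls the result endpoint. The loop below is a minimal sketch of that pattern, not a documented client: `wait_for_outputs` is a hypothetical helper, the caller supplies `fetch_result` (any function that returns the decoded result response), and the 2-second interval and 10-minute timeout are arbitrary assumptions.

```python
import time
from typing import Callable, List


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until the task completes, fails, or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return list(data["outputs"])
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Any other status ("created", "processing") means keep waiting.
        if clock() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters is a design choice that lets the loop be exercised with canned responses and no real waiting; in normal use the defaults apply.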
