Vidu Reference To Video Q2
Vidu Q2 is an Image-to-Video and Reference-to-Video model that emphasizes subtle facial expressions and smooth push-pull camera moves for natural motion. It is served through a ready-to-use REST inference API built for performance, with no cold starts and affordable pricing.
Features
Vidu Q2 Reference-to-Video
Vidu Q2 Reference-to-Video transforms one or multiple input images into expressive, cinematic videos. It excels at producing subtle facial motion, natural body dynamics, and camera-aware storytelling — ideal for turning still portraits or concept images into smooth motion clips.
Why Choose This?
- Smooth motion realism: Subtle micro-expressions, eye movements, and breathing motions reproduced authentically.
- Cinematic camera dynamics: Built-in control of push/pull, pan, tilt, and zoom effects for scene depth and emotional tone.
- Multiple-image reference support: Upload up to 7 reference images to guide pose, lighting, or perspective transitions.
- Flexible composition: Choose from multiple aspect ratios (16:9, 9:16, 4:3, 3:4, 1:1) for any platform.
- Motion amplitude control: Select auto, small, medium, or large to define the strength and style of movement.
- High fidelity output: Consistent lighting, identity preservation, and accurate reference adherence.
Parameters
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | Describe the scene, action, or mood |
| images | Yes | Reference images (up to 7 images) |
| aspect_ratio | No | Aspect ratio: 16:9, 9:16, 4:3, 3:4, or 1:1 |
| resolution | No | Output resolution: 540p, 720p, or 1080p |
| duration | No | Video length in seconds (1–10) |
| movement_amplitude | No | Motion intensity: auto, small, medium, or large |
| seed | No | Random seed for reproducibility (-1 for random) |
How to Use
- Upload reference images — add up to 7 images to guide the generation.
- Write your prompt — describe the scene, action, camera motion, or mood.
- Choose aspect ratio — select based on your target platform.
- Set resolution — 540p, 720p, or 1080p based on quality needs.
- Set duration — choose video length from 1 to 10 seconds.
- Adjust movement amplitude — auto for portraits, medium/large for action.
- Run — submit and download your video.
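The steps above map onto a single JSON request body. The following is a minimal sketch, assuming Python 3.9 or later; `build_payload` is a hypothetical helper name (not part of any WaveSpeedAI SDK), the defaults mirror the values documented for this model, and the prompt and image URL in the usage lines are placeholders.

```python
from typing import Any


def build_payload(
    prompt: str,
    images: list[str],
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration: int = 5,
    movement_amplitude: str = "auto",
    seed: int = -1,
) -> dict[str, Any]:
    """Assemble the request body for one generation (hypothetical helper).

    Defaults follow the documented ones; seed=-1 asks for a random seed.
    """
    return {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
        "seed": seed,
    }


# Placeholder prompt and image URL, for illustration only.
payload = build_payload(
    "Slow push-in on the portrait as the subject smiles",
    ["https://example.com/reference-1.png"],
    resolution="1080p",
)
```

Keeping every documented field explicit in the body makes a run easier to reproduce later, at the cost of sending values the API would otherwise default.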
Pricing
| Resolution | Duration | Price |
|---|---|---|
| 540p | 1s | $0.075 |
| 540p | 2s | $0.10 |
| 540p | 3s | $0.125 |
| 540p | 4s | $0.15 |
| 540p | 5s | $0.175 |
| 540p | 6s | $0.20 |
| 540p | 7s | $0.225 |
| 540p | 8s | $0.25 |
| 540p | 9s | $0.35 |
| 540p | 10s | $0.45 |
| 720p | 1s | $0.125 |
| 720p | 2s | $0.15 |
| 720p | 3s | $0.175 |
| 720p | 4s | $0.20 |
| 720p | 5s | $0.225 |
| 720p | 6s | $0.25 |
| 720p | 7s | $0.275 |
| 720p | 8s | $0.30 |
| 720p | 9s | $0.40 |
| 720p | 10s | $0.50 |
| 1080p | 1s | $0.375 |
| 1080p | 2s | $0.425 |
| 1080p | 3s | $0.475 |
| 1080p | 4s | $0.525 |
| 1080p | 5s | $0.575 |
| 1080p | 6s | $0.625 |
| 1080p | 7s | $0.675 |
| 1080p | 8s | $0.725 |
| 1080p | 9s | $0.825 |
| 1080p | 10s | $0.925 |
Billing Rules
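- 540p: $0.075 for the first second, plus $0.025 for each additional second through 8s; 9s costs $0.35 and 10s costs $0.45.
- 720p: $0.125 for the first second, plus $0.025 for each additional second through 8s; 9s costs $0.40 and 10s costs $0.50.
- 1080p: $0.375 for the first second, plus $0.05 for each additional second through 8s; 9s costs $0.825 and 10s costs $0.925.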
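The billing rules can be expressed as a small lookup, which is handy for estimating cost before submitting a job. This is a sketch only: `price_usd` and `BILLING` are hypothetical names, and the figures are copied from the pricing table above. `Decimal` is used so that amounts such as $0.025 are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Per resolution: (price for 1s, increment per extra second through 8s,
#                  price for 9s, price for 10s), all in USD.
BILLING = {
    "540p": (Decimal("0.075"), Decimal("0.025"), Decimal("0.35"), Decimal("0.45")),
    "720p": (Decimal("0.125"), Decimal("0.025"), Decimal("0.40"), Decimal("0.50")),
    "1080p": (Decimal("0.375"), Decimal("0.05"), Decimal("0.825"), Decimal("0.925")),
}


def price_usd(resolution: str, duration: int) -> Decimal:
    """Return the listed price for one clip (hypothetical helper)."""
    if resolution not in BILLING:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if not 1 <= duration <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    first_second, per_extra_second, nine_seconds, ten_seconds = BILLING[resolution]
    if duration <= 8:
        return first_second + per_extra_second * (duration - 1)
    return nine_seconds if duration == 9 else ten_seconds
```

For example, `price_usd("720p", 5)` evaluates to 0.125 + 4 × 0.025, which matches the $0.225 row in the table.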
Best Use Cases
- Filmmakers and Storytellers — Bring still characters or concept art to life with controlled, cinematic motion.
- Advertising Creators — Generate short motion ads with precise control over composition and intensity.
- Artists and Illustrators — Animate hand-drawn or AI-generated portraits into dynamic living forms.
- Game and Animation Studios — Prototype visual narratives quickly using character or environment references.
Pro Tips
- Use consistent lighting and angles among reference images for smoother transitions.
- Write prompts that define camera motion, emotion, or scene tone clearly.
- “auto” movement amplitude works best for portrait-style animation.
- Use “medium” or “large” amplitude for full-body or action scenes.
- For cinematic looks, pair 16:9 with 1080p and descriptive atmosphere prompts.
Notes
- Maximum 7 reference images per generation.
- Maximum duration is 10 seconds.
- If using image URLs, ensure they are publicly accessible.
- Successfully loaded images will display as thumbnails in the interface.
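When a reference image is not reachable at a public URL, the API reference states that Base64-encoded images are also accepted. The sketch below prepares a mixed list of URLs and local files under that rule. It assumes the API takes a plain Base64 string; the page does not say whether a `data:` URI prefix is expected, so check this before relying on it. `to_image_entry` and `prepare_images` are hypothetical helper names.

```python
import base64
from pathlib import Path


def to_image_entry(source: str) -> str:
    """Pass URLs through unchanged; Base64-encode local files.

    Assumption: the API accepts a bare Base64 string with no data-URI prefix.
    """
    if source.startswith(("http://", "https://")):
        return source
    return base64.b64encode(Path(source).read_bytes()).decode("ascii")


def prepare_images(sources: list[str]) -> list[str]:
    """Enforce the documented limit of 1 to 7 reference images."""
    if not 1 <= len(sources) <= 7:
        raise ValueError("provide between 1 and 7 reference images")
    return [to_image_entry(source) for source in sources]
```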
Related Models
- Vidu Q2 Text-to-Video — Generate videos from text prompts only.
- Vidu Q2 Pro Image-to-Video — High-quality single image to video.
- Vidu Q2 Turbo Image-to-Video — Fast single image to video.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (the prompt and image URL below are placeholder values)
curl --location --request POST "https://api.wavespeed.ai/api/v3/vidu/reference-to-video-q2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "The character turns toward the camera and smiles",
    "images": ["https://example.com/reference-1.png"],
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "movement_amplitude": "auto",
    "seed": 0
}'
# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
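For callers who prefer Python to curl, the same two calls can be sketched with the standard library alone. The endpoint paths and headers are taken from the curl commands; `submit_request`, `result_request`, and `send` are hypothetical helper names, and the 60-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a generation task."""
    return urllib.request.Request(
        f"{API_BASE}/vidu/reference-to-video-q2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that fetches a task's current result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (performs real network calls, so it is left commented out):
#   import os
#   api_key = os.environ["WAVESPEED_API_KEY"]
#   submitted = send(submit_request(api_key, payload))
#   result = send(result_request(api_key, submitted["data"]["id"]))
```

Separating request construction from sending keeps the URL and header logic testable without touching the network.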
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| prompt | string | Yes | - | - | The positive prompt for the generation. |
| images | array | Yes | [] | 1 ~ 7 items | Reference images for video generation. Accepts 1 to 7 images, each given as a URL or a Base64-encoded string. |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16, 4:3, 3:4, 1:1 | The aspect ratio of the generated media. |
| resolution | string | No | 720p | 540p, 720p, 1080p | The resolution of the generated media. |
| duration | number | No | 5 | 1 ~ 10 | The duration of the generated media in seconds. |
| movement_amplitude | string | No | auto | auto, small, medium, large | The movement amplitude of objects in the frame. |
| seed | integer | No | - | -1 ~ 2147483647 | The random seed to use for the generation. |
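The ranges in this table can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. The following is a sketch, not the server's actual validation logic: `validate_payload` is a hypothetical helper, and how the API itself reports out-of-range values is not described on this page.

```python
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4", "1:1")
RESOLUTIONS = ("540p", "720p", "1080p")
AMPLITUDES = ("auto", "small", "medium", "large")
MAX_SEED = 2147483647


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")
    images = payload.get("images")
    if not isinstance(images, list) or not 1 <= len(images) <= 7:
        errors.append("images must be a list of 1 to 7 items")
    # Optional fields fall back to the documented defaults when absent.
    if payload.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append("aspect_ratio must be one of " + ", ".join(ASPECT_RATIOS))
    if payload.get("resolution", "720p") not in RESOLUTIONS:
        errors.append("resolution must be one of " + ", ".join(RESOLUTIONS))
    duration = payload.get("duration", 5)
    if (
        isinstance(duration, bool)
        or not isinstance(duration, (int, float))
        or not 1 <= duration <= 10
    ):
        errors.append("duration must be a number from 1 to 10")
    if payload.get("movement_amplitude", "auto") not in AMPLITUDES:
        errors.append("movement_amplitude must be one of " + ", ".join(AMPLITUDES))
    seed = payload.get("seed")
    if seed is not None and (
        isinstance(seed, bool)
        or not isinstance(seed, int)
        or not -1 <= seed <= MAX_SEED
    ):
        errors.append("seed must be an integer from -1 to 2147483647")
    return errors
```

Returning all problems at once, rather than raising on the first, lets a caller show every issue in a single pass.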
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
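Because `data.outputs` stays empty until `data.status` reaches `completed`, a client typically polls the result endpoint until the task reaches a terminal status. The loop below is a sketch under stated assumptions: `wait_for_outputs` is a hypothetical helper, the 2-second interval and 10-minute timeout are arbitrary choices rather than documented recommendations, and `fetch_result` stands for any callable that returns the decoded JSON of the result request.

```python
import time
from typing import Callable


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until the task completes, then return its output URLs.

    Raises RuntimeError if the task fails and TimeoutError if it is
    still in progress when timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return list(data["outputs"])
        if status == "failed":
            raise RuntimeError(data.get("error") or "generation failed")
        # Remaining documented statuses: created, processing.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Passing the fetch function and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it straightforward to exercise with canned responses.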