Vidu Reference To Video Q2
Playground
Try it on WavespeedAI! Vidu Q2 is a new image-to-video (and reference-to-video) model that emphasizes subtle facial expressions and smooth push–pull camera moves.
Features
Vidu Q2 — Reference-to-Video Model
Vidu Q2 is Shengshu Technology’s new-generation reference-to-video model designed to transform one or multiple input images into expressive, cinematic videos. It excels at producing subtle facial motion, natural body dynamics, and camera-aware storytelling with a strong sense of realism.
🎬 What It Does
Vidu Q2 synthesizes short videos from one or several reference images guided by a text prompt. It’s ideal for turning still portraits or concept images into smooth motion clips — suitable for both creative storytelling and professional visual production.
✨ Key Features
- Smooth motion realism: Subtle micro-expressions, eye movements, and breathing motions are reproduced authentically.
- Cinematic camera dynamics: Built-in control of push/pull, pan, tilt, and zoom effects for scene depth and emotional tone.
- Multiple-image reference support: Upload up to 7 reference images to guide pose, lighting, or perspective transitions.
- Flexible composition: Choose from aspect ratios (16:9, 9:16, 4:3, 3:4, 1:1) for any platform.
- Motion amplitude control: Select auto / small / medium / large to define the strength and style of movement.
- High-fidelity output: Consistent lighting, identity preservation, and accurate reference adherence even across complex motions.
🧩 Designed For
- Filmmakers & Storytellers: Bring still characters or concept art to life with controlled, cinematic motion.
- Advertising Creators: Generate short motion ads with precise control over composition and intensity.
- Artists & Illustrators: Animate hand-drawn or AI-generated portraits into dynamic living forms.
- Game & Animation Studios: Prototype visual narratives quickly using character or environment references.
⚙️ Parameters
Parameter | Description |
---|---|
prompt | Describe the scene, action, or mood. |
images | Upload up to 7 reference images. |
aspect_ratio | Choose between 16:9, 9:16, 4:3, 3:4, 1:1. |
resolution | 360p / 540p / 720p / 1080p. |
movement_amplitude | auto / small / medium / large (defines motion intensity). |
duration | Up to 8 seconds. |
seed | Optional, for reproducible results. |
💰 Pricing
Resolution | Price per second |
---|---|
360p | $0.003 / s |
540p | $0.006 / s |
720p | $0.013 / s |
1080p | $0.030 / s |
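Since billing is per second, the cost of a clip is simply the resolution's rate multiplied by the duration. A quick sketch using the rates from the table above (a 5-second clip at 720p):

```shell
# Cost = price per second x duration, using the 720p rate from the pricing table.
duration=5     # seconds (1-8 supported)
rate=0.013     # 720p price per second, in USD
awk -v d="$duration" -v r="$rate" 'BEGIN { printf "$%.3f\n", d * r }'
# prints $0.065
```

The same arithmetic applies to any row of the table; an 8-second 1080p clip, for example, costs 8 × $0.030 = $0.24.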
🧠 Tips for Best Results
- Use consistent lighting and angles among reference images for smoother transitions.
- Write prompts that define camera motion, emotion, or scene tone clearly.
- “auto” movement amplitude works best for portrait-style animation; use “medium” or “large” for full-body or action scenes.
- For cinematic looks, pair 16:9 with 1080p and descriptive atmosphere prompts (e.g., “soft sunlight flickering through leaves”).
📎 Note
- If you provide image URLs instead of uploading files locally, make sure the URLs are publicly accessible. Successfully loaded images will display as thumbnails in the interface.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/vidu/reference-to-video-q2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "A portrait slowly turns toward the camera in soft sunlight",
    "images": ["https://example.com/reference.jpg"],
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration": 5,
    "movement_amplitude": "auto",
    "seed": 0
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
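Since generation is asynchronous, the usual flow is to submit the task, read `data.id` from the response, and poll the result endpoint until `data.status` reaches `completed` or `failed`. A minimal sketch of that loop is below; `fetch_result` is a hypothetical stand-in for the `curl` GET above (here it returns a canned response so the control flow is runnable offline), and in real use you would replace its body with the actual curl call.

```shell
# Hypothetical stand-in for the curl GET to /predictions/${requestId}/result.
# Swap the echo for the real curl call in production.
fetch_result() {
  echo '{"code":200,"data":{"status":"completed","outputs":["https://example.com/video.mp4"]}}'
}

# Poll until the task leaves the created/processing states.
while :; do
  response=$(fetch_result)
  # Crude status extraction for the sketch; prefer jq in real code.
  status=$(echo "$response" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
  case "$status" in
    completed) echo "done: $response"; break ;;
    failed)    echo "task failed" >&2; exit 1 ;;
    *)         sleep 1 ;;
  esac
done
```

In a real deployment you would also add a retry cap or timeout so a stuck task does not poll forever.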
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
prompt | string | Yes | - | - | The positive prompt for the generation. |
images | array | Yes | [] | - | Reference images for video generation. Requirements: 1. Accept 1-7 images; 2. Images can be URLs or Base64 encoded |
aspect_ratio | string | No | 16:9 | 16:9, 9:16, 4:3, 3:4, 1:1 | The aspect ratio of the generated media. |
resolution | string | No | 720p | 360p, 540p, 720p, 1080p | The resolution of the generated media. |
duration | integer | No | 5 | 1, 2, 3, 4, 5, 6, 7, 8 | The duration of the generated media in seconds. |
movement_amplitude | string | No | auto | auto, small, medium, large | The movement amplitude of objects in the frame. |
seed | integer | No | - | -1 ~ 2147483647 | The random seed to use for the generation. |
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not `completed`) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
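Putting the fields above together, a completed task's response body looks roughly like this (all values are illustrative, not real output):

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "id": "req_example_123",
    "model": "vidu/reference-to-video-q2",
    "outputs": ["https://example.com/generated/video.mp4"],
    "urls": {
      "get": "https://api.wavespeed.ai/api/v3/predictions/req_example_123/result"
    },
    "has_nsfw_contents": [false],
    "status": "completed",
    "created_at": "2023-04-01T12:34:56.789Z",
    "error": "",
    "timings": {
      "inference": 12345
    }
  }
}
```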