Vidu Q2 — Reference-to-Video Model
Vidu Q2 is Shengshu Technology’s new-generation reference-to-video model, designed to transform one or more input images into expressive, cinematic videos. It excels at producing subtle facial motion, natural body dynamics, and camera-aware storytelling with a strong sense of realism.
🎬 What It Does
Vidu Q2 synthesizes short videos from one or more reference images, guided by a text prompt.
It’s ideal for turning still portraits or concept images into smooth motion clips, for both creative storytelling and professional visual production.
✨ Key Features
- Smooth motion realism
Subtle micro-expressions, eye movements, and breathing motions are reproduced authentically.
- Cinematic camera dynamics
Built-in control of push/pull, pan, tilt, and zoom effects for scene depth and emotional tone.
- Multiple-image reference support
Upload up to 7 reference images to guide pose, lighting, or perspective transitions.
- Flexible composition
Choose from aspect ratios (16:9, 9:16, 4:3, 3:4, 1:1) for any platform.
- Motion amplitude control
Select auto / small / medium / large to define the strength and style of movement.
- High fidelity output
Consistent lighting, identity preservation, and accurate reference adherence even across complex motions.
🧩 Designed For
- Filmmakers & Storytellers: Bring still characters or concept art to life with controlled, cinematic motion.
- Advertising Creators: Generate short motion ads with precise control over composition and intensity.
- Artists & Illustrators: Animate hand-drawn or AI-generated portraits into dynamic living forms.
- Game & Animation Studios: Prototype visual narratives quickly using character or environment references.
⚙️ Parameters
| Parameter | Description |
|---|---|
| prompt | Describe the scene, action, or mood. |
| images | Upload up to 7 reference images. |
| aspect_ratio | Choose between 16:9, 9:16, 4:3, 3:4, 1:1. |
| resolution | 360p / 540p / 720p / 1080p. |
| movement_amplitude | auto / small / medium / large (defines motion intensity). |
| duration | Up to 8 seconds. |
| seed | Optional, for reproducible results. |
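The parameter table above can be sketched as a small validation helper. This is a minimal illustration, not the service's actual client: `build_request` is a hypothetical name, the default values are assumptions, and the 1-second lower bound on `duration` is inferred from the pricing table rather than stated in the parameter list.

```python
from typing import Optional, Sequence

# Allowed values copied from the parameter table.
ASPECT_RATIOS = ("16:9", "9:16", "4:3", "3:4", "1:1")
RESOLUTIONS = ("360p", "540p", "720p", "1080p")
MOVEMENT_AMPLITUDES = ("auto", "small", "medium", "large")
MAX_IMAGES = 7
MAX_DURATION_S = 8


def build_request(
    prompt: str,
    images: Sequence[str],
    aspect_ratio: str = "16:9",         # default is an assumption
    resolution: str = "720p",           # default is an assumption
    movement_amplitude: str = "auto",   # default is an assumption
    duration: int = 5,                  # default is an assumption
    seed: Optional[int] = None,
) -> dict:
    """Hypothetical helper: check inputs against the table, return a dict."""
    if not 1 <= len(images) <= MAX_IMAGES:
        raise ValueError(f"images must contain 1 to {MAX_IMAGES} entries")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if movement_amplitude not in MOVEMENT_AMPLITUDES:
        raise ValueError(
            f"movement_amplitude must be one of {MOVEMENT_AMPLITUDES}"
        )
    # Lower bound of 1 second is assumed from the pricing table.
    if not 1 <= duration <= MAX_DURATION_S:
        raise ValueError(f"duration must be 1 to {MAX_DURATION_S} seconds")

    payload = {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "movement_amplitude": movement_amplitude,
        "duration": duration,
    }
    # seed is optional; include it only when the caller wants reproducibility.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Checking values locally before submitting avoids spending a request on input the table already rules out, such as an eighth image or a 9-second duration.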
💰 Pricing
| Resolution | Duration | Price (USD) |
|---|---|---|
| 540p | 1s | $0.10 |
| 540p | 2s | $0.10 |
| 540p | 3s | $0.10 |
| 540p | 4s | $0.15 |
| 540p | 5s | $0.15 |
| 540p | 6s | $0.15 |
| 540p | 7s | $0.15 |
| 540p | 8s | $0.20 |
| 720p | 1s | $0.15 |
| 720p | 2s | $0.20 |
| 720p | 3s | $0.20 |
| 720p | 4s | $0.25 |
| 720p | 5s | $0.25 |
| 720p | 6s | $0.30 |
| 720p | 7s | $0.30 |
| 720p | 8s | $0.35 |
| 1080p | 1s | $0.30 |
| 1080p | 2s | $0.40 |
| 1080p | 3s | $0.50 |
| 1080p | 4s | $0.60 |
| 1080p | 5s | $0.70 |
| 1080p | 6s | $0.80 |
| 1080p | 7s | $0.90 |
| 1080p | 8s | $1.00 |
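For budgeting, the pricing table above can be encoded as a lookup. The sketch below is illustrative only: `estimate_price_cents` and `format_usd` are hypothetical helper names, prices are stored in US cents to avoid floating-point rounding, and 360p is rejected because the table lists no price for it.

```python
# Prices in US cents, indexed by duration in seconds (1..8),
# copied row by row from the pricing table.
PRICE_CENTS = {
    "540p": [10, 10, 10, 15, 15, 15, 15, 20],
    "720p": [15, 20, 20, 25, 25, 30, 30, 35],
    "1080p": [30, 40, 50, 60, 70, 80, 90, 100],
}


def estimate_price_cents(resolution: str, duration_s: int) -> int:
    """Return the listed price in cents for one clip."""
    if resolution not in PRICE_CENTS:
        raise ValueError(f"no listed price for resolution {resolution!r}")
    prices = PRICE_CENTS[resolution]
    if not 1 <= duration_s <= len(prices):
        raise ValueError(f"duration must be 1 to {len(prices)} seconds")
    return prices[duration_s - 1]


def format_usd(cents: int) -> str:
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # Cost of a batch of ten 8-second 1080p clips.
    total = 10 * estimate_price_cents("1080p", 8)
    print(format_usd(total))  # prints "$10.00"
```

Note that 540p and 720p prices rise in steps rather than per second, so a 7-second 540p clip costs the same as a 4-second one.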
🧠 Tips for Best Results
- Use consistent lighting and angles among reference images for smoother transitions.
- Write prompts that define camera motion, emotion, or scene tone clearly.
- “auto” movement amplitude works best for portrait-style animation; use “medium” or “large” for full-body or action scenes.
- For cinematic looks, pair 16:9 with 1080p and descriptive atmosphere prompts (e.g., “soft sunlight flickering through leaves”).
📎 Note
- If you supply image URLs instead of uploading local files, make sure the URLs are publicly accessible. Successfully loaded images will display as thumbnails in the interface.