vidu/reference-to-video-q2

Vidu Q2 is a new image-to-video (and reference-to-video) model that emphasizes subtle facial expressions and smooth push–pull camera moves.

Your request will cost $0.25 per run.

For $10 you can run this model approximately 40 times.

README

Vidu Q2 — Reference-to-Video Model

Vidu Q2 is Shengshu Technology’s new-generation reference-to-video model designed to transform one or multiple input images into expressive, cinematic videos. It excels at producing subtle facial motion, natural body dynamics, and camera-aware storytelling with a strong sense of realism.

🎬 What It Does

Vidu Q2 synthesizes short videos from one or several reference images guided by a text prompt. It’s ideal for turning still portraits or concept images into smooth motion clips — suitable for both creative storytelling and professional visual production.

✨ Key Features

  • Smooth motion realism: Subtle micro-expressions, eye movements, and breathing motions are reproduced authentically.
  • Cinematic camera dynamics: Built-in control of push/pull, pan, tilt, and zoom effects for scene depth and emotional tone.
  • Multiple-image reference support: Upload up to 7 reference images to guide pose, lighting, or perspective transitions.
  • Flexible composition: Choose from aspect ratios (16:9, 9:16, 4:3, 3:4, 1:1) for any platform.
  • Motion amplitude control: Select auto / small / medium / large to define the strength and style of movement.
  • High fidelity output: Consistent lighting, identity preservation, and accurate reference adherence even across complex motions.

🧩 Designed For

  • Filmmakers & Storytellers: Bring still characters or concept art to life with controlled, cinematic motion.
  • Advertising Creators: Generate short motion ads with precise control over composition and intensity.
  • Artists & Illustrators: Animate hand-drawn or AI-generated portraits into dynamic living forms.
  • Game & Animation Studios: Prototype visual narratives quickly using character or environment references.

⚙️ Parameters

Parameter           Description
prompt              Describe the scene, action, or mood.
images              Upload up to 7 reference images.
aspect_ratio        Choose between 16:9, 9:16, 4:3, 3:4, 1:1.
resolution          360p / 540p / 720p / 1080p.
movement_amplitude  auto / small / medium / large (defines motion intensity).
duration            Up to 8 seconds.
seed                Optional, for reproducible results.
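
As a rough illustration of how these parameters fit together, the sketch below assembles a request payload and checks it against the limits listed above. The function name `build_request`, the default values, the requirement that `prompt` be non-empty, and the assumption that at least one image is needed are all hypothetical; only the parameter names and allowed values come from the table. It does not call any real API.

```python
from typing import Optional

# Allowed values and limits, taken from the parameter table.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}
RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
AMPLITUDES = {"auto", "small", "medium", "large"}
MAX_IMAGES = 7
MAX_DURATION_S = 8


def build_request(
    prompt: str,
    images: list[str],
    aspect_ratio: str = "16:9",        # default is an assumption
    resolution: str = "720p",          # default is an assumption
    movement_amplitude: str = "auto",  # default is an assumption
    duration: int = 5,                 # default is an assumption
    seed: Optional[int] = None,
) -> dict:
    """Validate inputs against the documented limits and return a payload dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= len(images) <= MAX_IMAGES:
        raise ValueError(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if movement_amplitude not in AMPLITUDES:
        raise ValueError(f"unsupported movement_amplitude: {movement_amplitude}")
    if not 0 < duration <= MAX_DURATION_S:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_S} seconds")

    payload = {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "movement_amplitude": movement_amplitude,
        "duration": duration,
    }
    if seed is not None:
        payload["seed"] = seed  # only sent when reproducibility is wanted
    return payload
```

Validating locally before submitting a job avoids paying for a run that would be rejected, though the service's own validation remains the authority on what is accepted.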

💰 Pricing

Resolution  Price per second
360p        $0.003 / s
540p        $0.006 / s
720p        $0.013 / s
1080p       $0.030 / s
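
To make the per-second rates concrete, here is a small estimator. It assumes cost scales linearly with duration, with no rounding, minimum charge, or per-run surcharge; the page does not confirm the exact billing rule, so treat the result as an approximation. The name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the pricing table.
PRICE_PER_SECOND = {
    "360p": Decimal("0.003"),
    "540p": Decimal("0.006"),
    "720p": Decimal("0.013"),
    "1080p": Decimal("0.030"),
}


def estimate_cost(resolution: str, duration_s: int) -> Decimal:
    """Return rate * duration, assuming purely linear per-second billing."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return PRICE_PER_SECOND[resolution] * duration_s


print(estimate_cost("720p", 8))  # 0.104
```

`Decimal` is used instead of `float` so that small currency amounts multiply exactly.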

🧠 Tips for Best Results

  • Use consistent lighting and angles among reference images for smoother transitions.
  • Write prompts that define camera motion, emotion, or scene tone clearly.
  • “auto” movement amplitude works best for portrait-style animation; use “medium” or “large” for full-body or action scenes.
  • For cinematic looks, pair 16:9 with 1080p and descriptive atmosphere prompts (e.g., “soft sunlight flickering through leaves”).

📎 Note

  • If you supply image URLs instead of uploading local files, make sure the URLs are publicly accessible. Successfully loaded images will display as thumbnails in the interface.
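
A cheap local pre-check can catch image URLs that obviously cannot be fetched from outside your machine, such as `file://` paths, `localhost`, or private-network addresses. The helper below is a hypothetical sketch, not part of the service. It performs no network request, so a `True` result only means the URL is not obviously private; it does not prove the image is reachable.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_addressable(url: str) -> bool:
    """Heuristic: reject non-HTTP(S) URLs and hosts that are clearly private."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".local"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Host is a DNS name; it cannot be judged without a lookup, so allow it.
        return True
    return ip.is_global  # False for loopback, private, and link-local ranges


urls = ["https://example.com/a.png", "http://192.168.1.10/a.png", "file:///tmp/a.png"]
rejected = [u for u in urls if not looks_publicly_addressable(u)]
```

Here `rejected` would hold the private-network and `file://` entries, which are the ones worth re-hosting before submitting a job.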