
Vidu Video Models

| Model | Provider | Price |
|---|---|---|
| vidu/reference-to-video-q1 | vidu | $0.4 |
| vidu/reference-to-video-2.0 | vidu | $0.2 |
| vidu/start-end-to-video-2.0 | vidu | $0.3 |
| vidu/image-to-video | vidu | $0.2 |
| vidu/start-end-to-video | vidu | $0.2 |
| vidu/start-end-to-video-q1 | vidu | $0.4 |
| vidu/image-to-video-2.0 | vidu | $0.3 |
| vidu/image-to-video-q1 | vidu | $0.4 |
| vidu/text-to-video-q1 | vidu | $0.4 |
| vidu/text-to-video-2.0 | vidu | $0.3 |
| vidu/text-to-video | vidu | $0.2 |
| vidu/start-end-to-video-q2-pro | vidu | $0.15 |
| vidu/start-end-to-video-q2-turbo | vidu | $0.1 |
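For quick cost estimates in a script, the prices above can be kept in a simple lookup table. A minimal sketch, assuming per-generation billing (the billing unit is not specified on this page, and the `estimate_cost` helper is an illustration, not part of any Vidu SDK):

```python
# Prices ($) copied from the table above. The billing unit is not
# specified here, so treat these as per-generation placeholders.
VIDU_PRICES = {
    "vidu/reference-to-video-q1": 0.4,
    "vidu/reference-to-video-2.0": 0.2,
    "vidu/start-end-to-video-2.0": 0.3,
    "vidu/image-to-video": 0.2,
    "vidu/start-end-to-video": 0.2,
    "vidu/start-end-to-video-q1": 0.4,
    "vidu/image-to-video-2.0": 0.3,
    "vidu/image-to-video-q1": 0.4,
    "vidu/text-to-video-q1": 0.4,
    "vidu/text-to-video-2.0": 0.3,
    "vidu/text-to-video": 0.2,
    "vidu/start-end-to-video-q2-pro": 0.15,
    "vidu/start-end-to-video-q2-turbo": 0.1,
}

def estimate_cost(model: str, generations: int = 1) -> float:
    """Estimated cost for `generations` runs of `model` (hypothetical helper)."""
    if model not in VIDU_PRICES:
        raise KeyError(f"Unknown Vidu model: {model}")
    return round(VIDU_PRICES[model] * generations, 2)
```

For example, five runs of the baseline text-to-video model would come to $1.00 under this assumption.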

Vidu is Shengshu Technology's comprehensive video generation suite, featuring both Vidu 2.0 and Q-series models. Built on open-source diffusion technology and trained on large-scale, high-quality datasets with human-aligned tuning, Vidu excels in multiple video creation scenarios.

The collection includes specialized models for image-to-video transformation, reference-guided video generation, and start-to-end-frame synthesis. Each model offers precise control and consistent quality, making the suite well suited to professional creative workflows.

Available Models:

  • image-to-video-2.0 — Turns a single image into a smooth video while preserving subject structure and scene layout. Strong temporal stability and natural camera motion make it reliable for professional edits.
  • image-to-video-q1 — Premium I2V with finer texture detail and portrait fidelity. Excels at keeping identity and lighting consistent while adding cinematic motion.
  • image-to-video — Lightweight I2V for quick drafts and social clips. Fast turnaround with solid subject preservation and minimal artifacts.
  • text-to-video-2.0 — Generates videos directly from text prompts with robust prompt adherence and coherent multi-object scenes. Good temporal consistency and controllable camera moves.
  • text-to-video-q1 — Higher-fidelity T2V with richer color, cleaner edges, and stronger narrative continuity. Suited for cinematic storytelling and brand visuals.
  • text-to-video — Baseline T2V that balances speed and quality for straightforward concepts, ads, and explainers.
  • reference-to-video-2.0 — Produces videos guided by a reference image for style/ID control. Maintains character likeness and wardrobe across frames with stable motion.
  • reference-to-video-q1 — Enhanced reference-guided generation with sharper details and more faithful style transfer. Reduces drift and artifacts in close-ups.
  • start-end-to-video-2.0 — Synthesizes motion between given start and end frames while respecting scene geometry. Ideal for transitions, reveals, and layout-aware moves.
  • start-end-to-video-q1 — Upgraded start-end synthesis with stronger narrative continuity and smoother easing between poses and camera positions.
  • start-end-to-video-q2-pro — Pro-tier start-end model focused on temporal coherence and complex motion. Delivers accurate alignment to start/end constraints with stable intermediate frames.
  • start-end-to-video-q2-turbo — Latency-optimized variant for rapid iteration. Keeps core coherence and subject integrity while prioritizing speed.
  • start-end-to-video — Compact baseline for simple start-end interpolation and quick motion previews. Suitable for lightweight transitions and story beats.
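To show how one of these models might be invoked over a generic JSON API, here is a minimal sketch of building a request body. The field names (`model`, `prompt`, `duration`) are assumptions for illustration only; this page does not document the actual Vidu request schema:

```python
import json

def build_text_to_video_request(prompt: str,
                                model: str = "vidu/text-to-video-2.0",
                                duration_s: int = 4) -> str:
    """Serialize a hypothetical text-to-video request body to JSON.

    All field names here are illustrative assumptions, not a
    documented Vidu API schema.
    """
    payload = {
        "model": model,        # model id from the table above
        "prompt": prompt,      # text description of the clip
        "duration": duration_s # assumed parameter, in seconds
    }
    return json.dumps(payload)
```

The same pattern would apply to the image-to-video and start-end-to-video variants, with image references added to the payload in whatever form the real API expects.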