Vidu Models

• vidu/text-to-video-q2 – $0.8
• vidu/start-end-to-video-q2-turbo – $0.1
• vidu/image-to-video-q2-turbo – $0.1
• vidu/image-to-video-q2-pro – $0.15
• vidu/template/halloween – $0.05
• vidu/reference-to-video-q2 – $0.25
• vidu/start-end-to-video-q1 – $0.4
• vidu/image-to-video-2.0 – $0.3
• vidu/text-to-video-q1 – $0.4
• vidu/text-to-video-2.0 – $0.3
• vidu/text-to-video – $0.2
• vidu/start-end-to-video-q2-pro – $0.15
• vidu/image-to-video – $0.2
• vidu/image-to-video-q1 – $0.4
• vidu/start-end-to-video – $0.2
• vidu/reference-to-video-q1 – $0.4
• vidu/start-end-to-video-2.0 – $0.3
• vidu/reference-to-video-2.0 – $0.2
• vidu/text-to-image-q2 – $0.03
• vidu/reference-to-image-q2 – $0.04
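
For comparing options, the listed prices above can be collected into a small lookup table. The following is a minimal Python sketch, not part of any official SDK: the names `LISTED_PRICE_USD` and `cheapest_for_task` are hypothetical, and the listing does not state the billing unit (per run, per clip, or per second), so the figures are treated only as "listed price".

```python
from decimal import Decimal

# Listed prices copied from the catalog above, in USD.
# Assumption: the billing unit is not stated in the listing, so these
# values are only compared against each other, never multiplied out.
LISTED_PRICE_USD = {
    "vidu/text-to-video-q2": Decimal("0.8"),
    "vidu/start-end-to-video-q2-turbo": Decimal("0.1"),
    "vidu/image-to-video-q2-turbo": Decimal("0.1"),
    "vidu/image-to-video-q2-pro": Decimal("0.15"),
    "vidu/template/halloween": Decimal("0.05"),
    "vidu/reference-to-video-q2": Decimal("0.25"),
    "vidu/start-end-to-video-q1": Decimal("0.4"),
    "vidu/image-to-video-2.0": Decimal("0.3"),
    "vidu/text-to-video-q1": Decimal("0.4"),
    "vidu/text-to-video-2.0": Decimal("0.3"),
    "vidu/text-to-video": Decimal("0.2"),
    "vidu/start-end-to-video-q2-pro": Decimal("0.15"),
    "vidu/image-to-video": Decimal("0.2"),
    "vidu/image-to-video-q1": Decimal("0.4"),
    "vidu/start-end-to-video": Decimal("0.2"),
    "vidu/reference-to-video-q1": Decimal("0.4"),
    "vidu/start-end-to-video-2.0": Decimal("0.3"),
    "vidu/reference-to-video-2.0": Decimal("0.2"),
    "vidu/text-to-image-q2": Decimal("0.03"),
    "vidu/reference-to-image-q2": Decimal("0.04"),
}


def cheapest_for_task(task: str) -> tuple[str, Decimal]:
    """Return the lowest-priced listed model for a task such as 'text-to-video'.

    A model matches when its ID is exactly 'vidu/<task>' or starts with
    'vidu/<task>-' (for example 'vidu/text-to-video-q1').
    """
    prefix = f"vidu/{task}"
    candidates = {
        model: price
        for model, price in LISTED_PRICE_USD.items()
        if model == prefix or model.startswith(prefix + "-")
    }
    if not candidates:
        raise ValueError(f"no listed model for task {task!r}")
    # Break price ties alphabetically so the result is deterministic.
    model = min(candidates, key=lambda m: (candidates[m], m))
    return model, candidates[model]


if __name__ == "__main__":
    for task in ("text-to-video", "image-to-video", "reference-to-video"):
        model, price = cheapest_for_task(task)
        print(f"{task}: {model} at ${price}")
```

Using `Decimal` rather than `float` keeps the prices exactly as listed, which matters if they are later summed or compared for equality.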

Now supports 10s generation!

Vidu is Shengshu Technology’s advanced video generation suite, combining the Vidu 2.0 and Q-series models. Built on open-source diffusion backbones and trained on large-scale, high-quality datasets, Vidu delivers strong performance across a wide range of video creation tasks. Its models offer precise control, consistent visual quality, and robust temporal stability, making Vidu suitable for professional, production-grade workflows.

Image-to-Video Models

• vidu/image-to-video-q2-turbo

A high-speed image-to-video model for complex scenes and multi-character shots. Delivers smooth, coherent motion and solid structure preservation while enabling near real-time preview and refinement.

• vidu/image-to-video-q2-pro

A professional-grade image-to-video model offering sharper detail, more stable character identity, and refined cinematic motion. Suited for polished production assets, hero shots, and client-facing deliverables.

• vidu/image-to-video-2.0

Transforms a single image into a smooth, coherent video while preserving structure, composition, and layout. Provides strong temporal stability and natural camera motion for professional post-production and editing pipelines.

• vidu/image-to-video-q1

A premium image-to-video model with enhanced texture detail and superior portrait handling. Maintains lighting and identity consistency while generating cinematic motion and expressive character performance.

• vidu/image-to-video

A lightweight, fast I2V model for rapid drafts, ideation, and social media content. Balances speed and structural preservation, producing clean clips with minimal artifacts.

Text-to-Video Models

• vidu/text-to-video-q2

A flagship text-to-video model with stronger temporal coherence, richer scene detail, and more precise camera and motion control. Designed for complex, multi-character narratives and high-end commercial storytelling.

• vidu/text-to-video-2.0

Generates videos directly from text prompts with reliable prompt adherence, coherent multi-object scenes, and controllable camera motion. Well suited for high-quality conceptual and narrative video generation.

• vidu/text-to-video-q1

A high-fidelity T2V model offering richer color, sharper detail, and stronger narrative continuity. Ideal for cinematic storytelling, branding, and visually polished marketing assets.

• vidu/text-to-video

A baseline T2V option optimized for efficiency and turnaround speed. Designed for ads, explainers, and straightforward text-driven concepts where fast iteration is key.

Reference-to-Video Models

• vidu/reference-to-video-2.0

Creates videos guided by a reference image, ensuring accurate character likeness, stable style control, and consistent wardrobe and appearance across frames.

• vidu/reference-to-video-q1

An upgraded reference-based generator with sharper details and more faithful style and identity transfer. Reduces drift and artifacts, especially in close-ups and longer shots.

• vidu/reference-to-video-q2

Supports multiple distinct objects or characters interacting within a single video, enabling complex, reference-guided scene compositions.

Start-End Frame Video Models

• vidu/start-end-to-video-2.0

Synthesizes smooth motion between user-defined start and end frames while respecting overall scene geometry and layout. Ideal for transitions, reveals, and structured motion design.

• vidu/start-end-to-video-q1

An upgraded start-end model that improves narrative continuity and motion smoothness, producing more natural easing between poses, camera positions, and scene states.

• vidu/start-end-to-video-q2-pro

A professional-grade model focused on reinforced temporal coherence and precise motion control. Generates stable intermediate frames while closely aligning with user-specified start and end constraints.

• vidu/start-end-to-video-q2-turbo

A high-speed variant optimized for rapid iteration and preview. Preserves core coherence and subject integrity while significantly reducing generation latency.

• vidu/start-end-to-video

A compact baseline model for simple start–end interpolation and quick previews. Suitable for basic transitions, animatics, and fast storyboard development.

Image Models

• vidu/text-to-image-q2 – High-resolution cinematic text-to-image model for generating polished hero shots, thumbnails, and key visuals directly from prompts.

• vidu/reference-to-image-q2 – Reference-guided image generator that uses up to seven input images plus a prompt to create new, high-res shots that preserve subject identity and composition.
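
The seven-image limit described for vidu/reference-to-image-q2 is easy to enforce on the client before submitting a job. The sketch below is illustrative only: the class `ReferenceImageRequest`, the payload keys `model`, `prompt`, and `images`, and the requirement of at least one reference image are all assumptions, not the documented request schema.

```python
from dataclasses import dataclass, field

# "up to seven input images" comes from the model description above.
MAX_REFERENCE_IMAGES = 7


@dataclass
class ReferenceImageRequest:
    """Hypothetical client-side request for vidu/reference-to-image-q2."""

    prompt: str
    # Image URLs or paths; the field name is an assumption.
    images: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the prompt is empty or the image count is out of range."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Assumption: at least one reference image is required.
        if not 1 <= len(self.images) <= MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, "
                f"got {len(self.images)}"
            )

    def to_payload(self) -> dict:
        """Validate, then build a plain dict (hypothetical key names)."""
        self.validate()
        return {
            "model": "vidu/reference-to-image-q2",
            "prompt": self.prompt,
            "images": list(self.images),
        }


if __name__ == "__main__":
    request = ReferenceImageRequest(
        prompt="the same character standing on a rainy street at night",
        images=["front.png", "profile.png"],
    )
    print(request.to_payload())
```

Validating before building the payload means an oversized reference set fails locally instead of after a request has been sent.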