Now supports 10-second video generation!
Vidu is Shengshu Technology’s advanced video generation suite, combining the Vidu 2.0 and Q-series models. Built on open-source diffusion backbones and trained on large-scale, high-quality datasets, Vidu delivers strong performance across a wide range of video creation tasks. Its models offer precise control, consistent visual quality, and robust temporal stability, making Vidu suitable for professional, production-grade workflows.
Image-to-Video Models
• vidu/image-to-video-q2-turbo
A high-speed image-to-video model for complex scenes and multi-character shots. Delivers smooth, coherent motion and solid structure preservation while enabling near real-time preview and refinement.
• vidu/image-to-video-q2-pro
A professional-grade image-to-video model offering sharper detail, more stable character identity, and refined cinematic motion. Suited for polished production assets, hero shots, and client-facing deliverables.
• image-to-video-2.0
Transforms a single image into a smooth, coherent video while preserving structure, composition, and layout. Provides strong temporal stability and natural camera motion for professional post-production and editing pipelines.
• image-to-video-q1
A premium image-to-video model with enhanced texture detail and superior portrait handling. Maintains lighting and identity consistency while generating cinematic motion and expressive character performance.
• image-to-video
A lightweight, fast image-to-video (I2V) model for rapid drafts, ideation, and social media content. Balances speed and structural preservation, producing clean clips with minimal artifacts.
Text-to-Video Models
• vidu/text-to-video-q2
A flagship text-to-video model with stronger temporal coherence, richer scene detail, and more precise camera and motion control. Designed for complex, multi-character narratives and high-end commercial storytelling.
• text-to-video-2.0
Generates videos directly from text prompts with reliable prompt adherence, coherent multi-object scenes, and controllable camera motion. Well suited for high-quality conceptual and narrative video generation.
• text-to-video-q1
A high-fidelity text-to-video (T2V) model offering richer color, sharper detail, and stronger narrative continuity. Ideal for cinematic storytelling, branding, and visually polished marketing assets.
• text-to-video
A baseline T2V option optimized for efficiency and turnaround speed. Designed for ads, explainers, and straightforward text-driven concepts where fast iteration is key.
Reference-to-Video Models
• reference-to-video-2.0
Creates videos guided by a reference image, ensuring accurate character likeness, stable style control, and consistent wardrobe and appearance across frames.
• reference-to-video-q1
An upgraded reference-based generator with sharper details and more faithful style and identity transfer. Reduces drift and artifacts, especially in close-ups and longer shots.
• reference-to-video-q2
Supports multiple distinct objects or characters interacting within a single video, enabling complex, reference-guided scene compositions.
Start-End Frame Video Models
• start-end-to-video-2.0
Synthesizes smooth motion between user-defined start and end frames while respecting overall scene geometry and layout. Ideal for transitions, reveals, and structured motion design.
• start-end-to-video-q1
Enhances narrative continuity and motion smoothness, producing more natural easing between poses, camera positions, and scene states.
• start-end-to-video-q2-pro
A professional-grade model focused on reinforced temporal coherence and precise motion control. Generates stable intermediate frames while closely aligning with user-specified start and end constraints.
• start-end-to-video-q2-turbo
A high-speed variant optimized for rapid iteration and preview. Preserves core coherence and subject integrity while significantly reducing generation latency.
• start-end-to-video
A compact baseline model for simple start–end interpolation and quick previews. Suitable for basic transitions, animatics, and fast storyboard development.
Image Models
• vidu/text-to-image-q2
A high-resolution cinematic text-to-image model for generating polished hero shots, thumbnails, and key visuals directly from prompts.
• vidu/reference-to-image-q2
A reference-guided image generator that uses up to seven input images plus a prompt to create new, high-resolution shots that preserve subject identity and composition.
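For readers wiring these models into a pipeline, the identifiers listed above can be collected into a small lookup table. The following is a minimal sketch, not an official Vidu SDK: the model names are copied as they appear in this catalog, while the grouping keys, the `models_for` and `check_reference_images` helpers, and the lower bound of one reference image are assumptions made for illustration. Only the upper bound of seven reference images comes from the description of vidu/reference-to-image-q2.

```python
from typing import Dict, List

# Model identifiers are copied verbatim from the catalog above.
# The grouping keys and helper names are hypothetical, chosen for this sketch.
CATALOG: Dict[str, List[str]] = {
    "image-to-video": [
        "vidu/image-to-video-q2-turbo",
        "vidu/image-to-video-q2-pro",
        "image-to-video-2.0",
        "image-to-video-q1",
        "image-to-video",
    ],
    "text-to-video": [
        "vidu/text-to-video-q2",
        "text-to-video-2.0",
        "text-to-video-q1",
        "text-to-video",
    ],
    "reference-to-video": [
        "reference-to-video-2.0",
        "reference-to-video-q1",
        "reference-to-video-q2",
    ],
    "start-end-to-video": [
        "start-end-to-video-2.0",
        "start-end-to-video-q1",
        "start-end-to-video-q2-pro",
        "start-end-to-video-q2-turbo",
        "start-end-to-video",
    ],
    "image": [
        "vidu/text-to-image-q2",
        "vidu/reference-to-image-q2",
    ],
}

# Stated in the catalog: reference-to-image-q2 uses up to seven input images.
MAX_REFERENCE_IMAGES = 7


def models_for(task: str) -> List[str]:
    """Return a copy of the model identifiers listed for a task family."""
    if task not in CATALOG:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(CATALOG)}")
    return list(CATALOG[task])


def check_reference_images(image_paths: List[str]) -> List[str]:
    """Reject reference sets outside 1..7 images (the minimum of 1 is an assumption)."""
    count = len(image_paths)
    if count < 1 or count > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {count}"
        )
    return image_paths


if __name__ == "__main__":
    print(models_for("start-end-to-video"))
    check_reference_images(["hero.png", "prop.png"])
```

Keeping the identifiers in one table makes it easy to swap a turbo variant for a pro variant between preview and final render without touching the rest of the pipeline.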