Vidu is Shengshu Technology's video generation suite, comprising baseline, Vidu 2.0, and Q-series (Q1 and Q2) models. Built on open-source diffusion technology and trained on large-scale, high-quality datasets with human-aligned tuning, Vidu covers a range of video creation scenarios, from quick drafts to cinematic work.
The collection includes specialized models for text-to-video generation, image-to-video transformation, reference-based video generation, and start-end frame video synthesis. Each model offers precise control and consistent quality, making the collection well suited to professional creative workflows.
Available Models:
- image-to-video-2.0 — Turns a single image into a smooth video while preserving subject structure and scene layout. Strong temporal stability and natural camera motion make it reliable for professional edits.
- image-to-video-q1 — Premium I2V with finer texture detail and portrait fidelity. Excels at keeping identity and lighting consistent while adding cinematic motion.
- image-to-video — Lightweight I2V for quick drafts and social clips. Fast turnaround with solid subject preservation and minimal artifacts.
- text-to-video-2.0 — Generates videos directly from text prompts with robust prompt adherence and coherent multi-object scenes. Good temporal consistency and controllable camera moves.
- text-to-video-q1 — Higher-fidelity T2V with richer color, cleaner edges, and stronger narrative continuity. Suited for cinematic storytelling and brand visuals.
- text-to-video — Baseline T2V that balances speed and quality for straightforward concepts, ads, and explainers.
- reference-to-video-2.0 — Produces videos guided by a reference image for style/ID control. Maintains character likeness and wardrobe across frames with stable motion.
- reference-to-video-q1 — Enhanced reference-guided generation with sharper details and more faithful style transfer. Reduces drift and artifacts in close-ups.
- start-end-to-video-2.0 — Synthesizes motion between given start and end frames while respecting scene geometry. Ideal for transitions, reveals, and layout-aware moves.
- start-end-to-video-q1 — Upgraded start-end synthesis with stronger narrative continuity and smoother easing between poses and camera positions.
- start-end-to-video-q2-pro — Pro-tier start-end model focused on temporal coherence and complex motion. Delivers accurate alignment to start/end constraints with stable intermediate frames.
- start-end-to-video-q2-turbo — Latency-optimized variant for rapid iteration. Keeps core coherence and subject integrity while prioritizing speed.
- start-end-to-video — Compact baseline for simple start-end interpolation and quick motion previews. Suitable for lightweight transitions and story beats.
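
The naming pattern in the list above (a task name, optionally followed by a variant suffix) can be captured in a small lookup. The following is a minimal sketch, not an official client: the `VARIANTS` table is transcribed from the list, while the `model_id` helper and the `"base"` label for the unsuffixed models are hypothetical names introduced for illustration. It does not call any Vidu API; it only builds and validates identifier strings.

```python
# Variants per task, transcribed from the "Available Models" list.
# "base" is a hypothetical label for the model with no version suffix.
VARIANTS = {
    "image-to-video": {"base", "2.0", "q1"},
    "text-to-video": {"base", "2.0", "q1"},
    "reference-to-video": {"2.0", "q1"},  # no unsuffixed model is listed
    "start-end-to-video": {"base", "2.0", "q1", "q2-pro", "q2-turbo"},
}


def model_id(task: str, variant: str = "base") -> str:
    """Return the listed model identifier for a task/variant pair.

    Raises ValueError when the combination does not appear in the list.
    """
    if task not in VARIANTS:
        raise ValueError(f"unknown task: {task!r}")
    if variant not in VARIANTS[task]:
        raise ValueError(f"{task!r} has no {variant!r} variant")
    # Unsuffixed models use the bare task name as their identifier.
    return task if variant == "base" else f"{task}-{variant}"


if __name__ == "__main__":
    print(model_id("image-to-video", "q1"))
    print(model_id("start-end-to-video", "q2-turbo"))
    print(model_id("text-to-video"))
```

Validating against the table catches combinations that are not in the list, such as an unsuffixed reference-to-video model or a q2 variant of image-to-video, before an identifier is passed to whatever service hosts the models.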