SkyReels V1: Human-Centric Video Foundation Model
Overview
SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning <a href="https://huggingface.co/tencent/HunyuanVideo">HunyuanVideo</a> on O(10M) high-quality film and television clips, SkyReels V1 offers three key advantages (a minimal inference sketch follows the list below):
- Open-Source Leadership: Our Text-to-Video model achieves state-of-the-art (SOTA) performance among open-source models, comparable to proprietary models like Kling and Hailuo.
- Advanced Facial Animation: Captures 33 distinct facial expressions with over 400 natural movement combinations, accurately reflecting human emotions.
- Cinematic Lighting and Aesthetics: Because the model is trained on high-quality, Hollywood-level film and television data, each generated frame exhibits cinematic quality in composition, actor positioning, and camera angles.
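Since the Text-to-Video checkpoint is a fine-tune of HunyuanVideo, it should in principle load with any HunyuanVideo-compatible inference stack. Below is a minimal sketch using the Hugging Face `diffusers` `HunyuanVideoPipeline`; the repository ID `Skywork/SkyReels-V1-Hunyuan-T2V`, the resolution, frame count, and fps are assumptions for illustration, not official settings.

```python
# Minimal text-to-video sketch, assuming the SkyReels V1 T2V weights are
# published in a diffusers-compatible layout (repo ID below is an assumption).
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "Skywork/SkyReels-V1-Hunyuan-T2V"  # assumed repository ID

# Load the fine-tuned transformer in reduced precision to fit on a single GPU.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()          # reduce VRAM spikes when decoding frames
pipe.enable_model_cpu_offload()   # optional: trade speed for lower memory

video = pipe(
    prompt="A man in a tailored suit walks through rain-soaked neon streets, cinematic lighting",
    height=544,            # assumed resolution
    width=960,
    num_frames=97,         # HunyuanVideo expects 4k+1 frames
    num_inference_steps=30,
).frames[0]

export_to_video(video, "skyreels_t2v.mp4", fps=24)  # assumed playback rate
```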
Key Features
1. Self-Developed Data Cleaning and Annotation Pipeline
Our model is trained on a vast dataset of high-quality film, television, and documentary content, produced by a self-developed data cleaning and annotation pipeline. The pipeline annotates each clip along the following dimensions (a sketch of one possible annotation record follows the list):
- Expression Classification: Categorizes human facial expressions into 33 distinct types.
- Character Spatial Awareness: Utilizes 3D human reconstruction technology to understand spatial relationships between multiple people in a video, enabling film-level character positioning.
- Action Recognition: Constructs over 400 action semantic units to achieve a precise understanding of human actions.
- Scene Understanding: Conducts cross-modal correlation analysis of clothing, scenes, and plots.
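The annotation pipeline itself is not part of this release, but the four dimensions above suggest a per-clip record roughly like the one sketched here. This is a purely hypothetical schema: the class names, the `ExpressionType` labels, and the example action-unit strings are illustrative and not taken from the actual pipeline.

```python
# Hypothetical sketch of a per-clip annotation record covering the four
# dimensions above; all names and fields are illustrative, not the real schema.
from dataclasses import dataclass, field
from enum import Enum


class ExpressionType(Enum):
    """Stand-in for the 33 expression categories (only a few shown)."""
    NEUTRAL = 0
    JOY = 1
    SURPRISE = 2
    CONTEMPT = 3
    # ... remaining categories omitted


@dataclass
class CharacterAnnotation:
    track_id: int                             # identity of the person across frames
    expression: ExpressionType                # one of the 33 expression classes
    position_3d: tuple[float, float, float]   # from 3D human reconstruction
    action_units: list[str] = field(default_factory=list)  # e.g. "walk_toward_camera"


@dataclass
class ClipAnnotation:
    clip_id: str
    caption: str                              # cross-modal description of clothing, scene, plot
    scene_tags: list[str] = field(default_factory=list)
    characters: list[CharacterAnnotation] = field(default_factory=list)
```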