SkyReels V1: Human-Centric Video Foundation Model
Overview
SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning <a href="https://huggingface.co/tencent/HunyuanVideo">HunyuanVideo</a> on O(10M) high-quality film and television clips, SkyReels V1 offers three key advantages (a minimal inference sketch follows the list below):
- Open-Source Leadership: Our Text-to-Video model achieves state-of-the-art (SOTA) performance among open-source models, comparable to proprietary models like Kling and Hailuo.
- Advanced Facial Animation: Captures 33 distinct facial expressions with over 400 natural movement combinations, accurately reflecting human emotions.
- Cinematic Lighting and Aesthetics: Because the model is trained on high-quality, Hollywood-level film and television data, each generated frame exhibits cinematic quality in composition, actor positioning, and camera angles.
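Since the Text-to-Video checkpoint is a fine-tune of HunyuanVideo, it should in principle load with any HunyuanVideo-compatible inference stack. Below is a minimal sketch using the Hugging Face `diffusers` `HunyuanVideoPipeline`; the repository ID `Skywork/SkyReels-V1-Hunyuan-T2V`, the resolution, frame count, and fps are assumptions for illustration, not official settings.

```python
# Minimal text-to-video sketch, assuming the SkyReels V1 T2V weights are
# published in a diffusers-compatible layout (repo ID below is an assumption).
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "Skywork/SkyReels-V1-Hunyuan-T2V"  # assumed repository ID

# Load the fine-tuned transformer in reduced precision to fit on a single GPU.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()          # reduce VRAM spikes when decoding frames
pipe.enable_model_cpu_offload()   # optional: trade speed for lower memory

video = pipe(
    prompt="A man in a tailored suit walks through rain-soaked neon streets, cinematic lighting",
    height=544,            # assumed resolution
    width=960,
    num_frames=97,         # HunyuanVideo expects 4k+1 frames
    num_inference_steps=30,
).frames[0]

export_to_video(video, "skyreels_t2v.mp4", fps=24)  # assumed playback rate
```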
Key Features
1. Self-Developed Data Cleaning and Annotation Pipeline
Our model is trained on a vast dataset of high-quality film, television, and documentary content, produced by a self-developed data cleaning and annotation pipeline. The pipeline annotates each clip along the following dimensions (a sketch of one possible annotation record follows the list):
- Expression Classification: Categorizes human facial expressions into 33 distinct types.
- Character Spatial Awareness: Utilizes 3D human reconstruction technology to understand spatial relationships between multiple people in a video, enabling film-level character positioning.
- Action Recognition: Constructs over 400 action semantic units to achieve a precise understanding of human actions.
- Scene Understanding: Conducts cross-modal correlation analysis of clothing, scenes, and plots.
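The annotation pipeline itself is not part of this release, but the four dimensions above suggest a per-clip record roughly like the one sketched here. This is a purely hypothetical schema: the class names, the `ExpressionType` labels, and the example action-unit strings are illustrative and not taken from the actual pipeline.

```python
# Hypothetical sketch of a per-clip annotation record covering the four
# dimensions above; all names and fields are illustrative, not the real schema.
from dataclasses import dataclass, field
from enum import Enum


class ExpressionType(Enum):
    """Stand-in for the 33 expression categories (only a few shown)."""
    NEUTRAL = 0
    JOY = 1
    SURPRISE = 2
    CONTEMPT = 3
    # ... remaining categories omitted


@dataclass
class CharacterAnnotation:
    track_id: int                             # identity of the person across frames
    expression: ExpressionType                # one of the 33 expression classes
    position_3d: tuple[float, float, float]   # from 3D human reconstruction
    action_units: list[str] = field(default_factory=list)  # e.g. "walk_toward_camera"


@dataclass
class ClipAnnotation:
    clip_id: str
    caption: str                              # cross-modal description of clothing, scene, plot
    scene_tags: list[str] = field(default_factory=list)
    characters: list[CharacterAnnotation] = field(default_factory=list)
```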