
Scaling AI Video Generation: How Novita AI Optimizes Both Efficiency and Cost with WaveSpeedAI
WaveSpeedAI has significantly improved our inference efficiency and helped us cut video generation costs by up to 67%. With faster and more reliable video processing, we're able to deliver an exceptional user experience at scale.
— Junyu Huang, Novita AI COO
Customer Background
Novita AI is an AI inference infrastructure company dedicated to providing creators, developers, and enterprises with reliable, efficient video generation inference services. The company deploys multiple mainstream video generation models, covering both image-to-video and text-to-video generation at resolutions from 720P to 1080P, and serves creative users and AI platforms worldwide.
Challenges Before WaveSpeedAI
As the number of supported models grew and its services became more complex, Novita AI faced several challenges in inference architecture and operations:
Complex Resource Scheduling Due to Multi-Model Deployment
Supporting multiple models such as Wan 2.1, Kling V1.6, and Hunyuan Video, each with different memory and compute requirements, led to widely varying inference efficiency and complicated resource scheduling.
High Costs for HD Inference with Underutilized GPUs
For 720P and 1080P video generation tasks in particular, each inference cycle consumed large amounts of GPU memory, driving up the per-video generation cost while leaving GPUs underutilized.
Unstable Latency Under High Concurrency
During peak user traffic, some large models experienced significant response delays, hurting the end-user experience and the platform's reputation.
Collaboration with WaveSpeedAI
To address these challenges, Novita AI partnered closely with WaveSpeedAI, focusing on optimized deployment of the following core models: