Scaling AI Video Generation: How Novita AI Achieves Dual Optimization of Efficiency and Cost with WaveSpeedAI

Sun Jul 06 2025

Novita AI

“WaveSpeedAI has significantly improved our inference efficiency and helped us cut video generation costs by up to 67%. With faster and more reliable video processing, we’re able to deliver an exceptional user experience at scale.”
— Junyu Huang, Novita AI COO

Customer Background

Novita AI is an AI inference infrastructure company that provides creators, developers, and enterprises with reliable, efficient video generation inference services. It supports the deployment of multiple mainstream video generation models, covering both image-to-video and text-to-video generation end to end, and serves global creative users and AI platforms at resolutions from 720P to 1080P.

Challenges Before WaveSpeedAI

As the number of models and service complexity increased, Novita AI faced several challenges in its inference architecture and operations:

Complex resource scheduling due to multi-model deployment: Supporting multiple models such as Wan 2.1, Kling V1.6, and Hunyuan Video, each with different memory and computational requirements, resulted in significant differences in inference efficiency.
High costs for HD inference with underutilized GPUs: Especially for 720P and 1080P video generation tasks, individual inference cycles consumed large amounts of GPU memory, leading to high per-unit generation costs.
Unstable latency under high concurrency: Some large models experienced significant response delays during peak user traffic, negatively affecting end-user experience and platform reputation.

Collaboration with WaveSpeedAI

To address these challenges, Novita AI established a deep collaboration with WaveSpeedAI, focusing on the optimized deployment of the following core models:

Wan 2.1 Image-to-Video / Text-to-Video

Hunyuan Video Fast

MiniMax Video 01

Kling V1.6 Image-to-Video / Text-to-Video

With WaveSpeedAI’s support, Novita AI was able to fine-tune each model individually and dynamically schedule GPU resources across a unified pool, thereby maximizing both performance and cost efficiency.
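To make the idea of scheduling across a unified pool concrete, here is a minimal sketch of a best-fit placement policy in Python. It is an illustration only, not Novita AI's or WaveSpeedAI's actual scheduler: the `Gpu` class, the `schedule` function, the job names, and all memory figures are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """One device in the shared pool (hypothetical model)."""
    name: str
    free_gb: float
    jobs: list = field(default_factory=list)


def schedule(pool, job_name, need_gb):
    """Best-fit placement: pick the GPU that leaves the least memory unused."""
    candidates = [g for g in pool if g.free_gb >= need_gb]
    if not candidates:
        return None  # a real system would queue the job or grow the pool
    best = min(candidates, key=lambda g: g.free_gb - need_gb)
    best.free_gb -= need_gb
    best.jobs.append(job_name)
    return best


# Placeholder pool and memory needs; none of these numbers come from the case study.
pool = [Gpu("gpu-0", 80), Gpu("gpu-1", 48), Gpu("gpu-2", 24)]
requests = [("model-a-720p", 40), ("model-b-720p", 20), ("model-c-1080p", 60)]

placements = {}
for job, need in requests:
    gpu = schedule(pool, job, need)
    placements[job] = gpu.name if gpu else None

print(placements)
```

With these placeholder numbers, best-fit keeps the largest device free for the largest job. A first-fit policy would place the first two jobs on `gpu-0` and leave no device with room for the third.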

Results & Benefits

✅ Inference Performance Optimization: Inference efficiency improved by up to 25%, with average video generation time reduced by 30–40%.

Model	Resolution	Pre-Optimization Time	Post-Optimization Time
Hunyuan Video Fast	720P	2 minutes	1 minute 30 seconds
Wan 2.1 Text-to-Video	1280×720	2 minutes 24 seconds	1 minute 55 seconds
Wan 2.1 Image-to-Video	1280×720	3 minutes 10 seconds	2 minutes 30 seconds
Kling V1.6 Image-to-Video	1080P / 5s	$0.98 / video	$0.92 / video
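
The relative time savings for the three rows that report durations can be worked out with a few lines of Python. This is a small sketch: the helper name `percent_reduction` is made up for illustration, and the Kling V1.6 row is left out because its cells list per-video prices rather than durations.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# (model, seconds before, seconds after), converted from the table above
timings = [
    ("Hunyuan Video Fast", 120, 90),
    ("Wan 2.1 Text-to-Video", 144, 115),
    ("Wan 2.1 Image-to-Video", 190, 150),
]

time_reductions = {
    name: round(percent_reduction(before, after), 1)
    for name, before, after in timings
}

for name, pct in time_reductions.items():
    print(f"{name}: {pct}% less generation time")
```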

✅ Cost Structure Optimization: Average per-call cost reduced by over 30%, with savings of up to 66.7% in the best case (Hunyuan Video Fast at 720P).

Model	Resolution	Pre-Optimization Cost	Post-Optimization Cost	Cost Reduction
Hunyuan Video Fast	720P	$0.18 / sec	$0.06 / sec	-66.7%
Wan 2.1 Text-to-Video	1280×720	$0.06 / sec	$0.04 / sec	-33.3%
Wan 2.1 Image-to-Video	1280×720	$0.08 / sec	$0.06 / sec	-25.0%
Kling V1.6 Image-to-Video	1080P / 5s	$0.49 / video	$0.46 / video	-6.1%
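
The Cost Reduction column follows directly from the before and after prices. The sketch below recomputes it in Python, working in cents to avoid floating-point noise. The unweighted mean across the four rows is an assumption made here for illustration; the published average may be weighted by traffic or call volume.

```python
# (model, price before in cents, price after in cents), from the table above.
# Units differ by row (per second vs. per video), but each ratio is unit-free.
prices = [
    ("Hunyuan Video Fast", 18, 6),
    ("Wan 2.1 Text-to-Video", 6, 4),
    ("Wan 2.1 Image-to-Video", 8, 6),
    ("Kling V1.6 Image-to-Video", 49, 46),
]

cost_reductions = {
    name: (before - after) / before * 100
    for name, before, after in prices
}

# Assumption: a simple unweighted mean over the listed rows.
mean_reduction = sum(cost_reductions.values()) / len(cost_reductions)

for name, pct in cost_reductions.items():
    print(f"{name}: -{pct:.1f}%")
print(f"Unweighted mean: -{mean_reduction:.1f}%")
```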

✅ Improved System Stability: Model responses are more stable under high concurrency, video generation success rates increased, and failure rates dropped to below 0.05%, significantly enhancing the user experience.

Looking Ahead

Going forward, Novita AI will continue to deepen its collaboration with WaveSpeedAI to make multi-model deployment more flexible and stable, explore more efficient video inference frameworks, and keep optimizing its cost structure. With WaveSpeedAI’s technical strengths, Novita AI is confident it can deliver faster, more stable, and more cost-effective video generation services to global customers, pushing the boundaries of technology and business value in AI media generation.

Try them now!

🔗Wan-2.1-14b-vace
🔗Hunyuan Video
🔗MiniMax Video 01
🔗Kling V1.6