Minimax Video-02

Hailuo 02 - MiniMax's next-generation AI video model, with a 2.5x efficiency improvement, an 85% complex instruction response rate, and industry-leading cost-effectiveness for generating high-quality 6-second videos.

Features

Hailuo 02 - Next Generation AI Video Model

Hailuo 02 is MiniMax's AI video generation model and a significant upgrade over Hailuo 01. It currently ranks #2 globally in both image-to-video and text-to-video benchmarks, ahead of Kuaishou's Kling and Google's Veo 3 and behind only ByteDance's recently released Seedance 1.0.

🚀 Model Highlights

Industry-Leading Performance

2.5x Efficiency Boost: Both training and inference efficiency improved 2.5x
3x Model Parameters: Significantly enhanced model capacity
4x Training Data: Massive dataset expansion for superior quality
85% Complex Instruction Response Rate: Exceptional understanding of intricate prompts

Architectural Innovation

Hailuo 02 features a completely redesigned DiT (Diffusion Transformer) architecture, abandoning the previous framework for a more efficient and powerful system that delivers:

Enhanced temporal consistency
Superior motion dynamics
Exceptional physical realism

Cost-Effectiveness Champion

Lowest Price in Top Tier: Most affordable among leading video generation models
Unchanged Training Costs: Held flat despite 3x the parameters and 4x the training data, thanks to the 2.5x efficiency gain
Premium Quality at Budget Price: Professional-grade results without premium pricing

🎯 Key Features

Resolution Options

768p: Standard quality for quick previews and drafts
1080p: Full HD for professional applications

Advanced Capabilities

Extreme Physics Simulation: Generates complex physical scenarios like acrobatics, fluid dynamics, and intricate movements
Cinematic Camera Control: Professional camera movements including panning, tilting, tracking, and complex trajectories
Multi-Style Support: From photorealistic to artistic, anime to documentary styles
Consistent Character Generation: Maintains character appearance throughout the video

💡 Application Scenarios

Film & Television Production

Rapidly generate complex VFX shots, including acrobatics, fantasy scenes, and challenging physical performances, dramatically reducing production costs and time.

Advertising & Creative

Provide brands with cost-effective, high-quality video content that meets diverse creative requirements while maintaining professional standards.

Content Creation

Empower creators and influencers to produce engaging video content efficiently, enhancing productivity without compromising quality.

Educational Entertainment

Generate educational animations, virtual performances, and engaging content that combines learning with entertainment value.

Corporate Communications

Offer SMEs affordable promotional videos that elevate brand image and market competitiveness without breaking the budget.

📊 Technical Specifications

Video Duration: 6 seconds (with plans for extended duration)
Frame Rate: 25 fps
Supported Formats: MP4, MOV
Input Types: Text prompts, reference images
Processing Time: Optimized for rapid generation
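
As a quick sanity check on these figures, the number of frames in one clip follows directly from the duration and the frame rate. The snippet below is a minimal sketch; the variable names are illustrative and are not part of the API.

```python
# Values taken from the specification list above.
DURATION_SECONDS = 6
FRAMES_PER_SECOND = 25

# Total frames in one generated clip.
total_frames = DURATION_SECONDS * FRAMES_PER_SECOND
# Time each frame stays on screen, in milliseconds.
frame_interval_ms = 1000 / FRAMES_PER_SECOND

print(total_frames)       # 150
print(frame_interval_ms)  # 40.0
```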

🔧 Usage Guidelines

Best Practices

Detailed Prompts: Leverage the 85% complex instruction response rate with comprehensive descriptions
High-Quality References: Use clear, high-resolution images for image-to-video generation
Style Consistency: Specify desired artistic style for coherent results
Physics Descriptions: Take advantage of advanced physics capabilities with specific motion descriptions
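
One way to keep prompts specific about motion is to assemble them from parts and attach camera instructions using the bracketed [C1,C2,C3] format documented for the prompt parameter. The helper below is a minimal sketch under stated assumptions: `camera_group` and `with_camera` are hypothetical names, the instruction names "Pull out" and "Truck left" are placeholders rather than a confirmed list of supported commands, and placing the bracket right after the sentence it applies to is one reading of the documented rule.

```python
# Recommended upper bound from the prompt parameter notes.
MAX_INSTRUCTIONS_PER_GROUP = 3


def camera_group(*instructions: str) -> str:
    """Format camera instructions as one bracketed group, e.g. "[A,B]"."""
    cleaned = [item.strip() for item in instructions if item.strip()]
    if not cleaned:
        raise ValueError("at least one camera instruction is required")
    if len(cleaned) > MAX_INSTRUCTIONS_PER_GROUP:
        raise ValueError("combine no more than 3 camera instructions per group")
    return "[" + ",".join(cleaned) + "]"


def with_camera(sentence: str, *instructions: str) -> str:
    """Append a bracketed camera group to the sentence it should apply to."""
    return f"{sentence.rstrip()} {camera_group(*instructions)}"


# "Pull out" and "Truck left" are placeholder instruction names (assumption).
prompt = " ".join([
    "Circus scene, photorealistic style.",
    with_camera("A clown rides a unicycle while juggling balls.", "Pull out", "Truck left"),
])
print(prompt)
```

Building the prompt from parts makes it easier to state style, subject motion, and camera movement separately, which is what the practices above recommend.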

Limitations

Current maximum duration: 6 seconds
Output quality depends on input prompt/image quality
Designed for creative synthesis, not documentary accuracy

🛡️ Responsible Use

This model must not be used for:

Generating harmful, illegal, or deceptive content
Creating non-consensual or inappropriate material
Violating privacy or intellectual property rights
Spreading misinformation or propaganda
Any activity violating local or international laws

🌟 Why Choose Hailuo 02?

Performance Leader: #2 globally, surpassing established competitors
Cost Efficiency: Best price-performance ratio in the industry
Technical Excellence: 2.5x efficiency with 3x parameters
Versatility: Handles extreme complexity with ease
Future-Ready: Continuous improvements and feature expansions

Experience the next generation of AI video generation with Hailuo 02 - where cutting-edge technology meets practical affordability.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/minimax/video-02" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "Circus Scene. The camera follows a clown riding a unicycle while juggling balls. The camera pulls back, tracks left, and tilts left",
    "resolution": "768p",
    "duration": 6,
    "enable_prompt_expansion": true
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
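
The same two calls can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_submit_request`, `build_result_request`, and `send` are hypothetical, while the URLs and headers mirror the curl commands above. The requests are built separately from being sent so that they can be inspected without network access.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a task."""
    return urllib.request.Request(
        url=f"{API_BASE}/minimax/video-02",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that queries a task result."""
    return urllib.request.Request(
        url=f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline demonstration: inspect the request instead of sending it.
demo = build_submit_request(
    "YOUR_API_KEY",
    {"prompt": "Circus scene.", "resolution": "768p", "duration": 6},
)
print(demo.get_method(), demo.full_url)
```

To run it against the live service, read the key from the `WAVESPEED_API_KEY` environment variable and pass each built request to `send`.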

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes	Circus Scene. The camera follows a clown riding a unicycle while juggling balls. The camera pulls back, tracks left, and tilts left	-	Text description of the video to generate (maximum 2000 characters). 1. Camera movement can be controlled with camera movement instructions, inserted in square brackets [ ] at the point in the prompt where the movement should apply. The standard format is [C1,C2,C3], where each C is a type of camera movement. For best results, combine no more than 3 instructions. 2. Camera movement can also be described in natural language; using the instruction names in the description improves response accuracy. 3. Bracketed instructions and natural language descriptions can be used together in the same prompt.
image	string	No	-	-	Image used as the first frame of the generated video. Accepts a Base64-encoded string in data:image/jpeg;base64,{data} format or a publicly accessible URL. The image must meet the following conditions: format is JPG/JPEG/PNG; aspect ratio is greater than 2:5 and less than 5:2; short side is greater than 300px; file size does not exceed 20MB.
resolution	string	No	768p	768p, 1080p	Video resolution
duration	integer	No	6	6	Video duration in seconds
enable_prompt_expansion	boolean	No	true	-	Whether the model automatically optimizes the incoming prompt to improve generation quality.
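
The constraints in this table can be checked on the client before a request is sent. The following is a minimal sketch with hypothetical helper names (`check_image`, `to_data_uri`, `build_payload`). It assumes that 20MB means 20 * 1024 * 1024 bytes, that the allowed resolutions are the two listed under Resolution Options, and that the image dimensions and file size are already known to the caller.

```python
import base64
from typing import List, Optional

MAX_PROMPT_CHARS = 2000
ALLOWED_DURATIONS = {6}
# Assumption: taken from the Resolution Options list, not from the table.
ALLOWED_RESOLUTIONS = {"768p", "1080p"}
ALLOWED_IMAGE_FORMATS = {"jpg", "jpeg", "png"}
# Assumption: 20MB is read as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def check_image(width: int, height: int, size_bytes: int, fmt: str) -> List[str]:
    """Return the violated image constraints (an empty list means OK)."""
    problems = []
    if fmt.lower() not in ALLOWED_IMAGE_FORMATS:
        problems.append("format must be JPG/JPEG/PNG")
    # Aspect ratio strictly between 2:5 and 5:2, compared without floats.
    if not (2 * height < 5 * width and 2 * width < 5 * height):
        problems.append("aspect ratio must be between 2:5 and 5:2")
    if min(width, height) <= 300:
        problems.append("short side must be greater than 300px")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file size cannot exceed 20MB")
    return problems


def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a Base64 data URI for the image field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(prompt: str, image: Optional[str] = None,
                  resolution: str = "768p", duration: int = 6,
                  enable_prompt_expansion: bool = True) -> dict:
    """Validate the documented limits and return the JSON request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 2000 characters")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError("resolution must be 768p or 1080p")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 6")
    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "enable_prompt_expansion": enable_prompt_expansion,
    }
    if image is not None:  # image is optional, so omit it when unset
        payload["image"] = image
    return payload


print(check_image(1280, 720, 500_000, "png"))  # []
print(build_payload("A clown juggles.")["resolution"])  # 768p
```

The server remains the authority on what it accepts; these checks only catch obvious mistakes early.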

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
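
After submitting a task, a client mainly needs the task ID and the result URL from this response. The sketch below shows one way to extract them. `SubmittedTask` and `parse_submit_response` are hypothetical names, and the sample body is illustrative data shaped like the table above, not a captured response.

```python
from dataclasses import dataclass


@dataclass
class SubmittedTask:
    task_id: str
    status: str
    result_url: str


def parse_submit_response(body: dict) -> SubmittedTask:
    """Extract the fields needed to query the result later."""
    if body.get("code") != 200:
        raise RuntimeError(
            f"submission failed: {body.get('code')} {body.get('message')}"
        )
    data = body["data"]
    return SubmittedTask(
        task_id=data["id"],
        status=data["status"],
        result_url=data["urls"]["get"],
    )


# Illustrative response body (all values are made up for this example).
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "abc123",
        "model": "minimax/video-02",
        "outputs": [],
        "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/abc123/result"},
        "has_nsfw_contents": [],
        "status": "created",
        "created_at": "2023-04-01T12:34:56.789Z",
        "error": "",
        "timings": {"inference": 0},
    },
}

task = parse_submit_response(sample)
print(task.task_id, task.status)  # abc123 created
```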

Result Query Parameters

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID (the data.id value returned on submission), passed as ${requestId} in the result URL

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
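
Because `data.outputs` stays empty until `data.status` is `completed`, a client typically polls the result endpoint until the task reaches `completed` or `failed`. The loop below is a minimal sketch: `wait_for_outputs` is a hypothetical helper, the polling interval and timeout are arbitrary choices, and `fetch_result` stands for any callable that returns the decoded JSON body of the result endpoint. The demonstration uses canned responses instead of network calls.

```python
import time
from typing import Callable, List


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval_seconds: float = 2.0,   # arbitrary choice, not a documented value
    timeout_seconds: float = 600.0,  # arbitrary choice, not a documented value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until the task completes, then return the output URLs."""
    deadline = clock() + timeout_seconds
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # "created" and "processing" mean the task is still running.
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_seconds)


# Offline demonstration with canned responses (illustrative data only).
fake_responses = iter([
    {"data": {"status": "created", "outputs": []}},
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/video.mp4"]}},
])
outputs = wait_for_outputs(lambda: next(fake_responses), sleep=lambda _seconds: None)
print(outputs)  # ['https://example.com/video.mp4']
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time.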
