Hunyuan Avatar

Hunyuan Avatar is an audio-driven conversational AI video generation model that creates talking or singing videos from a single image and an audio input. Pricing starts at $0.15 per 5 seconds of generated video (480p or 720p), and the endpoint supports a maximum generation length of 120 seconds.
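As a rough budgeting aid, the sketch below estimates the cost of one clip in US cents. It rests on an assumption that the pricing statement above does not confirm: that billing is charged per started 5-second block at the $0.15 base rate, identically for 480p and 720p. The function name estimate_cost_cents is hypothetical.

```shell
# Hypothetical helper: estimate the cost (in US cents) of one generated video.
# ASSUMPTION: billing rounds up to whole 5-second blocks at 15 cents per block
# for both 480p and 720p. Check the current pricing before relying on this.
estimate_cost_cents() {
  local seconds="$1"
  # Accept only a whole number of seconds without leading zeros.
  case "$seconds" in
    ''|*[!0-9]*|0*)
      echo "duration must be a whole number of seconds from 1 to 120" >&2
      return 1 ;;
  esac
  # The endpoint supports at most 120 seconds of output.
  if [ "$seconds" -gt 120 ]; then
    echo "duration must be a whole number of seconds from 1 to 120" >&2
    return 1
  fi
  # Ceiling division gives the number of started 5-second blocks.
  echo $(( (seconds + 4) / 5 * 15 ))
}
```

For example, estimate_cost_cents 12 prints 45 (three started blocks, or $0.45 under this assumption), and the 120-second maximum works out to 360 cents ($3.60).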

Features

Hunyuan Avatar - High-Fidelity Audio-Driven Human Animation

Transform audio and images into high-quality AI avatar videos with Hunyuan Avatar, an advanced audio-driven human animation model designed for creating dynamic, emotion-controllable, and multi-character dialogue videos.

Overview

Hunyuan Avatar is a high-fidelity, audio-driven human animation model that supports multiple characters. Built on a multimodal diffusion transformer (MM-DiT) architecture, it generates highly dynamic videos while preserving character consistency, aligns each character's emotion precisely with the audio, and enables audio-driven animation of multiple characters.

Key Capabilities

Create production-ready avatar videos with:

  • Character Consistency Preservation
      • Generate dynamic videos while maintaining strong character consistency.
      • A character image injection module eliminates the condition mismatch between training and inference.
      • Fine-tune facial characteristics across different poses and expressions.

  • Audio-Driven Animation
      • High-fidelity, audio-driven human animation.
      • The Audio Emotion Module (AEM) extracts emotional cues from a reference image and transfers them to the generated video.
      • The Face-Aware Audio Adapter (FAA) enables independent audio injection for multi-character scenarios.

  • Multi-Character Support
      • Generate multi-character dialogue videos from single inputs.
      • Inject audio independently for each character via cross-attention.
      • Render realistic avatars in dynamic, immersive scenarios.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


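# Submit the task. audio and image are required; the URLs below are
# placeholders (assumed to be publicly reachable files).
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-avatar" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/speech.mp3",
    "image": "https://example.com/portrait.png",
    "resolution": "480p"
}'

# Get the result. Set requestId to the data.id value returned by the submit call.
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"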

Parameters

Task Submission Parameters

Request Parameters

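| Parameter  | Type    | Required | Default | Range           | Description |
|------------|---------|----------|---------|-----------------|-------------|
| audio      | string  | Yes      | -       | -               | The audio that drives the animation. |
| image      | string  | Yes      | -       | -               | The image of the character to animate. |
| prompt     | string  | No       | -       | -               | An optional text prompt to guide the generation. |
| resolution | string  | No       | 480p    | 480p, 720p      | The resolution of the output video. |
| seed       | integer | No       | -       | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |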
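A client can reject invalid values before spending a request. The sketch below checks resolution and seed against the ranges in the table above and prints a JSON body for the submit call. The function name build_avatar_payload is hypothetical, and the sketch assumes that audio, image, and prompt are plain strings (such as URLs) that need no JSON escaping.

```shell
# Hypothetical helper: validate parameters against the documented ranges and
# print a JSON request body for the submit call.
# Usage: build_avatar_payload AUDIO IMAGE [RESOLUTION] [SEED] [PROMPT]
# ASSUMPTION: values contain no quotes, backslashes, or other characters that
# would need JSON escaping (no escaping is done here).
build_avatar_payload() {
  local audio="$1" image="$2" resolution="${3:-480p}" seed="${4:--1}" prompt="${5:-}"

  # audio and image are required.
  if [ -z "$audio" ] || [ -z "$image" ]; then
    echo "audio and image are required" >&2
    return 1
  fi

  # resolution: 480p (the default) or 720p.
  case "$resolution" in
    480p|720p) ;;
    *) echo "resolution must be 480p or 720p" >&2; return 1 ;;
  esac

  # seed: an integer from -1 (random) to 2147483647, without leading zeros.
  case "$seed" in
    -1) ;;
    ''|*[!0-9]*|0?*)
      echo "seed must be an integer from -1 to 2147483647" >&2; return 1 ;;
    *)
      # Check the length first so very long inputs cannot overflow the comparison.
      if [ "${#seed}" -gt 10 ] || [ "$seed" -gt 2147483647 ]; then
        echo "seed must be an integer from -1 to 2147483647" >&2; return 1
      fi ;;
  esac

  # prompt is optional, so it is only included when non-empty.
  if [ -n "$prompt" ]; then
    printf '{"audio": "%s", "image": "%s", "prompt": "%s", "resolution": "%s", "seed": %s}\n' \
      "$audio" "$image" "$prompt" "$resolution" "$seed"
  else
    printf '{"audio": "%s", "image": "%s", "resolution": "%s", "seed": %s}\n' \
      "$audio" "$image" "$resolution" "$seed"
  fi
}
```

Leaving seed at -1 keeps the default random behavior; pass an explicit seed when you need to reproduce a result.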

Response Parameters

| Parameter              | Type    | Description |
|------------------------|---------|-------------|
| code                   | integer | HTTP status code (e.g., 200 for success) |
| message                | string  | Status message (e.g., "success") |
| data.id                | string  | Unique identifier for the prediction (task ID) |
| data.model             | string  | Model ID used for the prediction |
| data.outputs           | array   | Array of URLs to the generated content (empty when status is not completed) |
| data.urls              | object  | Object containing related API endpoints |
| data.urls.get          | string  | URL to retrieve the prediction result |
| data.has_nsfw_contents | array   | Array of boolean values indicating NSFW detection for each output |
| data.status            | string  | Status of the task: created, processing, completed, or failed |
| data.created_at        | string  | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error             | string  | Error message (empty if no error occurred) |
| data.timings           | object  | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
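The response fields above support a simple submit-then-poll loop: read data.id from the submit response, call the result endpoint until data.status is completed or failed, then print data.outputs or data.error. This is a sketch under several assumptions: curl and jq are installed, the result endpoint returns the same data object described in the table, and a 2-second interval with at most 150 attempts is acceptable. The function names, interval, and attempt limit are illustrative choices, not documented values.

```shell
# Hypothetical helpers for a submit-then-poll flow. Requires curl and jq.
# ASSUMPTION: the result endpoint returns the same "data" object as the submit call.
API_BASE="https://api.wavespeed.ai/api/v3"

# Map a data.status value to an exit code: 0 = completed, 1 = failed, 2 = still running.
status_action() {
  case "$1" in
    completed) return 0 ;;
    created|processing) return 2 ;;
    *) return 1 ;;  # "failed" or an unexpected value
  esac
}

# Submit a JSON request body (first argument) and print the task ID (data.id).
submit_avatar_task() {
  local response
  response=$(curl --silent --show-error --fail \
    --request POST "$API_BASE/wavespeed-ai/hunyuan-avatar" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1") || return 1
  printf '%s' "$response" | jq -r '.data.id'
}

# Poll every 2 seconds (at most 150 times) until the task completes or fails.
# On success, print one output URL per line; on failure, print data.error to stderr.
wait_for_result() {
  local request_id="$1" attempt=0 response status rc
  while [ "$attempt" -lt 150 ]; do
    response=$(curl --silent --show-error --fail \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
      "$API_BASE/predictions/${request_id}/result") || return 1
    status=$(printf '%s' "$response" | jq -r '.data.status')
    rc=0
    status_action "$status" || rc=$?
    case "$rc" in
      0) printf '%s' "$response" | jq -r '.data.outputs[]'; return 0 ;;
      1) printf '%s' "$response" | jq -r '.data.error' >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep 2
  done
  echo "timed out waiting for task ${request_id}" >&2
  return 1
}
```

For example, with WAVESPEED_API_KEY exported and a JSON body in $payload that includes at least audio and image: id=$(submit_avatar_task "$payload") && wait_for_result "$id". A fixed interval keeps the sketch short; an exponential backoff would send fewer requests while long clips render.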

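Result Request Parameters

The result endpoint takes the task ID (data.id from the submission response) as the requestId path parameter in its URL.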

© 2025 WaveSpeedAI. All rights reserved.