Bytedance Avatar Omni Human

ByteDance OmniHuman turns a single portrait photo and an audio track into an avatar video with lifelike motion and expressions ($0.12 per second). It is available through a ready-to-use REST inference API with no cold starts.

Features

OmniHuman

OmniHuman is an end-to-end AI framework developed by ByteDance that generates realistic human video from a single image and an audio input, with lip sync, facial animation, and gesture synthesis. Whether you provide a portrait, half-body, or full-body photo, OmniHuman animates it with natural movements, expressive gestures, and lip movement synchronized to the audio. The model supports real human portraits as well as animated or cartoon characters, which makes it suitable for content creation, singing, lip sync videos, and performance scenarios. Pricing is $0.12 per second.
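As a rough illustration of the $0.12 per second price stated above, the sketch below estimates the cost of a clip. It assumes billing is linear in the video duration with no minimum charge and that totals round to the nearest cent; the source does not describe rounding or minimums, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Price in USD per second, taken from the pricing stated in this document.
PRICE_PER_SECOND = Decimal("0.12")


def estimate_cost(duration_seconds):
    """Estimate the cost in USD for a video of the given length.

    Assumption: cost is linear in duration, with no minimum charge,
    rounded to the nearest cent. The actual billing rules may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    cost = PRICE_PER_SECOND * Decimal(str(duration_seconds))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(30))  # 3.60
```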

Requirements

Number of Images

Only one image can be uploaded per generation.

Image Requirements

Only human portrait images are supported.
For best results, use clear, front-facing portraits with good lighting.
Supported formats: PNG, JPEG, JPG, WebP.
Maximum file size: 50MB.
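The format and size limits above can be checked locally before uploading. This is a minimal sketch, not part of the API: `check_image` is a hypothetical helper, it only looks at the file extension (not the actual file contents), and it assumes "50MB" means 50 * 1024 * 1024 bytes.

```python
import os
from typing import List, Optional

# Extensions matching the supported formats listed above.
ALLOWED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp"}
# Assumption: the 50MB limit is interpreted as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def check_image(path: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means the file passes.

    If size_bytes is not given, the size is read from the file on disk.
    """
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format: %r" % ext)
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 50MB (%d bytes)" % size_bytes)
    return problems
```

For example, `check_image("portrait.png")` reads the size from disk, while passing `size_bytes` explicitly is convenient when the image is already in memory.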

Output Characteristics

Produces natural human motion, facial expressions, and accurate lip sync to audio.
Works best with clear, well-lit portrait photos.
May not perform optimally with extreme poses or poor lighting.

Best Practices

Use a clear, front-facing portrait photo.
Ensure the image is well-lit.
Avoid extreme angles or poses.
Make sure the face is clearly visible.
Avoid images with multiple people.

Keywords

lip sync
facial animation
gesture synthesis
portrait animation
audio-driven video generation

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


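# Submit the task (the example.com URLs are placeholders; use your own image and audio)
curl --location --request POST "https://api.wavespeed.ai/api/v3/bytedance/avatar-omni-human" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.png",
    "audio": "https://example.com/speech.mp3",
    "enable_base64_output": false
}'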

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
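The same submit-then-query flow can be sketched in Python using only the standard library. The two URLs and the `Authorization` header come from the curl commands above; everything else is an assumption for illustration: the helper names `http_json` and `run_task` are hypothetical, the polling interval and attempt limit are arbitrary, and the result response is assumed to carry the same `data.status`, `data.outputs`, and `data.error` fields as the response table in this document.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send an HTTP request with a Bearer token and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": "Bearer " + api_key}
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_task(image, audio, api_key, request=http_json,
             poll_interval=2.0, max_polls=150):
    """Submit a task, then poll the result endpoint until it finishes.

    `request` is injectable so the flow can be exercised without network
    access. poll_interval and max_polls are arbitrary illustrative values.
    """
    payload = {"image": image, "audio": audio, "enable_base64_output": False}
    submitted = request("POST", API_BASE + "/bytedance/avatar-omni-human",
                        api_key, payload)
    task_id = submitted["data"]["id"]
    result_url = "%s/predictions/%s/result" % (API_BASE, task_id)
    for _ in range(max_polls):
        data = request("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(poll_interval)
    raise TimeoutError("task %s did not finish in time" % task_id)

# Usage (requires network access and a valid key):
#   import os
#   outputs = run_task("https://example.com/portrait.png",
#                      "https://example.com/speech.mp3",
#                      os.environ["WAVESPEED_API_KEY"])
```

Passing the HTTP function as a parameter keeps the polling logic separate from the transport, so it can be swapped for another client or a test double.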

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
image	string	Yes	-	-	The portrait image to animate, as a URL or a base64-encoded image. Clear, front-facing, well-lit portraits give better results.
audio	string	Yes	-	-	The audio that drives the generated video (speech and lip sync), as a URL or a base64-encoded audio file.
enable_base64_output	boolean	No	false	-	If enabled, the output is returned as a Base64-encoded string instead of a URL. This property is only available through the API.
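Since `image` and `audio` accept either a URL or base64-encoded data, a request body can be assembled along the following lines. This is a sketch under assumptions: `as_api_input` and `build_request` are hypothetical helpers, and the code sends plain base64 text without a `data:` URI prefix, which the table above does not specify either way.

```python
import base64


def as_api_input(value):
    """Turn a URL string or raw bytes into a value for `image` or `audio`.

    Assumption: base64 data is sent as plain base64 text; the API may
    expect a different form (for example a data URI).
    """
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return value
    raise ValueError("expected an http(s) URL string or raw bytes")


def build_request(image, audio, enable_base64_output=False):
    """Build the JSON body using the three parameters from the table above."""
    return {
        "image": as_api_input(image),
        "audio": as_api_input(audio),
        "enable_base64_output": bool(enable_base64_output),
    }
```

For a local file, read it in binary mode first, for example `build_request(open("portrait.png", "rb").read(), "https://example.com/speech.mp3")`, where both file names are placeholders.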

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
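The response fields listed above can be mapped onto a small typed structure so that callers do not index into nested dictionaries directly. A minimal sketch: the `Prediction` class and `parse_prediction` function are hypothetical names, only a subset of the fields is kept, and treating any `code` other than 200 as an error is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    id: str
    model: str
    status: str  # created, processing, completed, or failed
    outputs: List[str] = field(default_factory=list)
    error: str = ""
    inference_ms: int = 0

    @property
    def done(self) -> bool:
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")


def parse_prediction(response: dict) -> Prediction:
    """Convert a decoded JSON response into a Prediction.

    Assumption: a `code` other than 200 is treated as a failed request.
    """
    if response.get("code") != 200:
        raise RuntimeError("request failed: %s" % response.get("message"))
    data = response["data"]
    timings = data.get("timings") or {}
    return Prediction(
        id=data["id"],
        model=data.get("model", ""),
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        inference_ms=timings.get("inference", 0),
    )
```

Because `data.outputs` is empty until the status is `completed`, checking `done` before reading `outputs` avoids treating an in-progress task as one with no results.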

Result Request Parameters
