Bytedance Avatar Omni Human
Playground
Try it on WavespeedAI!
Transform your portrait photos into dynamic avatar videos with OmniHuman technology. Create realistic human motion and expressions from a single portrait image. Pricing: $0.12 per second.
Features
OmniHuman
OmniHuman is an end-to-end AI framework developed by ByteDance that generates realistic human videos from a single image and an audio input, with lip sync, facial animation, and gesture synthesis. Given a portrait, half-body, or full-body photo, it animates the subject with natural movements, expressive gestures, and lip movements synchronized to the audio. The model supports real human portraits as well as animated or cartoon characters, which makes it suitable for content creation, singing, lip sync videos, and performance scenarios. Pricing: $0.12 per second.
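As a rough budgeting aid, the per-second rate quoted on this page can be turned into a cost estimate. This is a minimal sketch that assumes the charge scales linearly with the length of the generated video and that there is no minimum charge or rounding to whole seconds; neither point is stated on this page. The function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted on this page: $0.12 per second.
PRICE_PER_SECOND_USD = Decimal("0.12")


def estimate_cost_usd(duration_seconds: float) -> Decimal:
    """Estimate the charge for a video of the given length, rounded to cents.

    Assumption: billing is strictly proportional to duration.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    cost = PRICE_PER_SECOND_USD * Decimal(str(duration_seconds))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost_usd(10))    # 1.20
print(estimate_cost_usd(42.5))  # 5.10
```

`Decimal` is used instead of `float` so that the cent rounding is exact.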
Requirements
Number of Images
- Only one image can be uploaded per generation.
Image Requirements
- Only human portrait images are supported.
- For best results, use clear, front-facing portraits with good lighting.
- Supported formats: PNG, JPEG, JPG, WebP.
- Maximum file size: 50MB.
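The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of the service: the helper name `check_portrait_file` is hypothetical, the check looks only at the file extension (not the actual image contents), and it assumes that "50MB" means 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp"}
# Assumption: "50MB" is interpreted as 50 MiB.
MAX_IMAGE_BYTES = 50 * 1024 * 1024


def check_portrait_file(path: str) -> list:
    """Return a list of problems; an empty list means the file passes."""
    p = Path(path)
    problems = []
    # Extension check only; the file contents are not inspected.
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_IMAGE_BYTES:
        problems.append(f"file is larger than 50MB ({p.stat().st_size} bytes)")
    return problems
```

A call such as `check_portrait_file("portrait.jpg")` returns `[]` when the file exists, has a supported extension, and is within the size limit.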
Output Characteristics
- Produces natural human motion, facial expressions, and accurate lip sync to audio.
- Works best with clear, well-lit portrait photos.
- May not perform optimally with extreme poses or poor lighting.
Best Practices
- Use a clear, front-facing portrait photo.
- Ensure the image is well-lit.
- Avoid extreme angles or poses.
- Make sure the face is clearly visible.
- Avoid images with multiple people.
Keywords
- lip sync
- facial animation
- gesture synthesis
- portrait animation
- audio-driven video generation
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/bytedance/avatar-omni-human" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  --data-raw '{
    "image": "https://portal.volccdn.com/obj/volcfe/cloud-universal-doc/upload_e85bf7e9ab70752ac2730926434b181a.jpeg",
    "audio": "https://replicate.delivery/xezq/kQDIDSPSMfTGPqfPaDezJW3I4WK0f6FhXt17pSzrKhR5nmsTB/output.mp3",
    "enable_base64_output": false
  }'
# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
  --header "Authorization: Bearer ${WAVESPEED_API_KEY}"
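The same submit-then-query flow can be sketched in Python using only the standard library. This is an illustrative sketch, not an official client: the function names, the 2-second polling interval, and the 10-minute timeout are assumptions, and the response fields (`data.id`, `data.status`, `data.outputs`, `data.error`) are taken from the parameter tables on this page.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def _call(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_request(api_key: str, image: str, audio: str) -> urllib.request.Request:
    """Build the POST request that submits a generation task."""
    body = json.dumps(
        {"image": image, "audio": audio, "enable_base64_output": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/bytedance/avatar-omni-human",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that queries a task's result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def run(api_key, image, audio, poll_seconds=2.0, max_wait_seconds=600.0, call=_call):
    """Submit a task, then poll until it completes, fails, or times out.

    poll_seconds and max_wait_seconds are arbitrary defaults (assumptions).
    `call` can be replaced with a stub to exercise the loop offline.
    """
    task = call(submit_request(api_key, image, audio))["data"]
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        data = call(result_request(api_key, task["id"]))["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task['id']} did not finish in {max_wait_seconds}s")


# Example usage (performs real network calls, so it is left commented out):
#   import os
#   outputs = run(os.environ["WAVESPEED_API_KEY"], IMAGE_URL, AUDIO_URL)
```

Keeping request construction separate from the network call makes the polling logic easy to test with a stub passed as `call`.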
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
image | string | Yes | - | - | The portrait image to animate; can be a URL or a base64-encoded image. Clear, front-facing portraits with good lighting give better results. |
audio | string | Yes | https://replicate.delivery/xezq/kQDIDSPSMfTGPqfPaDezJW3I4WK0f6FhXt17pSzrKhR5nmsTB/output.mp3 | - | Audio track for the generated video; can be a URL or a base64-encoded audio file. |
enable_base64_output | boolean | No | false | - | If enabled, the output is returned as a base64-encoded string instead of a URL. This property is only available through the API. |
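Because `image` and `audio` accept either a URL or base64 data, a request body can be assembled from a mix of remote and local inputs. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and this page does not say whether base64 values need a data-URI prefix, so the bare base64 string is sent here.

```python
import base64
import json
from pathlib import Path


def as_url_or_base64(source: str) -> str:
    """Pass URLs through; treat anything else as a local file to encode.

    Assumption: the API accepts a bare base64 string (no data-URI prefix).
    """
    if source.startswith(("http://", "https://")):
        return source
    return base64.b64encode(Path(source).read_bytes()).decode("ascii")


def build_payload(image: str, audio: str, enable_base64_output: bool = False) -> str:
    """Return the JSON request body for the task submission endpoint."""
    if not image or not audio:
        raise ValueError("image and audio are both required")
    return json.dumps(
        {
            "image": as_url_or_base64(image),
            "audio": as_url_or_base64(audio),
            "enable_base64_output": enable_base64_output,
        }
    )
```

For example, `build_payload("portrait.png", "https://example.com/voice.mp3")` encodes the local image and passes the audio URL through unchanged; both file names here are placeholders.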
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (the task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
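The fields a client usually needs from the submission response are `data.id` and `data.urls.get`. The following sketch extracts them; the function name `parse_submission` is hypothetical, and the sample response is made up for illustration from the field descriptions above (the ID `task-123` is a placeholder, not real output).

```python
import json


def parse_submission(raw: str) -> tuple:
    """Return (task_id, result_url) from a submission response body."""
    body = json.loads(raw)
    # The table above gives 200 as the success code.
    if body.get("code") != 200:
        raise RuntimeError(
            f"submission rejected: {body.get('code')} {body.get('message')}"
        )
    data = body["data"]
    return data["id"], data["urls"]["get"]


# Illustrative response only; real values will differ.
sample = json.dumps(
    {
        "code": 200,
        "message": "success",
        "data": {
            "id": "task-123",
            "status": "created",
            "outputs": [],
            "urls": {
                "get": "https://api.wavespeed.ai/api/v3/predictions/task-123/result"
            },
        },
    }
)

task_id, result_url = parse_submission(sample)
print(task_id)  # task-123
```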
Result Query Parameters
Result Request Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
id | string | Yes | - | Task ID (the data.id value returned at submission), passed in the request path |
Result Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data | object | The prediction data object containing all details |
data.id | string | Unique identifier for the prediction (the ID of the prediction being retrieved) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
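Once a result response arrives, the `data` object can be interpreted from its `status`, `outputs`, `has_nsfw_contents`, and `error` fields. The sketch below is one possible reading, under the assumption that `has_nsfw_contents` lines up index-by-index with `outputs` (the table describes it as one flag "for each output"). The function name `safe_outputs` is hypothetical, and dropping flagged outputs is a design choice of this example, not documented behavior of the API.

```python
from typing import Optional


def safe_outputs(data: dict) -> Optional[list]:
    """Interpret the `data` object of a result response.

    Returns None while the task is still created/processing, raises if it
    failed, and otherwise returns the outputs that are not flagged NSFW.
    """
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed without an error message")
    if status != "completed":
        return None
    outputs = data.get("outputs") or []
    flags = list(data.get("has_nsfw_contents") or [])
    # Treat any missing flag as "not flagged" (assumption).
    flags += [False] * (len(outputs) - len(flags))
    return [url for url, flagged in zip(outputs, flags) if not flagged]


print(safe_outputs({"status": "processing", "outputs": []}))  # None
```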