ByteDance Avatar OmniHuman 1.5
OmniHuman 1.5 converts audio and visual cues into lifelike avatar animations for virtual humans, storytelling, and interactive agents, served through a ready-to-use REST inference API.
Features
bytedance/avatar-omni-human-1.5
ByteDance Avatar Omni-Human 1.5 is an advanced vision-audio fusion model designed to animate avatars through cognitive and emotional simulation. By combining image and audio inputs, it brings static portraits to life — generating natural facial expressions, synchronized lip movements, and realistic emotional responses.
🧠 Concept
Inspired by the paper “Instilling an Active Mind in Avatars via Cognitive Simulation”, the model simulates attention, emotion, and cognition to create avatars that don’t just move — they react intelligently.
🌟 Key Features
- **Audio-Driven Realism:** Generates precise lip-sync and emotional nuance directly from voice input.
- **Expressive Cognitive Simulation:** Models subtle eye movements, micro-expressions, and reactive behavior to emulate human presence.
- **Universal Avatar Adaptation:** Works with any static portrait or illustration to create consistent, lifelike performance.
- **Cross-Domain Support:** Handles both photorealistic and stylized avatars, adapting its realism to the visual style.
- **Flexible Output Encoding:** Choose between URL output or BASE64 encoding for seamless integration via API.
⚙️ Parameters
| Parameter | Description |
|---|---|
| image* | Upload a reference portrait or character image (JPG / PNG). |
| audio* | Upload or link to an audio file (WAV / MP3) for lip-sync and emotion mapping. |
💰 Pricing
| Metric | Price |
|---|---|
| Per second of generated audio | $0.25 / s |
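Billing is linear in audio length. As an illustrative calculation (the rate comes from the table above; the helper name is hypothetical):

```python
# Illustrative cost estimate; the per-second rate is taken from the pricing table.
PRICE_PER_SECOND = 0.25  # USD per second of generated audio

def estimate_cost(duration_seconds: float) -> float:
    """Total cost is simply the audio duration times the per-second rate."""
    return duration_seconds * PRICE_PER_SECOND

print(estimate_cost(30))  # a 30-second clip -> 7.5 USD
```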
💡 Use Cases
- Digital Avatars & VTubing — Drive realistic avatars from real voices in real time.
- Virtual Humans & NPCs — Give game or metaverse characters believable cognitive reactions.
- Marketing & Storytelling — Create expressive digital spokespeople or narrators.
- AI Companions & Education — Build avatars that engage naturally in learning or dialogue contexts.
📝 Notes
- The longer the audio, the higher the total cost (calculated per second).
- For best results, use clear, high-quality audio and well-lit frontal images.
- BASE64 output is API-only, useful for direct embedding into web applications.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
```shell
# Submit the task (image and audio are required; placeholders shown here)
curl --location --request POST "https://api.wavespeed.ai/api/v3/bytedance/avatar-omni-human-1.5" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "<image URL or base64>",
    "audio": "<audio URL or base64>",
    "enable_base64_output": false
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```
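The same submit-and-poll flow can be sketched in Python using only the standard library. The endpoints and response fields are those documented below; the helper names (`build_payload`, `is_terminal`, `submit_and_wait`) are illustrative, not part of any official SDK:

```python
# Minimal submit-and-poll sketch for the endpoints shown above.
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"

def build_payload(image_url: str, audio_url: str, base64_output: bool = False) -> dict:
    """Assemble the task-submission body with the two required inputs."""
    return {
        "image": image_url,
        "audio": audio_url,
        "enable_base64_output": base64_output,
    }

def is_terminal(status: str) -> bool:
    """Polling stops once the task reaches a terminal status."""
    return status in ("completed", "failed")

def submit_and_wait(api_key: str, payload: dict, interval: float = 2.0) -> dict:
    """Submit the task, then poll data.urls.get until it completes or fails."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    req = urllib.request.Request(
        f"{API_BASE}/bytedance/avatar-omni-human-1.5",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    result_url = data["urls"]["get"]  # documented under Response Parameters
    while not is_terminal(data["status"]):
        time.sleep(interval)
        poll = urllib.request.Request(
            result_url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(poll) as resp:
            data = json.load(resp)["data"]
    return data
```

On completion, `data["outputs"]` holds the generated video URLs (or BASE64 strings when `enable_base64_output` is set).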
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | The portrait image to animate; can be a URL or a base64-encoded image. Best results come from clear, front-facing portraits with good lighting. |
| audio | string | Yes | - | - | The driving audio for lip-sync and emotion mapping; can be a URL or a base64-encoded audio file (WAV / MP3). |
| enable_base64_output | boolean | No | false | - | If enabled, the output will be encoded into a BASE64 string instead of a URL. This property is only available through the API. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
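For reference, a submission response following the fields above might look like this (illustrative placeholder values only):

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "id": "<task id>",
    "model": "bytedance/avatar-omni-human-1.5",
    "outputs": [],
    "urls": {
      "get": "https://api.wavespeed.ai/api/v3/predictions/<task id>/result"
    },
    "has_nsfw_contents": [],
    "status": "created",
    "created_at": "2023-04-01T12:34:56.789Z",
    "error": "",
    "timings": {}
  }
}
```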
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID passed in the request) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |