
Your request will cost $0.16 per run.
For $10, you can run this model approximately 62 times.
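The estimate of roughly 62 runs for $10 is the budget divided by the $0.16 per-run price, rounded down to whole runs. Below is a minimal Python sketch of that arithmetic. It works in integer cents so floating-point rounding cannot skew the result. The function name `runs_for_budget` is hypothetical.

```python
def runs_for_budget(budget_cents: int, cost_per_run_cents: int) -> int:
    """Return how many whole runs fit in a budget; both amounts are in cents."""
    if cost_per_run_cents <= 0:
        raise ValueError("cost per run must be positive")
    # Floor division: a partial run cannot be executed.
    return budget_cents // cost_per_run_cents


# $10.00 budget at $0.16 per run
print(runs_for_budget(1000, 16))  # 62
```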
ByteDance Avatar Omni-Human 1.5 is a vision-audio fusion model that animates avatars by simulating cognition and emotion. It combines a portrait image with an audio track to bring the static portrait to life, generating natural facial expressions, synchronized lip movements, and realistic emotional responses.
Inspired by the paper “Instilling an Active Mind in Avatars via Cognitive Simulation”, the model simulates attention, emotion, and cognition to create avatars that do not merely move in sync with the audio but react to it.
- **Audio-Driven Realism**: Generates precise lip-sync and emotional nuance directly from voice input.
- **Expressive Cognitive Simulation**: Models subtle eye movements, micro-expressions, and reactive behavior to emulate human presence.
- **Universal Avatar Adaptation**: Works with any static portrait or illustration to create consistent, lifelike performances.
- **Cross-Domain Support**: Handles both photorealistic and stylized avatars, adapting its realism to the visual style.
- **Flexible Output Encoding**: Choose between URL output and BASE64 encoding for straightforward API integration.
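Because the output can come back either as a URL or as BASE64 data, client code has to handle both forms. Below is a minimal Python sketch. It assumes the API returns a plain string that is either an http(s) URL or Base64 data, possibly prefixed with a `data:` URI header. The function name `resolve_output` and that response shape are hypothetical, not the provider's documented schema.

```python
import base64
from typing import Union


def resolve_output(output: str) -> Union[str, bytes]:
    """Return a URL unchanged, or decode a Base64 payload into raw bytes.

    Assumption (hypothetical response shape): the API returns either an
    http(s) URL or a Base64 string, optionally prefixed with a data URI
    header such as "data:video/mp4;base64,".
    """
    if output.startswith(("http://", "https://")):
        # URL output: the caller downloads the file from this address.
        return output
    # Strip an optional data URI header before decoding.
    if output.startswith("data:") and "," in output:
        output = output.split(",", 1)[1]
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(output, validate=True)
```

Base64 keeps the result in a single response, but the payload is about a third larger than the raw file. A URL keeps responses small but requires a second download.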
| Parameter | Description |
|---|---|
| image* | Upload a reference portrait or character image (JPG / PNG). |
| audio* | Upload or link to an audio file (WAV / MP3) for lip-sync and emotion mapping. |
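Both parameters are marked with an asterisk, which conventionally means they are required. Below is a minimal Python sketch that checks the file types listed in the table before a request is sent. It accepts local paths or URLs. The function name `build_inputs` and the dictionary payload keyed by `image` and `audio` are hypothetical, not the provider's documented request schema.

```python
from pathlib import PurePosixPath

# Formats listed in the parameter table (".jpeg" treated as JPG).
ALLOWED_IMAGE = {".jpg", ".jpeg", ".png"}
ALLOWED_AUDIO = {".wav", ".mp3"}


def _suffix(ref: str) -> str:
    # Works for local paths and URLs; drops any query string or fragment.
    clean = ref.split("?", 1)[0].split("#", 1)[0]
    return PurePosixPath(clean).suffix.lower()


def build_inputs(image: str, audio: str) -> dict:
    """Validate the two required inputs and return a hypothetical payload."""
    if _suffix(image) not in ALLOWED_IMAGE:
        raise ValueError(f"image must be JPG or PNG, got {image!r}")
    if _suffix(audio) not in ALLOWED_AUDIO:
        raise ValueError(f"audio must be WAV or MP3, got {audio!r}")
    return {"image": image, "audio": audio}
```

Checking extensions client-side only catches obvious mistakes. The service may still reject files whose contents do not match their extension.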
| Metric | Price |
|---|---|
| Per second of generated audio | $0.25 / s |
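A rough per-run cost estimate multiplies the clip duration by the $0.25 per-second rate. The sketch below reads a WAV file's duration with Python's standard `wave` module. It assumes billing is proportional to the audio duration, with no rounding or minimum charge; the table does not state either way. MP3 input would need a third-party decoder, which is not shown. The function names are hypothetical.

```python
import io
import wave

PRICE_PER_SECOND = 0.25  # USD, from the pricing table


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost(duration_s: float, price_per_second: float = PRICE_PER_SECOND) -> float:
    """Estimate the charge for one run, assuming cost scales linearly with duration."""
    return duration_s * price_per_second


# Usage: a 4-second silent mono WAV built in memory (16 kHz, 16-bit samples).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000 * 4)
buf.seek(0)
seconds = wav_duration_seconds(buf)
print(seconds, estimate_cost(seconds))  # 4.0 1.0
```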