OmniHuman-1.5: Toward Virtual Humans with “Soul”

WaveSpeedAI, Thu Sep 04 2025

Have you ever watched a video of a smoothly animated digital human and felt that it lacked genuine emotion? To address this limitation, we introduce OmniHuman-1.5, a framework developed by ByteDance to generate character animations that go beyond superficial mimicry. It not only brings virtual avatars to life but also lets them express emotion.

From Imitation to Expression: A Technical Breakthrough

OmniHuman-1.5 employs a dual-system simulation framework.

First, it uses multimodal large language models to generate structured semantic representations. These provide high-level semantic guidance, so that motion generation goes beyond rhythm synchronization and aligns with context and emotion.

Second, through a specially designed multimodal DiT architecture and pseudo-end-frame mechanism, it efficiently fuses multimodal information while mitigating conflicts, thereby generating actions that are deeply consistent with characters, scenes, and language.
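
The two-stage flow described above can be pictured with a toy sketch. This is not ByteDance's implementation: the class `SemanticPlan`, the functions `plan_semantics` and `build_conditions`, and the keyword table `GESTURE_HINTS` are all hypothetical names invented for illustration, and a simple keyword lookup stands in for the multimodal large model. It only shows the idea of producing structured guidance first and then passing it, together with the audio, to the generation stage.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticPlan:
    """Structured guidance from the first stage (hypothetical schema)."""
    emotion: str
    gestures: List[str] = field(default_factory=list)


# Hypothetical keyword table standing in for a multimodal large model.
GESTURE_HINTS: Dict[str, str] = {"heart": "hand on chest", "hello": "wave"}


def plan_semantics(transcript: str, emotion: str = "neutral") -> SemanticPlan:
    """Stand-in for stage one: derive structured guidance from the input."""
    words = [w.strip(".,!?\"'").lower() for w in transcript.split()]
    gestures = [GESTURE_HINTS[w] for w in words if w in GESTURE_HINTS]
    return SemanticPlan(emotion=emotion, gestures=gestures)


def build_conditions(plan: SemanticPlan, audio_frames: int) -> dict:
    """Stand-in for stage two's inputs: bundle audio timing with the plan."""
    return {
        "audio_frames": audio_frames,
        "emotion": plan.emotion,
        "gestures": plan.gestures,
    }


plan = plan_semantics("My heart is full.", emotion="tender")
conditions = build_conditions(plan, audio_frames=120)
print(conditions)
```

In the real system the second stage is a diffusion model rather than a dictionary; the sketch only makes the hand-off between the two stages concrete.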

What Can OmniHuman-1.5 Do?

🎶 Musical Performances

Using just a photo and a song, OmniHuman-1.5 can create a “digital singer” that precisely mimics the artist’s pauses, breaths, and rhythm.

🎭 Emotional Acting

OmniHuman-1.5 can not only create digital singers but also produce emotional digital actors.

🗣️ Context-Aware Gestures

Instead of repetitive gestures, animations align with meaning. For instance, when the audio mentions “heart,” the character naturally places a hand on her chest.

✍️ Text-Guided Animation

OmniHuman-1.5 supports prompt control. Examples include:

camera movements: “The camera slowly circles the character for an arthouse mood.”
specific actions: “The avatar reaches toward the lens, then begins speaking.”
object generation: “A penguin dances, wears sunglasses, and performs on stage.”
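
As a rough illustration of how such prompts might be assembled, the sketch below combines optional camera, action, and object instructions into a single text prompt. The post does not document the model's actual prompt format or API, so the `AnimationPrompt` class and its fields are hypothetical; only the example sentences come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnimationPrompt:
    """Hypothetical container for the three kinds of prompt control."""
    camera: Optional[str] = None
    action: Optional[str] = None
    objects: Optional[str] = None

    def to_text(self) -> str:
        """Join the parts that are set into one period-separated prompt."""
        fields = (self.camera, self.action, self.objects)
        parts = [p.strip().rstrip(".") for p in fields if p]
        return ". ".join(parts) + "." if parts else ""


prompt = AnimationPrompt(
    camera="The camera slowly circles the character for an arthouse mood.",
    action="The avatar reaches toward the lens, then begins speaking.",
)
print(prompt.to_text())
```

Keeping the three kinds of instruction in separate fields makes it easy to vary one (for example, the camera movement) while leaving the others unchanged.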

👥 Multi-Character and Stylized Scenarios

Unlike earlier digital-human systems, OmniHuman-1.5 can animate several characters in one scene, covering group conversations and ensemble performances.

It also works across humans, animals, anthropomorphic figures, and stylized cartoons, showing remarkable versatility.

Conclusion: Toward Virtual Humans with “Soul”

OmniHuman-1.5 marks a shift in virtual human technology from superficial mimicry toward deeper expression. Its animations respond to the meaning and emotion of what is said, not just its rhythm, which makes interaction with a virtual human feel more genuine. Let’s look forward to the launch of the OmniHuman-1.5 model!
