How to Create a Digital Human

Generate talking head videos with AI-powered digital humans.

Not sure which model to use? Try our Avatar Generator — we’ve curated the best digital human models so you can start creating right away.

Overview

Digital human models combine:

A face image or video
Audio or text input
Lip sync and facial animation

The result is a realistic video of a person speaking.

Quick Start

Web Interface

Go to wavespeed.ai/models
Select a digital human model (e.g., InfiniteTalk, MultiTalk)
Upload a face image
Upload audio or enter text
Click Run

API

Upload your image and audio files to get URLs
Generate the digital human video:

curl --fail-with-body --connect-timeout 10 --max-time 60 --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "image": "https://your-face-image.jpg",
  "audio": "https://your-audio.mp3",
  "resolution": "480p",
  "seed": -1
}'
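For repeated calls, the request above can be wrapped in a small shell function. This is a minimal sketch, not official tooling: `submit_talking_head` is a hypothetical name, and the function only reuses the endpoint, headers, and fields shown above. It makes no assumption about the response format and simply prints whatever the API returns.

```shell
# Hypothetical wrapper around the curl request shown above.
# Usage: submit_talking_head IMAGE_URL AUDIO_URL
submit_talking_head() {
  local image_url="$1" audio_url="$2"

  # Fail early if the API key is missing, before any network call is made.
  if [ -z "${WAVESPEED_API_KEY:-}" ]; then
    echo "WAVESPEED_API_KEY is not set" >&2
    return 1
  fi

  if [ -z "$image_url" ] || [ -z "$audio_url" ]; then
    echo "usage: submit_talking_head IMAGE_URL AUDIO_URL" >&2
    return 2
  fi

  # Same endpoint, headers, and fields as the example above.
  # Assumes the URLs contain no double quotes or backslashes.
  curl --fail-with-body --connect-timeout 10 --max-time 60 --request POST \
    'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "{\"image\": \"${image_url}\", \"audio\": \"${audio_url}\", \"resolution\": \"480p\", \"seed\": -1}"
}
```

With the key exported, a call would look like `submit_talking_head "https://your-face-image.jpg" "https://your-audio.mp3"`.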

Recommended Models

Model	Best For	Features
InfiniteTalk	General talking heads	Good lip sync
MultiTalk	Multiple angles	Flexible poses
Kling Motion Control	Precise control	Expression control

Common Parameters

Parameter	Description	Example
`image`	Face image URL	"https://…"
`audio`	URL of the audio the face should speak	"https://…"
`resolution`	Output resolution	"480p", "720p"
`seed`	Random seed for reproducibility (-1 picks a random seed)	-1
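The four parameters in the table can be checked locally before a request is sent. The sketch below uses a hypothetical helper, `build_payload`, and assumes that the two resolutions shown as examples ("480p" and "720p") are the only valid values, which may not hold for every model. It prints the JSON body on success and returns a non-zero status otherwise.

```shell
# Hypothetical helper: validate the parameters and print the JSON request body.
# Usage: build_payload IMAGE_URL AUDIO_URL [RESOLUTION] [SEED]
build_payload() {
  local image="$1" audio="$2" resolution="${3:-480p}" seed="${4:--1}"

  # Assumption: only the two resolutions listed in the table are accepted.
  case "$resolution" in
    480p|720p) ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac

  # seed must be an integer with an optional leading minus; -1 means random.
  case "$seed" in
    ''|-|*[!0-9-]*|?*-*) echo "seed must be an integer: $seed" >&2; return 1 ;;
  esac

  # Reject characters that would need JSON escaping; plain URLs do not contain them.
  case "$image$audio" in
    *\"*|*\\*) echo "URLs must not contain quotes or backslashes" >&2; return 1 ;;
  esac

  printf '{"image":"%s","audio":"%s","resolution":"%s","seed":%s}\n' \
    "$image" "$audio" "$resolution" "$seed"
}
```

The output can be passed to curl with `--data-raw "$(build_payload "$IMAGE_URL" "$AUDIO_URL" 720p)"`, where `IMAGE_URL` and `AUDIO_URL` are variables you define yourself.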

Workflow Options

Option 1: Image + Audio

Upload a face image and pre-recorded audio:

curl --fail-with-body --connect-timeout 10 --max-time 60 --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "image": "https://your-face-image-url",
  "audio": "https://your-audio-url",
  "resolution": "480p",
  "seed": -1
}'

Option 2: Generate Audio First, Then Video

For text input, first generate speech using TTS, then use the audio URL:

Generate speech with Minimax Speech
Pass the resulting audio URL as the `audio` parameter of the InfiniteTalk request

Image Requirements

For best results:

Requirement	Recommendation
Resolution	At least 512x512
Face position	Centered, facing camera
Lighting	Even, no harsh shadows
Background	Simple, uncluttered
Expression	Neutral or slight smile
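The 512x512 minimum can be checked before uploading. The sketch below is limited to PNG files and uses a hypothetical helper name, `png_meets_min_size`. It reads the width and height from the fixed position of the PNG IHDR chunk and does not verify the PNG signature, so other formats such as JPEG need a different tool.

```shell
# Hypothetical helper: succeed if a PNG is at least MIN x MIN pixels (default 512).
# Usage: png_meets_min_size FILE [MIN]
png_meets_min_size() {
  local file="$1" min="${2:-512}" width height

  # Bytes 16-23 of a PNG hold width and height as big-endian 32-bit integers.
  # The unquoted substitution is intentional: it splits od's output into 8 byte values.
  set -- $(od -An -tu1 -j16 -N8 "$file" 2>/dev/null)
  [ "$#" -eq 8 ] || return 2   # unreadable file or fewer than 24 bytes

  width=$(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
  height=$(( (($5 * 256 + $6) * 256 + $7) * 256 + $8 ))

  [ "$width" -ge "$min" ] && [ "$height" -ge "$min" ]
}
```

For example, `png_meets_min_size face.png || echo "image is smaller than 512x512"`, where `face.png` is a placeholder file name.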

Audio Requirements

Requirement	Recommendation
Quality	Clear, minimal noise
Format	MP3, WAV, M4A
Length	Match desired video length
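A quick way to catch an unsupported format is to look at the file extension. The sketch below uses a hypothetical helper, `is_supported_audio`, which accepts the three formats listed in the table. It inspects only the extension of a path or URL, not the actual audio encoding.

```shell
# Hypothetical helper: succeed if the path or URL ends in .mp3, .wav, or .m4a.
# Usage: is_supported_audio PATH_OR_URL
is_supported_audio() {
  local path ext
  path="${1%%[?#]*}"   # drop any query string or fragment
  ext="$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')"

  case "$ext" in
    mp3|wav|m4a) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_audio "https://your-audio.mp3" && echo "format looks fine"`.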

Tips for Better Results

Use high-quality inputs: cleaner images and audio produce better output
Match aspect ratios: the face image should have the same aspect ratio as the output video
Keep it simple: complex expressions may not animate well
Test short clips first: iterate on short clips before generating long videos
