How to Create a Digital Human
Generate talking head videos with AI-powered digital humans.
Overview
Digital human models combine:
- A face image or video
- Audio or text input
- Lip sync and facial animation
The result is a realistic video of a person speaking.
Quick Start
Web Interface
- Go to wavespeed.ai/models
- Select a digital human model (e.g., InfiniteTalk, MultiTalk)
- Upload a face image
- Upload audio or enter text
- Click Run
API
- Upload your image and audio files to get URLs
- Generate the digital human video:
curl --location --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"image": "https://your-face-image.jpg",
"audio": "https://your-audio.mp3",
"resolution": "480p",
"seed": -1
}'
Recommended Models
| Model | Best For | Features |
|---|---|---|
| InfiniteTalk | General talking heads | Good lip sync |
| MultiTalk | Multiple angles | Flexible poses |
| Kling Motion Control | Precise control | Expression control |
Common Parameters
| Parameter | Description | Example |
|---|---|---|
| image | Face image URL | "https://…" |
| audio | Audio to speak | "https://…" |
| resolution | Output resolution | "480p", "720p" |
| seed | For reproducibility (-1 for random) | -1 |
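As a rough illustration of how these parameters fit together, the following Python sketch assembles the same request as the curl example without sending it. The endpoint and field names come from this guide. `build_request` and `ALLOWED_RESOLUTIONS` are illustrative names, and the resolution check covers only the two values listed in the table, so the real API may accept others.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example in this guide.
ENDPOINT = "https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk"
# Assumption: only the example values from the parameter table are allowed here.
ALLOWED_RESOLUTIONS = {"480p", "720p"}


def build_request(image_url, audio_url, resolution="480p", seed=-1, api_key=None):
    """Build (but do not send) the POST request for a talking-head video."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer (-1 for random)")
    # Fall back to the same environment variable the curl example uses.
    api_key = api_key or os.environ.get("WAVESPEED_API_KEY", "")
    payload = {
        "image": image_url,
        "audio": audio_url,
        "resolution": resolution,
        "seed": seed,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. The response format is not described in this guide, so the sketch stops before that step.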
Workflow Options
Option 1: Image + Audio
Upload a face image and pre-recorded audio:
curl --location --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"image": "https://your-face-image-url",
"audio": "https://your-audio-url",
"resolution": "480p",
"seed": -1
}'
Option 2: Generate Audio First, Then Video
For text input, first generate speech using TTS, then use the audio URL:
- Generate speech with Minimax Speech
- Use the audio output URL in infinitetalk
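The hand-off between the two steps can be sketched in Python. This guide does not specify the TTS request or the response format of either model, so `synthesize_speech` and `render_video` are hypothetical callables that you would implement against the real APIs. The sketch shows only the ordering and the check on the intermediate audio URL.

```python
from typing import Callable


def text_to_talking_head(
    text: str,
    image_url: str,
    synthesize_speech: Callable[[str], str],
    render_video: Callable[[str, str], str],
) -> str:
    """Run TTS first, then feed the resulting audio URL to the video model."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Step 1: text -> audio URL (hypothetical TTS wrapper).
    audio_url = synthesize_speech(text)
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError(f"TTS step did not return a URL: {audio_url!r}")
    # Step 2: image + audio -> video (hypothetical infinitetalk wrapper).
    return render_video(image_url, audio_url)


# Dry run with stand-in functions; replace them with real API calls.
result = text_to_talking_head(
    "Hello!",
    "https://example.com/face.jpg",
    synthesize_speech=lambda text: "https://example.com/speech.mp3",
    render_video=lambda image, audio: f"video({image}, {audio})",
)
```

Keeping the two steps as injected functions makes it possible to test the flow without spending credits on either model.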
Image Requirements
For best results:
| Requirement | Recommendation |
|---|---|
| Resolution | At least 512x512 |
| Face position | Centered, facing camera |
| Lighting | Even, no harsh shadows |
| Background | Simple, uncluttered |
| Expression | Neutral or slight smile |
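The resolution recommendation is the one item in this table that is easy to check automatically before uploading. Here is a minimal Python sketch for PNG files only, using the standard library: it reads the width and height from the file's IHDR header. `png_size` and `meets_min_resolution` are illustrative names. JPEG and other formats would need a different parser or an imaging library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 512  # recommended minimum from the table above


def png_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", header[16:24])


def meets_min_resolution(header: bytes, min_side: int = MIN_SIDE) -> bool:
    """True when both sides are at least `min_side` pixels."""
    width, height = png_size(header)
    return width >= min_side and height >= min_side
```

For a local file, reading the first 24 bytes is enough: open it in binary mode and pass `f.read(24)` to `meets_min_resolution`.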
Audio Requirements
| Requirement | Recommendation |
|---|---|
| Quality | Clear, minimal noise |
| Format | MP3, WAV, M4A |
| Length | Match desired video length |
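Format and length can be checked before uploading. The Python sketch below uses only the standard library: it checks the file extension against the formats listed above and measures the duration of a WAV file. `has_supported_extension` and `wav_duration_seconds` are illustrative names. Measuring MP3 or M4A duration needs a third-party decoder and is not covered here.

```python
import wave
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed in the table


def has_supported_extension(path_or_url: str) -> bool:
    """Check the file extension, ignoring any query string on a URL."""
    suffix = PurePosixPath(urlparse(path_or_url).path).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS


def wav_duration_seconds(source) -> float:
    """Length of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Because the video length follows the audio length, the duration reported here is also a reasonable estimate of the output video's length.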
Tips for Better Results
- Use high-quality inputs: output quality depends directly on the quality of the face image and the audio
- Match aspect ratios: the face image should have the same aspect ratio as the output video
- Keep it simple: complex expressions may not animate well
- Test short clips first: iterate on a few seconds of audio before generating long videos
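One way to follow the short-clip tip is to cut the first few seconds out of the audio and run that through the model before committing to the full recording. The Python sketch below handles WAV input only and uses the standard `wave` module. `first_seconds` is an illustrative name. MP3 or M4A files would need an external tool to trim.

```python
import io
import wave


def first_seconds(source, seconds: float) -> bytes:
    """Return a new WAV (as bytes) holding only the first `seconds` of `source`."""
    with wave.open(source, "rb") as src:
        params = src.getparams()
        # readframes returns fewer frames if the file is shorter than requested.
        frames = src.readframes(int(seconds * src.getframerate()))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        # The wave module corrects the frame count in the header on close.
        dst.writeframes(frames)
    return out.getvalue()
```

Write the returned bytes to a file, upload it in place of the full audio, and switch back to the original once the lip sync looks right.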