How to Create a Digital Human

Generate talking-head videos with AI-powered digital humans.

Overview

Digital human models combine:

  • A face image or video
  • Audio or text input
  • Lip sync and facial animation

The result is a realistic video of a person speaking.

Quick Start

Web Interface

  1. Go to wavespeed.ai/models
  2. Select a digital human model (e.g., InfiniteTalk, MultiTalk)
  3. Upload a face image
  4. Upload audio or enter text
  5. Click Run

API

  1. Upload your image and audio files to get URLs
  2. Generate the digital human video:
curl --location --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "image": "https://your-face-image.jpg",
  "audio": "https://your-audio.mp3",
  "resolution": "480p",
  "seed": -1
}'
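Steps 1 and 2 can be wrapped in a small shell helper so the URLs and options come from arguments instead of a hand-edited JSON string. This is a sketch, not an official client: `infinitetalk_payload` and `submit_infinitetalk` are hypothetical names, the defaults simply mirror the request above, and the URLs are assumed to contain no double quotes or backslashes because no JSON escaping is done.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any WaveSpeed SDK.

# infinitetalk_payload IMAGE_URL AUDIO_URL [RESOLUTION] [SEED]
# Prints the same JSON body as the curl example (defaults: 480p, seed -1).
# Assumes the URLs contain no double quotes or backslashes (no JSON escaping).
infinitetalk_payload() {
  printf '{"image": "%s", "audio": "%s", "resolution": "%s", "seed": %s}\n' \
    "$1" "$2" "${3:-480p}" "${4:--1}"
}

# submit_infinitetalk IMAGE_URL AUDIO_URL [RESOLUTION] [SEED]
# Pipes the body to the InfiniteTalk endpoint; needs WAVESPEED_API_KEY set.
submit_infinitetalk() {
  infinitetalk_payload "$@" | curl --silent --location --request POST \
    'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-binary @-
}
```

For example, `submit_infinitetalk "https://example.com/face.jpg" "https://example.com/voice.mp3" 720p 42` sends a 720p request with a fixed seed.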

Available Models

Model                 Best For               Features
InfiniteTalk          General talking heads  Good lip sync
MultiTalk             Multiple angles        Flexible poses
Kling Motion Control  Precise control        Expression control

Common Parameters

Parameter   Description                             Example
image       URL of the face image                   "https://…"
audio       URL of the speech audio to lip-sync     "https://…"
resolution  Output video resolution                 "480p", "720p"
seed        Seed for reproducibility (-1 = random)  -1
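Because a bad value is only reported after the request is sent, a local check can catch typos early. The sketch below assumes that 480p and 720p (the values in the table) are the only accepted resolutions and that seeds other than -1 are non-negative integers; `check_params` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the parameters in the table.
# ASSUMPTION: "480p" and "720p" are the only accepted resolutions, and a
# seed is either -1 (random) or a non-negative integer.
# check_params RESOLUTION SEED -> exit status 0 if both look valid.
check_params() {
  case "$1" in
    480p|720p) ;;
    *) echo "unsupported resolution: $1" >&2; return 1 ;;
  esac
  case "$2" in
    -1) return 0 ;;
    # empty, or containing any non-digit character (including a minus sign)
    ''|*[!0-9]*) echo "seed must be -1 or a non-negative integer: $2" >&2; return 1 ;;
  esac
  return 0
}
```

Calling `check_params "$RESOLUTION" "$SEED" || exit 1` before the curl request stops a script early on an obvious mistake.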

Workflow Options

Option 1: Image + Audio

Upload a face image and pre-recorded audio:

curl --location --request POST 'https://api.wavespeed.ai/api/v3/wavespeed-ai/infinitetalk' \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "image": "https://your-face-image-url",
  "audio": "https://your-audio-url",
  "resolution": "480p",
  "seed": -1
}'
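The request above may not return the finished video directly. The sketch below assumes, without confirmation from this page, that the POST response carries a prediction ID and that `GET https://api.wavespeed.ai/api/v3/predictions/<id>/result` reports a `status` of `completed` or `failed` once the job ends; check both against the API reference. The helper names are hypothetical, and JSON is parsed with grep/sed to avoid extra tools, which is more fragile than a real JSON parser such as jq.

```shell
#!/bin/sh
# Sketch: wait for an InfiniteTalk job to finish.
# ASSUMPTIONS (verify in the WaveSpeed API reference):
#  - the POST returns a prediction ID rather than the finished video;
#  - the result endpoint below returns JSON with a "status" field whose
#    final values are "completed" or "failed".

# fetch_result ID: print the raw JSON for one prediction.
fetch_result() {
  curl --silent --location --request GET \
    "https://api.wavespeed.ai/api/v3/predictions/$1/result" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}"
}

# json_status: read JSON on stdin, print the first "status" value.
json_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# wait_for_result ID [MAX_TRIES] [INTERVAL_SECONDS]
# Exit status: 0 completed (JSON on stdout), 1 failed (JSON on stderr),
# 2 still unfinished after MAX_TRIES polls.
wait_for_result() {
  wfr_tries=0
  while [ "$wfr_tries" -lt "${2:-60}" ]; do
    wfr_json="$(fetch_result "$1")"
    case "$(printf '%s' "$wfr_json" | json_status)" in
      completed) printf '%s\n' "$wfr_json"; return 0 ;;
      failed) printf '%s\n' "$wfr_json" >&2; return 1 ;;
    esac
    wfr_tries=$((wfr_tries + 1))
    sleep "${3:-2}"
  done
  return 2
}
```

For example, `wait_for_result "$PREDICTION_ID" 90 5` polls every 5 seconds for up to 90 attempts.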

Option 2: Generate Audio First, Then Video

If you are starting from text, first generate speech with a text-to-speech (TTS) model, then pass the resulting audio URL to the video model:

  1. Generate speech with Minimax Speech
  2. Pass the output audio URL as the audio parameter to InfiniteTalk
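Chaining the two steps mostly means pulling the audio URL out of the speech result and placing it in the InfiniteTalk body. This sketch assumes the TTS result JSON lists generated files in an `outputs` array (an assumption, not documented on this page); `first_output_url` and `body_from_tts` are hypothetical helpers, and the resolution and seed are fixed to the example values.

```shell
#!/bin/sh
# Hypothetical glue between step 1 (TTS) and step 2 (InfiniteTalk).
# ASSUMPTION: the TTS result JSON lists generated files in an "outputs"
# array, e.g. {"data":{"status":"completed","outputs":["https://.../a.mp3"]}}.

# first_output_url: read JSON on stdin, print the first URL in "outputs".
first_output_url() {
  grep -o '"outputs"[[:space:]]*:[[:space:]]*\[[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# body_from_tts IMAGE_URL TTS_RESULT_JSON: print an InfiniteTalk request body
# using the first audio URL from the TTS result; fails if none is found.
body_from_tts() {
  bft_audio="$(printf '%s' "$2" | first_output_url)"
  if [ -z "$bft_audio" ]; then
    echo "no audio URL found in TTS result" >&2
    return 1
  fi
  printf '{"image": "%s", "audio": "%s", "resolution": "480p", "seed": -1}\n' \
    "$1" "$bft_audio"
}
```

The printed body can be sent with the same curl command as in Option 1, for example by piping it to curl with `--data-binary @-`.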

Image Requirements

For best results:

Requirement    Recommendation
Resolution     At least 512x512 pixels
Face position  Centered, facing the camera
Lighting       Even, no harsh shadows
Background     Simple, uncluttered
Expression     Neutral or slight smile
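The resolution guideline can be checked before uploading. `check_image_size` is a hypothetical helper that only compares numbers; reading the dimensions with ImageMagick's `identify`, as in the comment, is one option and assumes ImageMagick is installed.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the 512x512 minimum above.
# check_image_size WIDTH HEIGHT -> exit status 0 if both sides are >= 512.
check_image_size() {
  if [ "$1" -ge 512 ] && [ "$2" -ge 512 ]; then
    return 0
  fi
  echo "image is $1x$2; at least 512x512 is recommended" >&2
  return 1
}

# Example with ImageMagick (assumed installed), which prints "WIDTH HEIGHT":
#   set -- $(identify -format '%w %h' face.jpg)
#   check_image_size "$1" "$2"
```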

Audio Requirements

Requirement  Recommendation
Quality      Clear, minimal background noise
Format       MP3, WAV, or M4A
Length       Match the desired video length
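A quick local check for the format and length guidelines might look like this. `is_supported_audio` (hypothetical) judges by file extension only and does not inspect the file contents, and `audio_duration` assumes ffprobe from FFmpeg is installed.

```shell
#!/bin/sh
# is_supported_audio FILE: succeed if the extension is mp3, wav, or m4a
# (case-insensitive). Looks at the file name only, not the contents.
is_supported_audio() {
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|wav|m4a) return 0 ;;
    *) return 1 ;;
  esac
}

# audio_duration FILE: print the duration in seconds using ffprobe
# (assumed installed), to compare with the video length you want.
audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```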

Tips for Better Results

  1. Use high-quality inputs: better images and audio produce better output
  2. Match aspect ratios: the face image should have the same aspect ratio as the output video
  3. Keep it simple: complex expressions may not animate well
  4. Test short clips first: iterate on short clips before generating long videos
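Tips 2 and 4 can be scripted as well. In this sketch, `same_aspect` (hypothetical) compares two width:height ratios by cross-multiplication with a 1% tolerance, which helps because common output sizes, such as an assumed 480x854 portrait frame, are not exactly 9:16. `make_test_clip` (also hypothetical) assumes FFmpeg is installed and keeps only the first few seconds of an audio file for quick test runs.

```shell
#!/bin/sh
# same_aspect W1 H1 W2 H2: succeed if W1:H1 is within 1% of W2:H2.
# Compares W1*H2 with W2*H1 using integer arithmetic only.
same_aspect() {
  sa_a=$(( $1 * $4 ))
  sa_b=$(( $3 * $2 ))
  sa_diff=$(( sa_a > sa_b ? sa_a - sa_b : sa_b - sa_a ))
  [ $(( sa_diff * 100 )) -le "$sa_b" ]
}

# make_test_clip IN OUT [SECONDS]: copy the first SECONDS (default 10) of an
# audio file without re-encoding, using ffmpeg (assumed installed).
make_test_clip() {
  ffmpeg -y -i "$1" -t "${3:-10}" -c copy "$2"
}
```

For instance, `same_aspect 1080 1920 480 854` succeeds for a 9:16 portrait photo, while a square 1024x1024 image fails against an 854x480 landscape frame.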
© 2025 WaveSpeedAI. All rights reserved.