Wan 2.1 Multitalk


Playground

Try it on WavespeedAI!

MultiTalk (WAN 2.1) is an audio-driven AI model that turns a single image and an audio track into talking or singing conversational videos. It is available as a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.

Features

MultiTalk

Transform static photos into dynamic speaking videos with MultiTalk — a revolutionary audio-driven video generation framework by MeiGen-AI. Unlike traditional talking head methods, MultiTalk animates full conversations with realistic lip synchronization, natural body movements, and even multi-person interactions.

Why It Looks Great

  • Perfect lip sync: Advanced audio encoding (Wav2Vec) captures speech nuances including rhythm, tone, and pronunciation for precise synchronization.
  • Multi-person support: Generate videos with multiple speakers interacting naturally in the same scene.
  • Full body animation: Goes beyond facial movements to include natural gestures, expressions, and body language.
  • Dynamic camera control: Powered by Uni3C controlnet for subtle camera movements and professional cinematography.
  • Prompt-guided generation: Follow text instructions to control scene, pose, and behavior while maintaining audio sync.
  • Extended duration: Support for videos up to 10 minutes long.

How It Works

MultiTalk combines three powerful technologies for optimal results:

| Component | Function |
| --- | --- |
| MultiTalk Core | Audio-to-motion synthesis with perfect lip synchronization |
| Wan2.1 | Video diffusion model for realistic human anatomy, expressions, and movements |
| Uni3C | Camera controlnet for dynamic, professional-looking scene control |

How to Use

  1. Upload your image — provide a photo with one or more people.
  2. Upload your audio — add the speech or song you want the subject to perform.
  3. Write your prompt (optional) — describe the scene, pose, or behavior you want.
  4. Set duration — choose your desired video length.
  5. Run — click the button to generate.
  6. Download — preview and save your talking video.

Pricing

Billing is in 5-second increments based on audio duration. Maximum video length: 10 minutes.

| Metric | Cost |
| --- | --- |
| Per 5 seconds | $0.15 |

Billing Rules

  • Minimum charge: 5 seconds ($0.15)
  • Maximum duration: 600 seconds (10 minutes)
  • Billed duration: Audio length rounded up to nearest 5-second increment
  • Total cost: (Billed duration ÷ 5) × $0.15

Examples

| Audio Length | Billed Duration | Calculation | Total Cost |
| --- | --- | --- | --- |
| 3s | 5s (minimum) | 5 ÷ 5 × $0.15 | $0.15 |
| 12s | 15s | 15 ÷ 5 × $0.15 | $0.45 |
| 30s | 30s | 30 ÷ 5 × $0.15 | $0.90 |
| 1m (60s) | 60s | 60 ÷ 5 × $0.15 | $1.80 |
| 5m (300s) | 300s | 300 ÷ 5 × $0.15 | $9.00 |
| 10m (600s) | 600s (maximum) | 600 ÷ 5 × $0.15 | $18.00 |
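
To sanity-check a budget before generating, the billing rules translate directly into a small shell calculation. The sketch below is illustrative only: it assumes whole seconds and simply mirrors the rounding, minimum, and maximum rules listed above.

# Sketch: compute the billed cost for a given audio length in seconds,
# following the rules above: round up to the nearest 5 s, minimum 5 s, maximum 600 s.
billed_cost() {
  local audio_s=$1
  local billed=$(( ((audio_s + 4) / 5) * 5 ))   # round up to nearest 5-second increment
  (( billed < 5 ))   && billed=5                 # minimum charge: 5 seconds
  (( billed > 600 )) && billed=600               # maximum duration: 600 seconds
  # cost = (billed ÷ 5) × $0.15, computed in cents to avoid floating point
  local cents=$(( billed / 5 * 15 ))
  printf "%ds audio -> billed %ds -> \$%d.%02d\n" "$audio_s" "$billed" $((cents / 100)) $((cents % 100))
}

billed_cost 3     # 3s audio -> billed 5s -> $0.15
billed_cost 12    # 12s audio -> billed 15s -> $0.45
billed_cost 300   # 300s audio -> billed 300s -> $9.00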

Best Use Cases

  • Virtual Presentations — Create professional talking head videos from a single photo.
  • Content Localization — Dub videos into different languages with perfect lip sync.
  • Music & Performance — Generate singing videos with synchronized mouth movements.
  • Conversational Content — Produce multi-person dialogue scenes for storytelling.
  • Marketing & Advertising — Create spokesperson videos without filming sessions.

Pro Tips for Best Results

  • Use clear, front-facing photos with visible faces for the best lip synchronization.
  • High-quality audio with minimal background noise produces more accurate results.
  • For multi-person scenes, ensure all faces are clearly visible in the source image.
  • Add scene descriptions in your prompt to enhance the visual context and atmosphere.
  • Start with shorter clips to test synchronization before generating longer videos.

Notes

  • If using URLs, ensure they are publicly accessible.
  • Processing time scales with video duration and complexity.
  • Best results come from clear speech audio and well-lit portrait images.
  • For singing content, ensure the audio has clear vocal tracks.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.1/multitalk" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "<publicly accessible image URL>",
    "audio": "<publicly accessible audio URL>",
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| image | string | Yes | - | - | The image for generating the output. |
| audio | string | Yes | - | - | The audio for generating the output. |
| prompt | string | No | - | - | The positive prompt for the generation. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
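
For reference, a submission that sets every documented parameter could look like the following; the image and audio URLs and the prompt text are placeholders, not real assets.

curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/wan-2.1/multitalk" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "audio": "https://example.com/speech.wav",
    "prompt": "A person speaking warmly to the camera in a softly lit studio",
    "seed": 42
}'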

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID being queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
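
Once data.status is completed, data.outputs holds the URL(s) of the generated video. A minimal sketch for downloading the first output, assuming the result response was saved to result.json, jq is installed, and the output is an MP4:

# Sketch: extract the first generated video URL from a completed result and download it.
video_url=$(jq -r '.data.outputs[0]' result.json)
curl --location --output multitalk_video.mp4 "$video_url"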