MultiTalk

MultiTalk is an audio-driven conversational AI video generation model. It creates talking or singing videos from a single image and an audio input.

Features

What is MultiTalk?

MultiTalk is a framework for audio-driven, multi-person conversational video generation developed by MeiGen-AI. Traditional talking-head methods animate only facial movements; MultiTalk can generate videos of people speaking, singing, and interacting, with lip movements synchronized to the audio input. In practice, it turns a static photo into a video in which the person speaks or sings the audio you supply.

Pricing

Pricing starts at $0.15 per 5 seconds of generated video. The maximum generation length is 60 seconds.
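As a rough worked example of this rate, the sketch below estimates the cost of a clip. It assumes billing rounds up to whole 5-second blocks at the starting rate of $0.15; the page only says pricing "starts" at this rate, so treat the result as an estimate. `estimate_multitalk_cost` is a hypothetical helper, not part of the API.

```shell
#!/usr/bin/env bash
# Estimate the MultiTalk cost in USD for a clip of N seconds.
# ASSUMPTION: billing rounds up to whole 5-second blocks at $0.15 each.
# Check current pricing before relying on this figure.
estimate_multitalk_cost() {
  local seconds=$1
  # The endpoint supports at most 60 seconds per generation.
  if [ "$seconds" -le 0 ] || [ "$seconds" -gt 60 ]; then
    echo "duration must be between 1 and 60 seconds" >&2
    return 1
  fi
  # Round up to the next whole 5-second block.
  local blocks=$(( (seconds + 4) / 5 ))
  # Work in cents to avoid floating point in the shell: 15 cents per block.
  local cents=$(( blocks * 15 ))
  printf '%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}
```

Under these assumptions, a 60-second video (the maximum) is 12 blocks, or $1.80, and a 7-second clip rounds up to 2 blocks, or $0.30.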

How MultiTalk Works

MultiTalk conditions video generation on both the audio signal and the reference image. This endpoint combines MultiTalk with the Wan2.1 video diffusion model and the Uni3C camera-control ControlNet.

Audio Analysis: MultiTalk uses a Wav2Vec audio encoder to capture speech characteristics such as rhythm, tone, and pronunciation patterns.

Visual Understanding: Built on the Wan2.1 video diffusion model (see our Wan2.1 workflow for text-to-video and image-to-video generation), MultiTalk models human anatomy, facial expressions, and body movements.

Camera Control: The Uni3C ControlNet adds subtle camera movement and scene control, which makes the video more dynamic and polished. See our Uni3C workflow for camera motion transfer.

Lip Synchronization: MultiTalk uses attention mechanisms to align lip movements with the audio while preserving natural facial expressions and body language.

Instruction Following: MultiTalk accepts a text prompt (the prompt parameter) that steers the scene, pose, and overall behavior while keeping the video synchronized with the audio.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


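# Submit the task (image and audio are required; replace the placeholder URLs)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/multitalk" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "audio": "https://example.com/speech.mp3",
    "prompt": "",
    "seed": -1
}'

# Get the result: set requestId to the data.id value returned by the submit call
requestId="<task-id-from-submit-response>"
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"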
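The two requests can be combined into a submit-then-poll loop. This is a sketch under several assumptions: curl and jq are installed, WAVESPEED_API_KEY is exported, image and audio are passed as publicly reachable URLs, and polling every 5 seconds for at most 120 attempts is acceptable. `submit_multitalk` and `wait_for_result` are hypothetical helper names, not part of an official SDK.

```shell
#!/usr/bin/env bash
# Sketch: submit a MultiTalk task, then poll the result endpoint.
# ASSUMPTIONS: curl and jq are installed, WAVESPEED_API_KEY is exported,
# image/audio are reachable URLs. Helper names are hypothetical.
API_BASE="https://api.wavespeed.ai/api/v3"

submit_multitalk() {
  # $1 = image URL, $2 = audio URL, $3 = optional prompt; prints data.id
  local payload
  payload=$(jq -n --arg image "$1" --arg audio "$2" --arg prompt "${3:-}" \
    '{image: $image, audio: $audio, prompt: $prompt, seed: -1}')
  curl --silent --show-error --request POST "$API_BASE/wavespeed-ai/multitalk" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$payload" | jq -r '.data.id'
}

wait_for_result() {
  # $1 = task id; prints the first output URL once status is "completed"
  local response status tries=0
  while [ "$tries" -lt 120 ]; do
    response=$(curl --silent --show-error \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
      "$API_BASE/predictions/$1/result")
    status=$(printf '%s' "$response" | jq -r '.data.status')
    case "$status" in
      completed) printf '%s\n' "$response" | jq -r '.data.outputs[0]'; return 0 ;;
      failed) printf '%s\n' "$response" | jq -r '.data.error' >&2; return 1 ;;
    esac
    # created, processing, or an unexpected reply: wait and retry
    tries=$((tries + 1))
    sleep 5
  done
  echo "timed out waiting for task $1" >&2
  return 1
}
```

Typical use is `task_id=$(submit_multitalk "https://example.com/portrait.jpg" "https://example.com/speech.mp3")` followed by `wait_for_result "$task_id"`, which prints the video URL. Stopping on failed as well as completed avoids waiting on a task that reported an error, and the attempt cap keeps the loop from spinning forever on unexpected responses such as an invalid API key.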

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | Input image of the person to animate. |
| audio | string | Yes | - | - | Input audio that drives the speech or singing. |
| prompt | string | No | - | - | Text prompt that guides the scene, pose, and behavior. |
| seed | integer | No | -1 | -1 ~ 2147483647 | Random seed for the generation; -1 means a random seed is used. |
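A client-side check can catch requests that violate the documented constraints before any network call. This is a minimal sketch: `validate_multitalk_params` is hypothetical, and the server may enforce further rules (for example on file formats or audio length) that this page does not document.

```shell
#!/usr/bin/env bash
# Sketch: validate MultiTalk request fields against the parameter table.
# validate_multitalk_params is a hypothetical helper, not part of the API.
validate_multitalk_params() {
  local image=$1 audio=$2 seed=${3:--1}   # seed defaults to -1 (random)
  # image and audio are required
  if [ -z "$image" ] || [ -z "$audio" ]; then
    echo "image and audio are required" >&2
    return 1
  fi
  # seed must be -1 or a non-negative integer
  case "$seed" in
    -1) return 0 ;;
    ''|*[!0-9]*) echo "seed must be -1 or a non-negative integer" >&2; return 1 ;;
  esac
  # Reject more than 10 digits first so the numeric test cannot overflow.
  if [ "${#seed}" -gt 10 ] || [ "$seed" -gt 2147483647 ]; then
    echo "seed must be at most 2147483647" >&2
    return 1
  fi
}
```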

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
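Because data.has_nsfw_contents holds one flag for each output, a client can pair the two arrays by index when reading a response. The sketch below assumes jq is installed and that both arrays are index-aligned, as the field descriptions suggest; `list_outputs` is a hypothetical helper that reads a response body on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print each output URL with its NSFW flag, tab-separated.
# ASSUMPTIONS: jq is installed; outputs[i] corresponds to has_nsfw_contents[i].
# list_outputs is a hypothetical helper name.
list_outputs() {
  # transpose zips the two arrays into [url, flag] pairs
  jq -r '.data | [.outputs, .has_nsfw_contents] | transpose[]
         | "\(.[0])\tnsfw=\(.[1])"'
}
```

For example, pipe the result query's response into `list_outputs`. Until the task completes, data.outputs is empty, so nothing is printed.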

Result Query Parameters

Result Request Parameters

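| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID (the data.id returned when the task was submitted), passed in the URL path as /predictions/{id}/result |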

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the ID of the prediction requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
© 2025 WaveSpeedAI. All rights reserved.