MultiTalk
Try it on WavespeedAI! MultiTalk converts a single image and an audio track into an audio-driven talking or singing video (Image-to-Video), supporting clips up to 10 minutes long. It is available as a ready-to-use REST inference API with strong performance, no cold starts, and affordable pricing.
Features
MultiTalk
Generate realistic talking videos from a single photo with MultiTalk — MeiGen-AI’s revolutionary audio-driven conversational video framework. Unlike traditional talking head methods that only animate facial movements, MultiTalk creates lifelike videos with perfect lip synchronization, natural expressions, and dynamic body language.
Why It Looks Great
- Perfect lip sync: Advanced audio analysis ensures precise mouth movements matching every syllable.
- Full-body animation: Goes beyond faces — animates natural body movements and gestures.
- Camera dynamics: The built-in Uni3C ControlNet enables subtle camera movements for professional results.
- Instruction following: Control scene, pose, and behavior through text prompts while maintaining sync.
- Multi-person support: Animate conversations with multiple speakers in the same scene.
- Extended duration: Generate videos up to 10 minutes long.
How It Works
MultiTalk combines three powerful technologies for optimal results:
| Component | Function |
|---|---|
| Wav2Vec Audio Encoder | Analyzes speech nuances including rhythm, tone, and pronunciation patterns |
| Wan2.1 Video Diffusion | Understands human anatomy, facial expressions, and body movements |
| Uni3C ControlNet | Enables dynamic camera movements and professional scene control |
Through sophisticated attention mechanisms, MultiTalk perfectly aligns lip movements with audio while maintaining natural facial expressions and body language.
Parameters
| Parameter | Required | Description |
|---|---|---|
| image | Yes | Portrait image of the person to animate (upload or public URL). |
| audio | Yes | Audio file for lip synchronization (upload or public URL). |
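A minimal request body covering both required parameters might look like the following (the URLs are placeholders; substitute your own publicly accessible files):

```json
{
  "image": "https://example.com/portrait.jpg",
  "audio": "https://example.com/speech.mp3"
}
```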
How to Use
- Upload your image — a clear portrait photo works best.
- Upload your audio — speech, singing, or any vocal audio.
- Run — click the button to generate.
- Download — preview and save your talking video.
Pricing
Billing is in 5-second increments based on audio duration.
| Duration | Cost |
|---|---|
| 5 seconds | $0.15 |
| 30 seconds | $0.90 |
| 1 minute | $1.80 |
| 5 minutes | $9.00 |
| 10 minutes (max) | $18.00 |
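The table above implies a flat rate of $0.15 per 5-second increment. A small estimator sketch follows; note that rounding partial increments up is an assumption that matches the published multiples of 5 seconds but is not confirmed for other durations:

```python
import math

RATE_PER_INCREMENT = 0.15   # USD per 5-second billing increment
INCREMENT_SECONDS = 5
MAX_SECONDS = 600           # 10-minute maximum video length

def estimate_cost(audio_seconds: float) -> float:
    """Estimate the MultiTalk cost in USD for a given audio duration.

    Assumes partial increments are billed as full increments (rounded
    up); this is an assumption, not documented behavior.
    """
    if not 0 < audio_seconds <= MAX_SECONDS:
        raise ValueError("audio duration must be between 0 and 600 seconds")
    increments = math.ceil(audio_seconds / INCREMENT_SECONDS)
    return round(increments * RATE_PER_INCREMENT, 2)
```

For example, `estimate_cost(30)` reproduces the $0.90 row of the table.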
Best Use Cases
- Virtual Presenters — Create AI spokespeople for videos and training content.
- Content Localization — Dub content into different languages with matching lip movements.
- Music Videos — Generate singing performances from static photos.
- E-learning — Produce instructor-led courses without filming.
- Social Media — Create engaging talking-head content at scale.
- Multi-person Conversations — Animate group discussions and dialogues.
Pro Tips for Best Results
- Use clear, front-facing portrait photos with good lighting.
- Ensure faces are clearly visible without obstructions.
- High-quality audio with minimal background noise produces better sync.
- Neutral or slightly open mouth expressions in source images work best.
- For conversations, provide distinct audio tracks for each speaker.
- Test with shorter clips before generating longer videos.
Related Workflows
- Wan2.1 T2V/I2V — Text-to-video and image-to-video generation
- Uni3C Camera Control — Camera motion transfer for dynamic videos
Notes
- Maximum supported video length is 10 minutes.
- If using URLs, ensure they are publicly accessible.
- Processing time scales with audio duration.
- Best results come from portrait-style images with clear facial features.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (the image/audio URLs below are placeholders; both are required)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/multitalk" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "audio": "https://example.com/speech.mp3",
    "seed": -1
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
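The submit call can also be wrapped in a small client function. This is a sketch, not an official SDK: the payload fields mirror the request-parameter table below, and the `send` hook exists only so the HTTP transport can be swapped out (for example, in tests):

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"

def _http_post(url, body, headers):
    # Default transport: a plain urllib POST returning the parsed JSON body.
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def submit_multitalk(api_key, image_url, audio_url, prompt=None, seed=-1,
                     send=_http_post):
    """Submit a MultiTalk task and return the task id (data.id)."""
    payload = {"image": image_url, "audio": audio_url, "seed": seed}
    if prompt is not None:
        payload["prompt"] = prompt
    body = send(
        f"{API_BASE}/wavespeed-ai/multitalk",
        json.dumps(payload).encode("utf-8"),
        {"Content-Type": "application/json",
         "Authorization": f"Bearer {api_key}"},
    )
    if body.get("code") != 200:
        raise RuntimeError(body.get("message") or "task submission failed")
    return body["data"]["id"]
```

The returned id is the `requestId` used in the result query above.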
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | The image for generating the output. |
| audio | string | Yes | - | - | The audio for generating the output. |
| prompt | string | No | - | - | The positive prompt for the generation. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID supplied in the request) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
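The `data.status` field above drives a typical polling loop. A minimal sketch follows, assuming the caller supplies a `fetch(task_id)` function that GETs the result endpoint and returns the parsed JSON body (the hook and interval values are illustrative, not part of the API):

```python
import time

def wait_for_outputs(fetch, task_id, poll_interval=5.0, timeout=600.0):
    """Poll until data.status reaches a terminal state.

    Returns data.outputs (the list of result URLs) on completion;
    raises on failure or timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch(task_id)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(poll_interval)   # status is still "created" or "processing"
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

Since processing time scales with audio duration, longer clips warrant a larger `timeout`.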