LatentSync
Playground
Try it on WavespeedAI!

LatentSync synchronizes video and audio inputs to generate seamless, synchronized content. Perfect for lip-syncing, audio dubbing, and video-audio alignment tasks.
Features
LatentSync — Audio-to-Video Lip Sync
LatentSync is a state-of-the-art end-to-end lip-sync framework built on audio-conditioned latent diffusion. It turns your talking-head videos into perfectly synchronized performances while preserving high-resolution details and natural expressions.
🌟 Key Capabilities
End-to-End Lip Synchronization
Transform any talking-head clip into a lip-synced video:
- Takes a source video plus target audio as input
- Generates frame-accurate mouth movements without 3D meshes or 2D landmarks
- Preserves identity, pose, background and global scene structure
High-Resolution Talking Heads
Built on latent diffusion to deliver:
- Sharp, detailed faces at high resolution
- Natural facial expressions and subtle mouth shapes
- Works for both real and stylized (e.g., anime) characters from the reference video
Temporal Consistency
LatentSync introduces Temporal REPresentation Alignment (TREPA) to:
- Reduce flicker, jitter and frame-to-frame artifacts
- Keep head pose, lips and jaw motion stable over long sequences
- Maintain smooth, coherent motion at video frame rates
Multilingual & Robust
Designed for real-world content:
- Supports multiple languages and accents
- Robust to different speakers and recording conditions
- Handles a variety of video styles and camera setups
🎬 Core Features
- Audio-Conditioned Latent Diffusion — Directly models audio–visual correlations in the latent space for efficient, high-quality generations.
- TREPA Temporal Alignment — Uses temporal representations to enforce consistency across frames.
- Improved Lip-Sync Supervision — Refined training strategies for better lip–audio alignment on standard benchmarks.
- Resolution Flexibility — Supports HD talking-head synthesis with controllable output resolution and frame rate.
- Open-Source Ecosystem — Public code, checkpoints and simple CLI/GUI tools for quick integration into your pipeline.
🚀 How to Use
1. Prepare Source Video
Provide a clear talking-head clip (.mp4) of the identity you want to animate. Upload a video with a resolution higher than 480p; higher resolutions (720p, 1080p, 4K) are recommended.
- Face should be visible and mostly unobstructed
- Stable framing (minimal extreme motion) works best
2. Provide Target Audio
Upload the speech you want the subject to say (e.g., .wav, .mp3).
- Use clean audio with minimal background noise
- Trim leading/trailing silence if possible
3. Run Inference
The system will generate a lip-synced talking-head video aligned with your audio.
💰 Pricing
Minimum price: $0.15.
- If the input audio is shorter than 5 seconds, the minimum price of $0.15 applies.
- For longer audio, the price scales with the duration of the input audio.
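Since the price scales with audio duration, it can help to check a clip's length before submitting. A minimal sketch using ffprobe (an assumption here, not part of this API; requires FFmpeg, and speech.mp3 is a placeholder file name):
# Print the duration of the input audio in seconds
ffprobe -v error -show_entries format=duration \
-of default=noprint_wrappers=1:nokey=1 speech.mp3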
💡 Pro Tips
- Use high-quality, well-lit source videos with a clear view of the mouth.
- Keep audio clean and dry — avoid heavy music, echo, and strong background noise.
- For long speeches, consider segmenting audio into shorter chunks to improve stability and resource usage (see the FFmpeg sketch after this list).
- Match the frame rate of the output video to your target platform (e.g., 24/25/30 FPS).
- If you encounter artifacts, try:
  - Slightly lowering the resolution
  - Increasing the sampling steps
  - Choosing a video segment where the head is more stable
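A minimal FFmpeg sketch for the audio-segmentation and frame-rate tips above (FFmpeg is an assumption, not part of this API; file names are placeholders):
# Split a long speech into 30-second chunks without re-encoding
ffmpeg -i speech.wav -f segment -segment_time 30 -c copy chunk_%03d.wav
# Re-time an output video to 25 FPS for your target platform
ffmpeg -i result.mp4 -r 25 result_25fps.mp4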
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (the audio/video URLs below are placeholders; replace them with your own)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/speech.mp3",
    "video": "https://example.com/talking-head.mp4"
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | The URL of the audio to be synchronized. |
| video | string | Yes | - | - | The URL of the video to be synchronized. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
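An illustrative response shape assembled from the fields above (all values are placeholders, not real output):
{
  "code": 200,
  "message": "success",
  "data": {
    "id": "abc123",
    "model": "wavespeed-ai/latentsync",
    "outputs": ["https://example.com/output.mp4"],
    "urls": { "get": "https://api.wavespeed.ai/api/v3/predictions/abc123/result" },
    "has_nsfw_contents": [false],
    "status": "completed",
    "created_at": "2023-04-01T12:34:56.789Z",
    "error": "",
    "timings": { "inference": 12345 }
  }
}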