Sync Lipsync 2


Sync Lipsync 2 synchronizes lip movements in any video to supplied audio, enabling realistic mouth alignment for films, podcasts, games, or animations. It is offered as a ready-to-use REST inference API with high performance, no cold starts, and affordable pricing.

Features

Lipsync 2.0 is a zero-shot model for generating realistic lip movements that match spoken audio. It works out of the box—no training or fine-tuning needed—and preserves a speaker’s unique style across different languages and video types. Whether you’re working with live-action footage, animation, or AI-generated characters, Lipsync 2.0 brings new levels of realism, control, and speed.

What it does

Zero-shot: No waiting around for training. Just drop in your video and audio, and Lipsync 2.0 handles the rest.

Style preservation: The model picks up on how someone speaks by watching them speak. Even when translating across languages, it keeps their signature delivery.

Cross-domain support: Works with live-action humans, animated characters, and AI-generated faces.

Flexible workflows: Use it for dubbing, editing words in post, or reanimating entire performances.

Key features

Temperature control: Fine-tune how expressive the lipsync is. Make it subtle or dial it up depending on the scene.

Active speaker detection: Automatically detects who’s speaking in multi-person videos and applies lipsync only when that person is talking.

Flawless animation: Handles everything from stylized 3D characters to hyperreal AI avatars. Not just for translation—this unlocks editable dialogue in post-production.

Record once, edit forever: You don’t need multiple takes. Change dialogue after the fact while keeping the original speaker’s delivery intact.

Dub any video with AI: If you can generate a video with text, you can dub it too. No need to capture everything on camera anymore.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


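# Submit the task. "video" and "audio" are required; the placeholder
# values below assume each one is a URL the service can fetch.
curl --location --request POST "https://api.wavespeed.ai/api/v3/sync/lipsync-2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "https://example.com/input.mp4",
    "audio": "https://example.com/speech.wav",
    "sync_mode": "cut_off"
}'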

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
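The two calls above form a submit-then-poll workflow. The following Python sketch shows one way to wire them together using only the standard library. Several things here are assumptions rather than documented behavior: that `video` and `audio` are passed as URLs, that the result endpoint returns the same `data` envelope described under Response Parameters, and the polling interval and attempt limit. The helper names (`http_json`, `wait_for_result`, `lipsync`) are hypothetical.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, payload=None):
    """Send a request with a Bearer token and return the parsed JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(fetch, interval=2.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() until data.status is `completed` or `failed`.

    Assumption: the result endpoint returns the same `data` object
    (status, outputs, error) that the submit call returns.
    """
    for _ in range(max_attempts):
        data = fetch()["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        sleep(interval)  # still `created` or `processing`
    raise TimeoutError("task did not finish within the polling budget")


def lipsync(video, audio, sync_mode="cut_off"):
    """Submit a task, then poll the result endpoint for the output URLs."""
    api_key = os.environ["WAVESPEED_API_KEY"]
    submitted = http_json(
        "POST",
        f"{API_BASE}/sync/lipsync-2",
        api_key,
        {"video": video, "audio": audio, "sync_mode": sync_mode},
    )
    request_id = submitted["data"]["id"]  # used as ${requestId} in the curl call
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    return wait_for_result(lambda: http_json("GET", result_url, api_key))
```

With `WAVESPEED_API_KEY` set in the environment, `lipsync(video_url, audio_url)` would return the list in `data.outputs` once the task completes. Passing `fetch` and `sleep` as arguments keeps the polling loop testable without network access.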

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
video	string	Yes	-	-	The input video whose lip movements will be synchronized
audio	string	Yes	-	-	The audio that the lip movements will be synchronized to
sync_mode	string	No	cut_off	bounce, loop, cut_off, silence, remap	Defines how to handle duration mismatches between the video and audio inputs. See the Media Content Tips guide https://docs.sync.so/compatibility-and-tips/media-content-tips#sync-mode-options for an overview of each option.
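Because `video` and `audio` are required and `sync_mode` accepts only the five listed values, it can be useful to validate a request body on the client before sending it. The sketch below is one way to do that; `build_payload` is a hypothetical helper, and the checks mirror only what the table states (the service may apply further validation that is not documented here).

```python
# Allowed values copied from the Range column of the request table.
SYNC_MODES = ("bounce", "loop", "cut_off", "silence", "remap")


def build_payload(video, audio, sync_mode="cut_off"):
    """Return a request body for the lipsync-2 endpoint, or raise ValueError."""
    if not video:
        raise ValueError("video is required")
    if not audio:
        raise ValueError("audio is required")
    if sync_mode not in SYNC_MODES:
        raise ValueError(
            f"sync_mode must be one of {', '.join(SYNC_MODES)}; got {sync_mode!r}"
        )
    return {"video": video, "audio": audio, "sync_mode": sync_mode}
```

Omitting `sync_mode` falls back to `cut_off`, matching the documented default.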

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
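As an illustration of how these fields fit together, the sketch below reads a response and returns only the output URLs that were not flagged by NSFW detection. It assumes that `data.outputs` and `data.has_nsfw_contents` are index-aligned, which the table implies but does not state outright. The function name `usable_outputs` and every value in `sample` are made up for the example.

```python
def usable_outputs(response):
    """Return output URLs of a completed task that are not flagged as NSFW."""
    data = response["data"]
    if data["status"] != "completed":
        return []  # outputs is documented as empty until completion
    outputs = data.get("outputs") or []
    flags = list(data.get("has_nsfw_contents") or [])
    # Treat any output without a matching flag as not flagged.
    flags += [False] * (len(outputs) - len(flags))
    return [url for url, flagged in zip(outputs, flags) if not flagged]


# Illustrative values only; this is not a real API response.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "status": "completed",
        "outputs": [
            "https://example.com/output-0.mp4",
            "https://example.com/output-1.mp4",
        ],
        "has_nsfw_contents": [False, True],
        "error": "",
        "timings": {"inference": 1234},
    },
}

print(usable_outputs(sample))  # ['https://example.com/output-0.mp4']
```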

Result Request Parameters
