WaveSpeed AI Latentsync | AI Digital Human API

LatentSync — Audio-to-Video Lip Sync

LatentSync is a state-of-the-art end-to-end lip-sync framework built on audio-conditioned latent diffusion. It turns your talking-head videos into perfectly synchronized performances while preserving high-resolution details and natural expressions.

🌟 Key Capabilities

End-to-End Lip Synchronization

Transform any talking-head clip into a lip-synced video:

Takes a source video plus target audio as input
Generates frame-accurate mouth movements without 3D meshes or 2D landmarks
Preserves identity, pose, background and global scene structure

High-Resolution Talking Heads

Built on latent diffusion to deliver:

Sharp, detailed faces at high resolution
Natural facial expressions and subtle mouth shapes
Support for both real and stylized (e.g., anime) characters in the reference video

Temporal Consistency

LatentSync introduces Temporal REPresentation Alignment (TREPA) to:

Reduce flicker, jitter and frame-to-frame artifacts
Keep head pose, lips and jaw motion stable over long sequences
Maintain smooth, coherent motion at video frame rates

Multilingual & Robust

Designed for real-world content:

Supports multiple languages and accents
Robust to different speakers and recording conditions
Handles a variety of video styles and camera setups

🎬 Core Features

Audio-Conditioned Latent Diffusion — Directly models audio–visual correlations in the latent space for efficient, high-quality generations.
TREPA Temporal Alignment — Uses temporal representations to enforce consistency across frames.
Improved Lip-Sync Supervision — Refined training strategies for better lip–audio alignment on standard benchmarks.
Resolution Flexibility — Supports HD talking-head synthesis with controllable output resolution and frame rate.
Open-Source Ecosystem — Public code, checkpoints and simple CLI/GUI tools for quick integration into your pipeline.

🚀 How to Use

Prepare Source Video

Provide a clear talking-head clip (.mp4) of the identity you want to animate. The video resolution must be higher than 480p; higher resolutions (720p, 1080p, and 4K) are recommended.

Face should be visible and mostly unobstructed
Stable framing (minimal extreme motion) works best

Provide Target Audio

Upload the speech you want the subject to say (e.g., .wav, .mp3).

Use clean audio with minimal background noise
Trim leading/trailing silence if possible

Run Inference

The system will generate a lip-synced talking-head video aligned with your audio.
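The input requirements above can be checked locally before a request is sent. The following is a minimal sketch, not part of the WaveSpeedAI API: `build_payload` and the two extension sets are hypothetical names, the `audio` and `video` keys mirror the request examples on this page, and treating both inputs as http(s) URLs is an assumption based on those examples. Because .wav and .mp3 are listed only as examples, the audio check here may be stricter than the service itself.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: these sets reflect only the formats named on this page.
VIDEO_EXTENSIONS = {".mp4"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def _extension(url):
    """Return the lowercase file extension of an http(s) URL path."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        raise ValueError(f"Expected an http(s) URL, got: {url!r}")
    return PurePosixPath(parsed.path).suffix.lower()


def build_payload(video_url, audio_url):
    """Validate both inputs and return the JSON body as a dict."""
    if _extension(video_url) not in VIDEO_EXTENSIONS:
        raise ValueError(f"Unsupported video extension: {video_url!r}")
    if _extension(audio_url) not in AUDIO_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {audio_url!r}")
    return {"audio": audio_url, "video": video_url}
```

The returned dict can be serialized with `json.dumps` and sent as the request body. Resolution and audio quality cannot be verified from a URL alone, so those checks still have to happen before upload.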

💰 Pricing

Minimum price: $0.15

If the audio is shorter than 5 seconds, the minimum price of $0.15 applies.
Otherwise, the price is adjusted based on the duration of the input audio.
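For budgeting, the pricing note can be turned into a rough estimator. This is only a sketch under assumptions: the page states the $0.15 minimum but not the per-second rate or the exact formula, so `rate_per_second` is a hypothetical required argument and the "duration times rate, never below the minimum" model is a guess. The live price shown in the playground remains the authoritative figure.

```python
from decimal import Decimal

MINIMUM_PRICE = Decimal("0.15")  # stated on this page


def estimate_price(audio_seconds, rate_per_second):
    """Estimate cost in USD under an ASSUMED linear pricing model.

    rate_per_second is hypothetical: this page does not publish it.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    scaled = Decimal(str(audio_seconds)) * Decimal(str(rate_per_second))
    # Never return less than the documented minimum charge.
    return max(MINIMUM_PRICE, scaled)
```

`Decimal` is used instead of floats so that small currency amounts compare exactly.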

💡 Pro Tips

Use high-quality, well-lit source videos with a clear view of the mouth.
Keep audio clean and dry — avoid heavy music, echo, and strong background noise.
For long speeches, consider segmenting audio into shorter chunks to improve stability and resource usage.
Match the frame rate of the output video to your target platform (e.g., 24/25/30 FPS).
If you encounter artifacts, try:

Slightly lowering resolution
Increasing sampling steps
Choosing a video segment where the head is more stable
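The tip about segmenting long audio can be sketched with Python's standard `wave` module. This is an illustration only: `chunk_bounds` and `split_wav` are hypothetical helpers, the input is assumed to be an uncompressed .wav file, and the cuts fall at fixed intervals, so a word may be split at a boundary. Cutting at pauses and pairing each chunk with the matching video segment are left out.

```python
import wave


def chunk_bounds(total_frames, frames_per_chunk):
    """Return (start, end) frame ranges covering total_frames without gaps."""
    if frames_per_chunk <= 0:
        raise ValueError("frames_per_chunk must be positive")
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


def split_wav(path, chunk_seconds, prefix="chunk"):
    """Write consecutive chunks of a WAV file and return the new paths."""
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        bounds = chunk_bounds(src.getnframes(), frames_per_chunk)
        for index, (start, end) in enumerate(bounds):
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
    return paths
```

For example, `split_wav("speech.wav", 30)` would write `chunk_000.wav`, `chunk_001.wav`, and so on, each at most 30 seconds long.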

Latentsync API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync with your input as JSON. The endpoint returns a prediction id. Poll the result endpoint, starting at about one request every 2 seconds and increasing the interval for long-running tasks, and stop on any terminal status. When the status is completed, read the output values from data.outputs. Examples for Latentsync follow.

HTTP example

#!/usr/bin/env bash
# Requires curl 7.76+ (for --fail-with-body) and jq.
set -euo pipefail

: "${WAVESPEED_API_KEY:?Set WAVESPEED_API_KEY}"

REQUEST_BODY=$(cat <<'JSON'
{
    "audio": "https://interactive-examples.mdn.mozilla.net/media/cc0-audio/t-rex-roar.mp3",
    "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4"
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d "$REQUEST_BODY")

TASK=$(printf '%s' "$SUBMIT_RESPONSE" | jq 'if has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "$TASK" | jq -r '.id')
if [ -z "$PREDICTION_ID" ] || [ "$PREDICTION_ID" = "null" ]; then
  printf 'Submission response did not contain a prediction id\n' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "$TASK" | jq -r '.urls.get // empty')
if [ -z "$RESULT_URL" ]; then
  RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/$PREDICTION_ID/result"
fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body "$RESULT_URL" \
    -H "Authorization: Bearer $WAVESPEED_API_KEY")
  RESULT=$(printf '%s' "$RESPONSE" | jq 'if has("data") then .data else . end')
  STATUS=$(printf '%s' "$RESULT" | jq -r '.status')
  case "$STATUS" in
    completed) printf '%s\n' "$RESULT" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "$RESULT" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s\n' "$STATUS" >&2; exit 1 ;;
  esac
done

Node.js example

// Requires Node.js 18+ (global fetch) and an ES module (top-level await).
const submitUrl = "https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync";
const apiKey = process.env.WAVESPEED_API_KEY;
if (!apiKey) throw new Error('Set WAVESPEED_API_KEY');

async function requestJson(url, options = {}) {
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(await response.text());
  return response.json();
}

// 1. Submit the prediction.
const body = await requestJson(submitUrl, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "audio": "https://interactive-examples.mdn.mozilla.net/media/cc0-audio/t-rex-roar.mp3",
    "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4",
  }),
});
const task = body.data ?? body;
if (!task.id) throw new Error("Submission response did not contain a prediction id");
const resultUrl = task.urls?.get ||
  `https://api.wavespeed.ai/api/v3/predictions/${task.id}/result`;

// 2. Poll until the prediction finishes.
while (true) {
  const resultBody = await requestJson(resultUrl, {
    headers: { "Authorization": `Bearer ${apiKey}` },
  });
  const result = resultBody.data ?? resultBody;
  if (result.status === "completed") {
    console.log(result.outputs);
    break;
  }
  if (["failed", "cancelled", "timeout"].includes(result.status)) throw new Error(JSON.stringify(result));
  if (!["created", "processing"].includes(result.status)) throw new Error("Unexpected status: " + result.status);
  await new Promise(resolve => setTimeout(resolve, 2000));
}

Python example

import json
import os
import time
from urllib.request import Request, urlopen

api_key = os.environ["WAVESPEED_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
payload = {
    "audio": "https://interactive-examples.mdn.mozilla.net/media/cc0-audio/t-rex-roar.mp3",
    "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4"
}

def request_json(url, data=None):
    request = Request(url, data=data, headers=headers, method="POST" if data else "GET")
    with urlopen(request) as response:
        return json.load(response)

# 1. Submit the prediction.
body = request_json("https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync", json.dumps(payload).encode())
task = body.get("data", body)
if not task.get("id"):
    raise RuntimeError("Submission response did not contain a prediction id")
result_url = task.get("urls", {}).get("get") or f"https://api.wavespeed.ai/api/v3/predictions/{task['id']}/result"

# 2. Poll until the prediction finishes.
while True:
    result_body = request_json(result_url)
    result = result_body.get("data", result_body)
    status = result.get("status")
    if status == "completed":
        print(result.get("outputs", []))
        break
    if status in {"failed", "cancelled", "timeout"}:
        raise RuntimeError(result)
    if status not in {"created", "processing"}:
        raise RuntimeError(f"Unexpected status: {status}")
    time.sleep(2)
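The examples above sleep a fixed 2 seconds between polls, while the quick-start text also suggests increasing the interval for long-running tasks. One way to do that is sketched here. The growth factor, the 15-second cap and the 600-second deadline are assumptions, not documented values, and `fetch_result` is a hypothetical callable that returns the already unwrapped result dict. The status names are the ones used in the examples above.

```python
import time

TERMINAL = {"completed", "failed", "cancelled", "timeout"}
PENDING = {"created", "processing"}


def poll_intervals(initial=2.0, factor=1.5, maximum=15.0):
    """Yield sleep durations that grow geometrically up to a cap."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


def poll_until_terminal(fetch_result, sleep=time.sleep, deadline_seconds=600.0):
    """Call fetch_result until it reports a terminal status.

    The deadline counts only time spent sleeping, not request time.
    """
    waited = 0.0
    for delay in poll_intervals():
        result = fetch_result()
        status = result.get("status")
        if status in TERMINAL:
            return result
        if status not in PENDING:
            raise RuntimeError(f"Unexpected status: {status}")
        if waited + delay > deadline_seconds:
            raise TimeoutError(f"No terminal status after {waited:.0f}s of waiting")
        sleep(delay)
        waited += delay
```

The caller still has to inspect the returned status, since `failed`, `cancelled` and `timeout` are returned in the same way as `completed`. Passing `sleep` as a parameter keeps the loop testable without real delays.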

Latentsync API — Frequently asked questions

What is the Latentsync API?

Latentsync is a WaveSpeedAI model for talking-avatar generation, exposed as a REST API on WaveSpeedAI. LatentSync synchronizes video and audio inputs to generate seamlessly synchronized content, which makes it suited to lip-syncing, audio dubbing, and video-audio alignment tasks. You can call it programmatically or try it from the playground above.

How do I call the Latentsync API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID. Poll the result endpoint, starting at about one request every 2 seconds and increasing the interval for long-running tasks, and stop on any terminal status. The playground generates production-oriented Python, JavaScript, and cURL examples with timeouts, transient-error handling, and safe GET retries. The full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/latentsync.
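The answer above mentions transient-error handling and safe GET retries without showing them. A minimal sketch of the idea follows, assuming `urllib` is the HTTP client as in the Python example on this page. `get_with_retries` is a hypothetical helper, and the set of retryable status codes is an assumption, not something the API documents here. Only the result GET is wrapped, on the reasoning that repeating a read is harmless while repeating the submission POST could plausibly create a second prediction.

```python
import time
from urllib.error import HTTPError, URLError

# Assumption: statuses commonly treated as transient; not taken from the API docs.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def get_with_retries(do_get, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call do_get(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            return do_get()
        except HTTPError as error:
            # HTTPError subclasses URLError, so it must be handled first.
            if error.code not in RETRYABLE_STATUS or last_attempt:
                raise
        except (URLError, TimeoutError):
            if last_attempt:
                raise
        sleep(base_delay * 2 ** attempt)
```

With the Python example above, `do_get` could be `lambda: request_json(result_url)`. Client errors such as 400 or 401 are raised immediately, since retrying them would not change the outcome.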

How much does Latentsync cost per run?

Latentsync starts at $0.050 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Latentsync accept?

Key inputs: `video`, `audio`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/latentsync.

How long does Latentsync take to generate?

Median end-to-end generation time on WaveSpeedAI is around 115 seconds per request, based on recent successful runs. Queue time varies with global demand; live status is visible in the prediction record.

Can I use Latentsync outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
