
WaveSpeed AI Latentsync


LatentSync synchronizes video and audio inputs to generate seamless synchronized content. Perfect for lip-syncing, audio dubbing, and video-audio alignment tasks.


$0.05 per run · ~20 runs per $1


LatentSync — Audio-to-Video Lip Sync

LatentSync is a state-of-the-art end-to-end lip-sync framework built on audio-conditioned latent diffusion. It turns your talking-head videos into perfectly synchronized performances while preserving high-resolution details and natural expressions.

🌟 Key Capabilities

End-to-End Lip Synchronization

Transform any talking-head clip into a lip-synced video:

  • Takes a source video plus target audio as input
  • Generates frame-accurate mouth movements without 3D meshes or 2D landmarks
  • Preserves identity, pose, background and global scene structure

High-Resolution Talking Heads

Built on latent diffusion to deliver:

  • Sharp, detailed faces at high resolution
  • Natural facial expressions and subtle mouth shapes
  • Support for both real and stylized (e.g., anime) characters in the reference video

Temporal Consistency

LatentSync introduces Temporal REPresentation Alignment (TREPA) to:

  • Reduce flicker, jitter and frame-to-frame artifacts
  • Keep head pose, lips and jaw motion stable over long sequences
  • Maintain smooth, coherent motion at video frame rates

Multilingual & Robust

Designed for real-world content:

  • Supports multiple languages and accents
  • Robust to different speakers and recording conditions
  • Handles a variety of video styles and camera setups

🎬 Core Features

  • Audio-Conditioned Latent Diffusion — Directly models audio–visual correlations in the latent space for efficient, high-quality generations.
  • TREPA Temporal Alignment — Uses temporal representations to enforce consistency across frames.
  • Improved Lip-Sync Supervision — Refined training strategies for better lip–audio alignment on standard benchmarks.
  • Resolution Flexibility — Supports HD talking-head synthesis with controllable output resolution and frame rate.
  • Open-Source Ecosystem — Public code, checkpoints and simple CLI/GUI tools for quick integration into your pipeline.

🚀 How to Use

  1. Prepare Source Video: Provide a clear talking-head clip (.mp4) of the identity you want to animate. Upload a video with a resolution higher than 480p at minimum; higher resolutions (720p, 1080p, and 4K) are recommended.
      • Face should be visible and mostly unobstructed
      • Stable framing (minimal extreme motion) works best
  2. Provide Target Audio: Upload the speech you want the subject to say (e.g., .wav, .mp3).
      • Use clean audio with minimal background noise
      • Trim leading/trailing silence if possible
  3. Run Inference: The system generates a lip-synced talking-head video aligned with your audio.
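
The input checks described in the steps above can be sketched as a small pre-flight helper. This is only an illustration: `build_payload` and the extension sets are hypothetical names, the accepted formats are taken from the examples in the steps (the service may accept others), and the video height has to be supplied by the caller because the sketch does not inspect the file itself. The `video` and `audio` keys match the request body used in the API examples on this page.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Assumption: formats named in the steps above; the service may accept more.
VIDEO_EXTENSIONS = {".mp4"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def _extension(url: str) -> str:
    """Return the lowercase file extension of the URL path (ignores the query string)."""
    return PurePosixPath(urlparse(url).path).suffix.lower()


def build_payload(video_url: str, audio_url: str,
                  video_height: Optional[int] = None) -> dict:
    """Validate the inputs and return the JSON body for a LatentSync request."""
    if _extension(video_url) not in VIDEO_EXTENSIONS:
        raise ValueError(f"unexpected video format: {video_url}")
    if _extension(audio_url) not in AUDIO_EXTENSIONS:
        raise ValueError(f"unexpected audio format: {audio_url}")
    # The steps ask for a resolution higher than 480p.
    if video_height is not None and video_height <= 480:
        raise ValueError("video resolution should be higher than 480p")
    return {"video": video_url, "audio": audio_url}


if __name__ == "__main__":
    print(build_payload("https://example.com/your-input.mp4",
                        "https://example.com/your-audio.mp3",
                        video_height=720))
```

Failing early on an obviously unsuitable input avoids paying for a run that is likely to produce poor results.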

💰 Pricing

Minimum price: $0.15.

  • If the audio is shorter than 5 seconds, the minimum price of $0.15 applies.
  • For longer audio, the price is adjusted based on the duration of the input audio.

💡 Pro Tips

  • Use high-quality, well-lit source videos with a clear view of the mouth.
  • Keep audio clean and dry — avoid heavy music, echo, and strong background noise.
  • For long speeches, consider segmenting audio into shorter chunks to improve stability and resource usage.
  • Match the frame rate of the output video to your target platform (e.g., 24/25/30 FPS).
  • If you encounter artifacts, try:
      • Slightly lowering resolution
      • Increasing sampling steps
      • Choosing a video segment where the head is more stable
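
The tip about segmenting long speeches can be sketched with Python's standard `wave` module. This is a minimal illustration under several assumptions: `split_wav` is a hypothetical helper, the 30-second default chunk length is an arbitrary choice (this page does not recommend a specific value), the input is an uncompressed WAV file, and the matching video segments have to be cut separately.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, chunk_seconds: float = 30.0) -> List[Path]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:  # end of file reached
                break
            path = out / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(path)
            index += 1
    return chunks
```

Fixed-length cuts can land in the middle of a word, so in practice it may be better to pick boundaries at pauses in the speech and use this helper only as a starting point.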

Latentsync API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Latentsync below.

HTTP example
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/latentsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "audio": "https://example.com/your-audio.mp3",
    "video": "https://example.com/your-input.mp4"
}'

# The response includes a prediction id. Substitute it for {request_id} and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

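// Top-level await is not available in CommonJS modules, so wrap the call in an async function.
async function main() {
    const result = await client.run("wavespeed-ai/latentsync", {
        "audio": "https://example.com/your-audio.mp3",
        "video": "https://example.com/your-input.mp4"
    });

    console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);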
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/latentsync",
    {
        "audio": "https://example.com/your-audio.mp3",
        "video": "https://example.com/your-input.mp4",
    },
)

print(output["outputs"][0])  # → URL of the generated output
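
If you call the REST endpoint directly instead of using the SDK, the polling step from the quick start can be sketched with the standard library alone. This is an illustration, not the official client: `make_fetcher` and `wait_for_output` are hypothetical helpers, the result URL is the one from the HTTP example, and the response shape is partly assumed. The quick start confirms the `"completed"` status and `data.outputs[0]`; placing `status` under `data` and treating `"failed"` as a terminal status are assumptions, as are the 2-second interval and 600-second timeout.

```python
import json
import os
import time
import urllib.request
from typing import Callable


def make_fetcher(request_id: str) -> Callable[[], dict]:
    """Return a function that fetches the current prediction result as a dict."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"

    def fetch() -> dict:
        request = urllib.request.Request(
            url,
            headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch


def wait_for_output(fetch_result: Callable[[], dict], interval: float = 2.0,
                    timeout: float = 600.0, sleep=time.sleep,
                    clock=time.monotonic) -> str:
    """Poll until the prediction completes and return the first output URL."""
    deadline = clock() + timeout
    while True:
        data = fetch_result()["data"]  # assumed: status and outputs live under "data"
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction failed: {data}")
        if clock() >= deadline:
            raise TimeoutError("prediction did not complete before the timeout")
        sleep(interval)


# Usage (requires WAVESPEED_API_KEY and a real prediction id):
# print(wait_for_output(make_fetcher("your-request-id")))
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP layer, so it can be exercised with canned responses before spending credits on a real run.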

Latentsync API — Frequently asked questions

What is the Latentsync API?

Latentsync is a WaveSpeedAI model for talking-avatar generation, exposed as a REST API on WaveSpeedAI. LatentSync synchronizes video and audio inputs to generate seamless synchronized content. Perfect for lip-syncing, audio dubbing, and video-audio alignment tasks. You can call it programmatically or try it from the playground above.

How do I call the Latentsync API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/latentsync.

How much does Latentsync cost per run?

Latentsync starts at $0.050 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Latentsync accept?

Key inputs: `video`, `audio`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/latentsync.

How long does Latentsync take to generate?

Average end-to-end generation time on WaveSpeedAI is around 208 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Latentsync outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.