Ltx 2 19b Lipsync

LTX-2 Lipsync generates synchronized talking head videos from a reference image and audio input. Powered by the 19B DiT architecture, it produces high-fidelity lip-synced videos with natural head movements. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

LTX-2 19B Lipsync

LTX-2 Lipsync is an audio-driven digital human model that generates synchronized talking head videos from a reference image and audio input. Powered by the 19B-parameter DiT (Diffusion Transformer) architecture, it produces high-fidelity lip-synced videos with natural head movements and expressions.


Why Choose This?

  • Audio-driven generation: provide an audio file and an optional reference image; the model handles lip-sync, head motion, and expressions automatically.

  • High-fidelity output: uses the 19B-parameter LTX-2 architecture for detailed, temporally consistent video with natural mouth movements.

  • Flexible resolution: supports 480p, 720p, and 1080p outputs to balance quality and cost.

  • Variable duration: video length follows the audio duration (5 to 20 seconds; longer audio is truncated).


Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| audio | Yes | Audio file URL for lip-sync (determines video length) |
| image | No | Reference portrait image URL (JPG or PNG) |
| prompt | No | Optional text to guide the generation style |
| resolution | No | Output resolution: 480p, 720p (default), or 1080p |
| seed | No | Random seed for reproducibility (-1 for random) |
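The parameters above map onto a JSON request body. The following POSIX shell sketch builds that body and leaves out optional fields that are empty, so the server applies its documented defaults. `json_escape` and `build_lipsync_payload` are hypothetical helper names, not part of any WaveSpeedAI SDK, and the escaping is deliberately minimal (backslashes and double quotes only).

```shell
#!/bin/sh
# Minimal JSON string escaping: backslashes first, then double quotes.
# Assumption: inputs contain no newlines or other control characters;
# use a real JSON tool (for example jq) if they might.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper. Args: audio [image] [prompt] [resolution] [seed]
# Empty optional arguments are omitted so server-side defaults apply.
build_lipsync_payload() {
  if [ -z "${1:-}" ]; then
    echo "audio is required" >&2
    return 1
  fi
  body="{\"audio\":\"$(json_escape "$1")\""
  if [ -n "${2:-}" ]; then body="$body,\"image\":\"$(json_escape "$2")\""; fi
  if [ -n "${3:-}" ]; then body="$body,\"prompt\":\"$(json_escape "$3")\""; fi
  if [ -n "${4:-}" ]; then body="$body,\"resolution\":\"$4\""; fi
  if [ -n "${5:-}" ]; then body="$body,\"seed\":$5"; fi  # seed is a JSON integer
  printf '%s}\n' "$body"
}
```

For example, `build_lipsync_payload https://example.com/speech.mp3 "" "" 480p 42` (placeholder URL) yields a body with only `audio`, `resolution`, and `seed`, which can be passed to `curl --data-raw`.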

Resolution Options

| Resolution | Best For |
| --- | --- |
| 480p | Fast previews, iteration, lowest cost |
| 720p | Balanced quality and cost (default) |
| 1080p | Final delivery, maximum detail |

How to Use

  1. Upload your audio: the file that drives the lip-sync and determines the video duration.
  2. Upload your image (optional): the reference portrait that defines the speaker’s appearance.
  3. Write your prompt (optional): describe any style or motion preferences.
  4. Select a resolution: 480p for iteration, 720p for balance, 1080p for final output.
  5. Run: submit the task and download the lip-synced video.

Pricing

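| Resolution | 5 s | 10 s | 15 s | 20 s |
| --- | --- | --- | --- | --- |
| 480p | $0.075 | $0.15 | $0.225 | $0.30 |
| 720p | $0.10 | $0.20 | $0.30 | $0.40 |
| 1080p | $0.15 | $0.30 | $0.45 | $0.60 |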

Billing Rules

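  • Base price: $0.10 for a 5-second 720p video
  • Billed duration: the audio length, with a minimum of 5 seconds and a maximum of 20 seconds
  • Resolution multiplier: 480p = 0.75×, 720p = 1×, 1080p = 1.5×
  • Total cost = billed duration (seconds) × $0.10 × resolution multiplier / 5; for example, 12 seconds at 1080p costs 12 × $0.10 × 1.5 / 5 = $0.36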
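The billing rules reduce to one formula. The following POSIX shell sketch applies it, using awk for the floating-point arithmetic. It assumes the billed duration is the audio length clamped to the 5 to 20 second range with no rounding to 5-second steps; the page does not say how partial seconds are handled. `estimate_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate the cost (USD) of one job from the published billing rules.
# Assumption: billed seconds = audio length clamped to [5, 20], unrounded.
estimate_cost() {  # args: audio_seconds resolution
  case "$2" in
    480p) multiplier=0.75 ;;
    720p) multiplier=1 ;;
    1080p) multiplier=1.5 ;;
    *) echo "unsupported resolution: $2" >&2; return 1 ;;
  esac
  awk -v d="$1" -v m="$multiplier" 'BEGIN {
    if (d < 5) d = 5      # minimum billed duration
    if (d > 20) d = 20    # maximum billed duration
    printf "%.4f\n", d * 0.10 * m / 5
  }'
}
```

For example, `estimate_cost 10 480p` prints `0.1500` and `estimate_cost 20 1080p` prints `0.6000`, matching the pricing table.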

Best Use Cases

  • Digital Avatars: Create talking head videos for virtual presenters and avatars.
  • Content Creation: Generate lip-synced videos for social media and marketing.
  • Localization: Dub existing content with new audio while keeping the speaker's appearance consistent.
  • Accessibility: Create narrated content with a synchronized on-screen speaker.
  • Education: Produce instructional videos with talking head presenters.

Pro Tips

  • Use clear, high-quality audio for the best lip-sync results.
  • Provide a sharp, well-lit, front-facing portrait with the mouth clearly visible.
  • Iterate at 480p to dial in results, then render at a higher resolution for the final output.
  • Use a fixed seed when comparing variations, so that only the change you made differs between runs.
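The iterate-then-upscale tip can be scripted by emitting one request body per resolution while pinning the seed. This is a sketch under two assumptions: the audio URL contains no characters that need JSON escaping, and reusing the seed keeps the preview and final render comparable (the page recommends a fixed seed but does not promise identical motion across resolutions). `render_payloads` is a hypothetical name.

```shell
#!/bin/sh
# Print a 480p preview body and a 1080p final body that share one seed.
# Rejects -1, because a random seed would change between the two runs.
render_payloads() {  # args: audio_url seed
  if [ "$2" = "-1" ]; then
    echo "use a fixed seed; -1 picks a new random seed on every run" >&2
    return 1
  fi
  for res in 480p 1080p; do
    printf '{"audio":"%s","resolution":"%s","seed":%s}\n' "$1" "$res" "$2"
  done
}
```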

Notes

  • Maximum video duration is 20 seconds (determined by audio length).
  • Audio longer than 20 seconds is truncated to 20 seconds.
  • The aspect ratio of the output video is influenced by your input image.
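The duration limits in these notes can be checked before submitting. The following sketch reads the audio length with `ffprobe` (part of FFmpeg, assumed to be installed; it accepts local paths or URLs) and reports the billed length and whether truncation will occur. `plan_duration` and `check_audio` are hypothetical helper names, and the sketch makes no claim about the output length of audio shorter than 5 seconds.

```shell
#!/bin/sh
# Given an audio length in seconds, print the billed seconds (clamped to
# [5, 20]) and whether the audio exceeds the 20-second maximum.
plan_duration() {  # $1 = audio length in seconds
  awk -v d="$1" 'BEGIN {
    billed = (d > 20) ? 20 : d
    if (billed < 5) billed = 5
    printf "billed=%g truncated=%s\n", billed, (d > 20) ? "yes" : "no"
  }'
}

# Probe the real file, then apply the rules above.
check_audio() {  # $1 = local path or URL of the audio file
  seconds=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  plan_duration "$seconds"
}
```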

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task ("audio" is required; the two URLs below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ltx-2-19b/lipsync" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/speech.mp3",
    "image": "https://example.com/portrait.png",
    "resolution": "720p",
    "seed": -1
}'

# Get the result; set requestId to the data.id value from the submit response
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
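The two curl calls above are usually wrapped in a submit-then-poll loop. The following POSIX shell sketch does that with curl and sed only, so it assumes compact JSON responses in which each key of interest appears once; the 2-second interval and 150-attempt cap are arbitrary choices, not documented limits. A real JSON parser such as jq is more robust. `json_field`, `submit_lipsync`, and `wait_for_result` are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: WAVESPEED_API_KEY is exported in the environment.
API_BASE="https://api.wavespeed.ai/api/v3"

# Print the string value stored under key $1 in the JSON read from stdin.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

submit_lipsync() {  # $1 = JSON request body; prints the task id (data.id)
  curl --silent --show-error --location --request POST \
    "$API_BASE/wavespeed-ai/ltx-2-19b/lipsync" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1" | json_field id
}

wait_for_result() {  # $1 = task id; prints the result JSON once it is final
  attempt=0
  while [ "$attempt" -lt 150 ]; do
    response=$(curl --silent --show-error --location \
      "$API_BASE/predictions/$1/result" \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}")
    case "$(printf '%s\n' "$response" | json_field status)" in
      completed) printf '%s\n' "$response"; return 0 ;;
      failed) printf '%s\n' "$response" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep 2  # created/processing: wait and poll again
  done
  echo "timed out waiting for task $1" >&2
  return 1
}

# Example (not run here; placeholder URL):
#   id=$(submit_lipsync '{"audio":"https://example.com/speech.mp3","resolution":"480p"}')
#   wait_for_result "$id"
```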

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| audio | string | Yes | - | - | The audio file URL for lip-sync generation. Its duration determines the video length (5 to 20 seconds; longer audio is truncated). |
| image | string | No | - | - | The reference image URL for the generation. If omitted, a default portrait is used. |
| prompt | string | No | - | - | Optional text prompt to guide the generation style and motion. |
| resolution | string | No | 720p | 480p, 720p, 1080p | Video resolution. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed is used. |
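The ranges in this table can be mirrored client-side so that obvious mistakes fail locally before a request is sent. This sketch only restates the table; the server remains the authority. `validate_lipsync_params` is a hypothetical name, and empty optional values are accepted because the server then applies the defaults.

```shell
#!/bin/sh
# Check audio, resolution, and seed against the documented ranges.
validate_lipsync_params() {  # args: audio resolution seed
  if [ -z "${1:-}" ]; then
    echo "audio is required" >&2; return 1
  fi
  case "${2:-}" in
    ''|480p|720p|1080p) ;;   # empty means the 720p default
    *) echo "resolution must be 480p, 720p, or 1080p" >&2; return 1 ;;
  esac
  case "${3:-}" in
    ''|-1) return 0 ;;       # empty or -1 means a random seed
    *[!0-9]*) echo "seed must be -1 or an integer in 0..2147483647" >&2; return 1 ;;
  esac
  # Compare in awk to avoid shell integer overflow on very long inputs.
  awk -v s="$3" 'BEGIN { exit !(s <= 2147483647) }' || {
    echo "seed must be -1 or an integer in 0..2147483647" >&2; return 1
  }
}
```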

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

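| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID (the data.id returned by the submit call), passed in the URL path |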

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the requested prediction |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
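Once `data.status` is `completed`, the video URL sits in `data.outputs`. The following sketch pulls the first URL out of a result response and downloads it with curl. It assumes single-line JSON with plain URL strings in `outputs`; a JSON parser such as jq is more robust. `first_output_url` and `download_first_output` are hypothetical helper names.

```shell
#!/bin/sh
# Print the first string in the "outputs" array of the JSON read from stdin;
# prints nothing when the array is empty (task not completed yet).
first_output_url() {
  sed -n 's/.*"outputs"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p'
}

download_first_output() {  # $1 = result JSON, $2 = destination file
  url=$(printf '%s\n' "$1" | first_output_url)
  if [ -z "$url" ]; then
    echo "no outputs yet (check data.status)" >&2
    return 1
  fi
  curl --silent --show-error --location --output "$2" "$url"
}
```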
© 2025 WaveSpeedAI. All rights reserved.