
digital-human
This request costs $0.1 per run.
$1 covers roughly 10 runs.
LTX-2.3 Lipsync is an advanced AI model that generates talking head videos from audio and an optional reference image. Built on the LTX-2.3 DiT-based architecture with improved audio-visual quality, it creates realistic lip-synced videos that match your audio input.
- **Improved quality**: Enhanced audio-visual alignment, with more accurate lip sync and natural facial movements.
- **Audio-driven generation**: Automatically generates video with lip movements synchronized to the audio input.
- **Optional reference image**: Provide a portrait image to use as the base, or let the model use a default portrait.
- **Flexible resolution**: Supports 480p, 720p, and 1080p output to balance quality and cost.
- **Automatic duration**: Video length automatically matches the audio duration (5-20 seconds).
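Because the video length follows the audio length and only 5-20 second clips are supported, it can help to check a clip locally before submitting it. The sketch below assumes the audio is a WAV file you already have as bytes (the page does not state which audio formats are accepted), and `check_audio_duration` is a hypothetical helper, not part of any official client. It uses only Python's standard `wave` module.

```python
import io
import wave

# Duration bounds stated on the model page; everything else here is an assumption.
MIN_SECONDS = 5.0
MAX_SECONDS = 20.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration in seconds of a WAV file supplied as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_duration(data: bytes) -> float:
    """Hypothetical pre-flight check: raise ValueError if the clip is outside 5-20 s."""
    seconds = wav_duration_seconds(data)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"audio is {seconds:.2f}s; expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return seconds


def silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV clip in memory, useful for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, `check_audio_duration(silent_wav(10))` returns `10.0`, while a 3-second clip raises `ValueError`. For other formats (MP3, AAC), you would need a decoding library to read the duration instead of `wave`.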
| Parameter | Required | Description |
|---|---|---|
| audio | Yes | Audio file URL; its duration determines the video length (5-20s) |
| image | No | Reference portrait image; if omitted, the model uses a default portrait |
| prompt | No | Text prompt to guide generation style and motion |
| resolution | No | Output resolution: 480p, 720p (default), or 1080p |
| seed | No | Random seed for reproducibility (-1 for random) |
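The parameter table above can be turned into a request body. The page does not document the endpoint or the exact JSON shape, so the sketch below assumes a flat JSON object whose keys match the parameter names. `build_lipsync_payload` is a hypothetical helper; only the parameter names, the allowed resolutions, the `720p` default, and `-1` for a random seed come from the table.

```python
from typing import Optional

# Values taken from the parameter table; the payload layout itself is an assumption.
ALLOWED_RESOLUTIONS = ("480p", "720p", "1080p")


def build_lipsync_payload(
    audio: str,
    image: Optional[str] = None,
    prompt: Optional[str] = None,
    resolution: str = "720p",
    seed: int = -1,
) -> dict:
    """Hypothetical helper that assembles a request body from the documented parameters."""
    if not audio:
        raise ValueError("audio is required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}, got {resolution!r}")
    if not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 for random)")

    payload = {"audio": audio, "resolution": resolution, "seed": seed}
    # Optional fields are only sent when provided, so the service can apply its own defaults.
    if image is not None:
        payload["image"] = image
    if prompt is not None:
        payload["prompt"] = prompt
    return payload
```

Leaving out `image` and `prompt` when they are not set means the model falls back to its default portrait, as described above. A fixed non-negative `seed` is what you would pass to reproduce an earlier result.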
| Resolution | Best For |
|---|---|
| 480p | Fast previews, iteration, lowest cost |
| 720p | Balanced quality and cost (default) |
| 1080p | Final delivery, maximum detail |
Pricing is based on audio duration (automatically detected):
| Resolution | 5s | 10s | 15s | 20s |
|---|---|---|---|---|
| 480p | $0.10 | $0.20 | $0.30 | $0.40 |
| 720p | $0.15 | $0.30 | $0.45 | $0.60 |
| 1080p | $0.20 | $0.40 | $0.60 | $0.80 |
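Every entry in the pricing table works out to a flat per-second rate: $0.02/s at 480p, $0.03/s at 720p, and $0.04/s at 1080p. The sketch below estimates cost from that observation. It assumes billing is linear in the exact audio duration and rounds to the cent. The table only lists 5-second steps, so how durations like 7.3 s are actually billed is not confirmed. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-second rates derived from the pricing table (e.g. 480p: $0.10 / 5 s = $0.02/s).
RATE_PER_SECOND = {
    "480p": Decimal("0.02"),
    "720p": Decimal("0.03"),
    "1080p": Decimal("0.04"),
}


def estimate_cost(resolution: str, audio_seconds: float) -> Decimal:
    """Estimate the USD cost of one run, assuming linear per-second billing."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution {resolution!r}")
    if not 5 <= audio_seconds <= 20:
        raise ValueError("audio must be 5-20 seconds long")
    # Decimal(str(...)) avoids binary floating-point error in the multiplication.
    cost = RATE_PER_SECOND[resolution] * Decimal(str(audio_seconds))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

The sketch reproduces every table cell. For example, `estimate_cost("720p", 15)` gives `Decimal("0.45")`. The cheapest run, 480p at 5 seconds, is $0.10, which matches the "about 10 runs per $1" figure at the top of the page (`Decimal("1") // estimate_cost("480p", 5)` is `10`).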