Introducing LTX-2.3 Image-to-Video on WaveSpeedAI
Bring Your Images to Life with LTX-2.3 Image-to-Video on WaveSpeedAI
Static images tell a story. Moving images with sound make audiences feel it. With LTX-2.3 Image-to-Video now available on WaveSpeedAI, you can transform any still image into a high-fidelity video — complete with synchronized audio — in a single generation pass. No post-production. No separate audio tools. Just upload, prompt, and play.
Built by Lightricks on the Diffusion Transformer (DiT) architecture, LTX-2.3 represents a leap forward in unified audio-video generation. Where most image-to-video models produce silent clips that require separate sound design, LTX-2.3 generates motion and audio together as one coherent output. The result is animated content that feels whole from the first frame.
What Is LTX-2.3?
LTX-2.3 is the latest iteration of the LTX-2 model family — a 19-billion-parameter foundation model split roughly into 14 billion parameters for video processing and 5 billion for audio. It is one of the first open-source models capable of generating synchronized audio and video within a single unified architecture, using cross-attention mechanisms to keep sound and motion perfectly aligned.
The “2.3” release introduces meaningful improvements over its predecessor: a rebuilt VAE (Variational Autoencoder) trained on higher-quality data, an upgraded HiFi-GAN vocoder for cleaner audio output, stronger image-to-video consistency, and better prompt adherence throughout the generation pipeline.
Key Features
- Synchronized Audio-Video Generation: Sound isn’t bolted on as an afterthought. Ambient noise, music, dialogue cues, and sound effects are generated alongside visual motion in a single pass, eliminating the need for separate audio workflows.
- New VAE for Sharper Details: The rebuilt latent space in LTX-2.3 preserves fine textures, facial features, hair, text, and edge detail across the full frame. Outputs are visibly sharper than previous versions.
- Cleaner Audio Output: An improved HiFi-GAN vocoder reduces noise artifacts and silence gaps. Dialogue, ambient sound, and music come through with noticeably greater clarity.
- Faithful Image Preservation: The model maintains the subject, composition, framing, and lighting of your reference image while adding natural, coherent motion — no identity drift or visual degradation.
- Flexible Resolution and Duration: Generate video at 480p, 720p, or 1080p, with durations ranging from 5 to 20 seconds, letting you balance quality, cost, and creative needs.
- Portrait and Landscape Support: Native 9:16 portrait mode makes it easy to produce content optimized for social platforms like Instagram Reels, TikTok, and YouTube Shorts.
- 24/48 FPS Options: Choose the frame rate that matches your output requirements, from standard playback to smoother high-frame-rate delivery.
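The features above map onto generation parameters in a request payload. As a rough sketch, a portrait-mode request might look like the following — note that the `fps` and `aspect_ratio` parameter names are assumptions for illustration; confirm the exact field names on the model's API page.

```python
# Illustrative request payload for LTX-2.3 image-to-video.
# "fps" and "aspect_ratio" are assumed parameter names — verify
# against the model's API reference before use.
request = {
    "image": "https://your-image-url.com/portrait.jpg",
    "prompt": "Subtle head turn and blink, soft ambient room tone",
    "resolution": "720p",    # 480p / 720p / 1080p
    "duration": 10,          # 5 to 20 seconds
    "fps": 24,               # 24 or 48
    "aspect_ratio": "9:16",  # portrait for Reels / TikTok / Shorts
}
```

Each field corresponds directly to one of the options listed above, so you can trade quality against cost per request.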
Real-World Use Cases
Product Marketing
Turn product photography into dynamic showcase videos. Upload a hero shot of a sneaker, a skincare bottle, or a piece of furniture, and LTX-2.3 animates it with subtle motion — a rotating view, shifting lighting, environmental atmosphere — while generating matching ambient audio. What once required a videographer and sound designer can now be drafted in seconds.
Social Media Content
The demand for short-form video is relentless. LTX-2.3 lets creators convert their strongest still images into scroll-stopping animated posts with built-in sound. A landscape photograph becomes a cinematic moment with wind and birdsong. A food photo becomes a sizzling, steaming clip ready to post.
Portrait and Character Animation
Animate headshots, portraits, and character artwork with natural movement. The model excels at preserving facial identity while adding lifelike motion — subtle head turns, blinking, expression changes — making it valuable for digital avatars, creative projects, and personalized content.
Storyboarding and Pre-Visualization
For filmmakers and creative directors, LTX-2.3 transforms static storyboard frames and concept art into animated sequences with synchronized audio. This accelerates pre-production by giving stakeholders a tangible feel for pacing, mood, and sound design before a single frame is shot.
E-Commerce and Advertising
Static product listings lose attention. Animated product videos with ambient sound increase engagement and conversion rates. LTX-2.3 makes it practical to generate video assets at scale — iterate quickly at 480p, then render final assets at 1080p.
Getting Started on WaveSpeedAI
Running LTX-2.3 Image-to-Video on WaveSpeedAI is straightforward. With no cold starts and fast inference, you get results in seconds rather than minutes.
```python
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/ltx-2.3/image-to-video",
    {
        "image": "https://your-image-url.com/photo.jpg",
        "prompt": "The camera slowly pushes in as the subject turns their head, soft ambient music playing",
    },
)

print(output["outputs"][0])  # Output video URL
```
You can also specify resolution and duration:
```python
output = wavespeed.run(
    "wavespeed-ai/ltx-2.3/image-to-video",
    {
        "image": "https://your-image-url.com/product.jpg",
        "prompt": "Gentle rotation revealing product details, soft studio lighting, subtle ambient hum",
        "resolution": "1080p",
        "duration": 10,
    },
)
```
Pro tip: Start with 480p and short durations to dial in your prompt and motion direction. Once you have the result you want, scale up to 1080p for final delivery. Use a fixed seed when comparing prompt variations so you can isolate exactly what changed.
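One way to apply that seed-pinning advice: build a base request once, then vary only the prompt across runs. The `seed` parameter name here is an assumption for illustration — check the model's API reference for the exact field.

```python
# Sketch of A/B-testing prompt variations with a fixed seed, so that
# differences in output come from the prompt alone. "seed" is an
# assumed parameter name — verify it on the model's API page.
base_request = {
    "image": "https://your-image-url.com/photo.jpg",
    "resolution": "480p",  # iterate cheaply before the final 1080p render
    "duration": 5,
    "seed": 42,            # hypothetical fixed seed for reproducibility
}

prompts = [
    "Slow push-in, subject turns their head, soft ambient music",
    "Slow push-in, subject smiles, distant city traffic",
]

# One request per prompt, everything else held constant.
requests = [{**base_request, "prompt": p} for p in prompts]

# for req in requests:
#     output = wavespeed.run("wavespeed-ai/ltx-2.3/image-to-video", req)
#     print(req["prompt"], "->", output["outputs"][0])
```

Because only the prompt changes between runs, any difference in the generated motion or audio can be attributed to the prompt rather than to sampling noise.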
Pricing
LTX-2.3 on WaveSpeedAI starts at just $0.10 for a 5-second clip at 480p, scaling up to $0.80 for a 20-second 1080p video. No subscriptions required — pay only for what you generate.
| Resolution | 5s | 10s | 15s | 20s |
|---|---|---|---|---|
| 480p | $0.10 | $0.20 | $0.30 | $0.40 |
| 720p | $0.15 | $0.30 | $0.45 | $0.60 |
| 1080p | $0.20 | $0.40 | $0.60 | $0.80 |
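The table follows a simple linear rule: price scales with duration from each resolution's 5-second base rate. A small helper makes that easy to budget with (it assumes durations are within the supported 5–20 second range):

```python
# Cost estimator derived from the published LTX-2.3 pricing table.
# Prices scale linearly with duration from the 5-second base rate.
BASE_PRICE_PER_5S = {"480p": 0.10, "720p": 0.15, "1080p": 0.20}

def estimate_cost(resolution: str, duration_s: int) -> float:
    """Return the estimated price in USD for one generation."""
    if resolution not in BASE_PRICE_PER_5S:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 5 <= duration_s <= 20:
        raise ValueError("duration must be between 5 and 20 seconds")
    return round(BASE_PRICE_PER_5S[resolution] * duration_s / 5, 2)

print(estimate_cost("1080p", 20))  # 0.8
```

For example, a 10-second draft at 480p costs $0.20, while the same clip finalized at 1080p costs $0.40 — which is why iterating at low resolution first pays off.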
Why WaveSpeedAI?
In a landscape where synchronized audio-video generation is rapidly becoming the standard — with models like Veo 3.1, Kling 3.0, and Sora 2 all pushing the boundaries — LTX-2.3 stands out as a powerful open-source option with production-grade quality. And running it on WaveSpeedAI gives you the infrastructure to match: fast inference with no cold starts, simple API integration, and pricing that makes experimentation affordable.
Whether you’re a solo creator animating social content or a team generating video assets at scale, the combination of LTX-2.3’s unified audio-video generation and WaveSpeedAI’s optimized infrastructure means less time waiting and more time creating.
Start Creating
The gap between a still image and a complete video with sound has never been smaller. Try LTX-2.3 Image-to-Video on WaveSpeedAI today and see what your images sound like in motion.