Introducing LTX-2.3 Image-to-Video on WaveSpeedAI
Bring Your Images to Life with LTX-2.3 Image-to-Video on WaveSpeedAI
Static images tell a story. Moving images with sound make audiences feel it. With LTX-2.3 Image-to-Video now available on WaveSpeedAI, you can transform any still image into a high-fidelity video — complete with synchronized audio — in a single generation pass. No post-production. No separate audio tools. Just upload, prompt, and play.
Built by Lightricks on the Diffusion Transformer (DiT) architecture, LTX-2.3 represents a leap forward in unified audio-video generation. Where most image-to-video models produce silent clips that require separate sound design, LTX-2.3 generates motion and audio together as one coherent output. The result is animated content that feels whole from the first frame.
What Is LTX-2.3?
LTX-2.3 is the latest iteration of the LTX-2 model family — a 19-billion-parameter foundation model split roughly into 14 billion parameters for video processing and 5 billion for audio. It is one of the first open-source models capable of generating synchronized audio and video within a single unified architecture, using cross-attention mechanisms to keep sound and motion perfectly aligned.
The “2.3” release introduces meaningful improvements over its predecessor: a rebuilt VAE (Variational Autoencoder) trained on higher-quality data, an upgraded HiFi-GAN vocoder for cleaner audio output, stronger image-to-video consistency, and better prompt adherence throughout the generation pipeline.
Key Features
- Synchronized Audio-Video Generation: Sound isn’t bolted on as an afterthought. Ambient noise, music, dialogue cues, and sound effects are generated alongside visual motion in a single pass, eliminating the need for separate audio workflows.
- New VAE for Sharper Details: The rebuilt latent space in LTX-2.3 preserves fine textures, facial features, hair, text, and edge detail across the full frame. Outputs are visibly sharper than previous versions.
- Cleaner Audio Output: An improved HiFi-GAN vocoder reduces noise artifacts and silence gaps. Dialogue, ambient sound, and music come through with noticeably greater clarity.
- Faithful Image Preservation: The model maintains the subject, composition, framing, and lighting of your reference image while adding natural, coherent motion — no identity drift or visual degradation.
- Flexible Resolution and Duration: Generate video at 480p, 720p, or 1080p, with durations ranging from 5 to 20 seconds, letting you balance quality, cost, and creative needs.
- Portrait and Landscape Support: Native 9:16 portrait mode makes it easy to produce content optimized for social platforms like Instagram Reels, TikTok, and YouTube Shorts.
- 24/48 FPS Options: Choose the frame rate that matches your output requirements, from standard playback to smoother high-frame-rate delivery.
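The features above map onto generation parameters in a request payload. As a rough sketch, a portrait-mode request might look like the following — note that the `fps` and `aspect_ratio` parameter names are assumptions for illustration; confirm the exact field names on the model's API page.

```python
# Illustrative request payload for LTX-2.3 image-to-video.
# "fps" and "aspect_ratio" are assumed parameter names — verify
# against the model's API reference before use.
request = {
    "image": "https://your-image-url.com/portrait.jpg",
    "prompt": "Subtle head turn and blink, soft ambient room tone",
    "resolution": "720p",    # 480p / 720p / 1080p
    "duration": 10,          # 5 to 20 seconds
    "fps": 24,               # 24 or 48
    "aspect_ratio": "9:16",  # portrait for Reels / TikTok / Shorts
}
```

Each field corresponds directly to one of the options listed above, so you can trade quality against cost per request.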
Real-World Use Cases
Product Marketing
Turn product photography into dynamic showcase videos. Upload a hero shot of a sneaker, a skincare bottle, or a piece of furniture, and LTX-2.3 animates it with subtle motion — a rotating view, shifting lighting, environmental atmosphere — while generating matching ambient audio. What once required a videographer and sound designer can now be drafted in seconds.
Social Media Content
The demand for short-form video is relentless. LTX-2.3 lets creators convert their strongest still images into scroll-stopping animated posts with built-in sound. A landscape photograph becomes a cinematic moment with wind and birdsong. A food photo becomes a sizzling, steaming clip ready to post.
Portrait and Character Animation
Animate headshots, portraits, and character artwork with natural movement. The model excels at preserving facial identity while adding lifelike motion — subtle head turns, blinking, expression changes — making it valuable for digital avatars, creative projects, and personalized content.
Storyboarding and Pre-Visualization
For filmmakers and creative directors, LTX-2.3 transforms static storyboard frames and concept art into animated sequences with synchronized audio. This accelerates pre-production by giving stakeholders a tangible feel for pacing, mood, and sound design before a single frame is shot.
E-Commerce and Advertising
Static product listings lose attention. Animated product videos with ambient sound increase engagement and conversion rates. LTX-2.3 makes it practical to generate video assets at scale — iterate quickly at 480p, then render final assets at 1080p.
Getting Started on WaveSpeedAI
Running LTX-2.3 Image-to-Video on WaveSpeedAI is straightforward. With no cold starts and fast inference, you get results in seconds rather than minutes.
```python
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/ltx-2.3/image-to-video",
    {
        "image": "https://your-image-url.com/photo.jpg",
        "prompt": "The camera slowly pushes in as the subject turns their head, soft ambient music playing",
    },
)

print(output["outputs"][0])  # Output video URL
```
You can also specify resolution and duration:
```python
output = wavespeed.run(
    "wavespeed-ai/ltx-2.3/image-to-video",
    {
        "image": "https://your-image-url.com/product.jpg",
        "prompt": "Gentle rotation revealing product details, soft studio lighting, subtle ambient hum",
        "resolution": "1080p",
        "duration": 10,
    },
)
```
Pro tip: Start with 480p and short durations to dial in your prompt and motion direction. Once you have the result you want, scale up to 1080p for final delivery. Use a fixed seed when comparing prompt variations so you can isolate exactly what changed.
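One way to apply that seed-pinning advice: build a base request once, then vary only the prompt across runs. The `seed` parameter name here is an assumption for illustration — check the model's API reference for the exact field.

```python
# Sketch of A/B-testing prompt variations with a fixed seed, so that
# differences in output come from the prompt alone. "seed" is an
# assumed parameter name — verify it on the model's API page.
base_request = {
    "image": "https://your-image-url.com/photo.jpg",
    "resolution": "480p",  # iterate cheaply before the final 1080p render
    "duration": 5,
    "seed": 42,            # hypothetical fixed seed for reproducibility
}

prompts = [
    "Slow push-in, subject turns their head, soft ambient music",
    "Slow push-in, subject smiles, distant city traffic",
]

# One request per prompt, everything else held constant.
requests = [{**base_request, "prompt": p} for p in prompts]

# for req in requests:
#     output = wavespeed.run("wavespeed-ai/ltx-2.3/image-to-video", req)
#     print(req["prompt"], "->", output["outputs"][0])
```

Because only the prompt changes between runs, any difference in the generated motion or audio can be attributed to the prompt rather than to sampling noise.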
Pricing
LTX-2.3 on WaveSpeedAI starts at just $0.10 for a 5-second clip at 480p, scaling up to $0.80 for a 20-second 1080p video. No subscriptions required — pay only for what you generate.
| Resolution | 5s | 10s | 15s | 20s |
|---|---|---|---|---|
| 480p | $0.10 | $0.20 | $0.30 | $0.40 |
| 720p | $0.15 | $0.30 | $0.45 | $0.60 |
| 1080p | $0.20 | $0.40 | $0.60 | $0.80 |
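The table follows a simple linear rule: price scales with duration from each resolution's 5-second base rate. A small helper makes that easy to budget with (it assumes durations are within the supported 5–20 second range):

```python
# Cost estimator derived from the published LTX-2.3 pricing table.
# Prices scale linearly with duration from the 5-second base rate.
BASE_PRICE_PER_5S = {"480p": 0.10, "720p": 0.15, "1080p": 0.20}

def estimate_cost(resolution: str, duration_s: int) -> float:
    """Return the estimated price in USD for one generation."""
    if resolution not in BASE_PRICE_PER_5S:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 5 <= duration_s <= 20:
        raise ValueError("duration must be between 5 and 20 seconds")
    return round(BASE_PRICE_PER_5S[resolution] * duration_s / 5, 2)

print(estimate_cost("1080p", 20))  # 0.8
```

For example, a 10-second draft at 480p costs $0.20, while the same clip finalized at 1080p costs $0.40 — which is why iterating at low resolution first pays off.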
Why WaveSpeedAI?
In a landscape where synchronized audio-video generation is rapidly becoming the standard — with models like Veo 3.1, Kling 3.0, and Sora 2 all pushing the boundaries — LTX-2.3 stands out as a powerful open-source option with production-grade quality. And running it on WaveSpeedAI gives you the infrastructure to match: fast inference with no cold starts, simple API integration, and pricing that makes experimentation affordable.
Whether you’re a solo creator animating social content or a team generating video assets at scale, the combination of LTX-2.3’s unified audio-video generation and WaveSpeedAI’s optimized infrastructure means less time waiting and more time creating.
Start Creating
The gap between a still image and a complete video with sound has never been smaller. Try LTX-2.3 Image-to-Video on WaveSpeedAI today and see what your images sound like in motion.