Introducing LTX-2.3 Text-to-Video with LoRA Support on WaveSpeedAI

The line between imagination and video has never been thinner. Today, we’re excited to announce the availability of LTX-2.3 Text-to-Video with LoRA support on WaveSpeedAI — a model that doesn’t just generate video from text, but lets you shape it to your vision with custom styles, characters, and motion through lightweight LoRA adapters.

Whether you’re building a brand identity, animating a recurring character, or crafting content with a signature cinematic look, LTX-2.3 with LoRA gives you the control that generic video generation models simply can’t match.

What Is LTX-2.3 Text-to-Video LoRA?

LTX-2.3 is the latest evolution of Lightricks’ LTX model family — a Diffusion Transformer (DiT) based foundation model that generates synchronized video and audio from a single text prompt in one pass. No separate audio production pipeline. No post-processing workarounds. You describe a scene, and you get both the visuals and the sound.

What makes this release particularly powerful is the addition of LoRA (Low-Rank Adaptation) support. LoRA adapters are lightweight, trainable modules that sit on top of the base model and steer its output toward specific styles, characters, or motion patterns. You can stack up to three LoRA adapters simultaneously, blending custom aesthetics with the full generative power of LTX-2.3.

The result: a model that’s both general-purpose and deeply customizable.

Key Features

Upgraded Visual and Audio Quality

LTX-2.3 ships with a completely redesigned VAE (Variational Autoencoder) trained on higher-quality data. Fine textures, hair, text overlays, and edge details are sharper and more realistic than in previous versions. On the audio side, the training data has been filtered for silence gaps, noise, and artifacts, and a new vocoder delivers cleaner, more reliable sound with tighter alignment to the visual content.

Enhanced Prompt Adherence

A new gated attention text connector means your prompts are followed more faithfully. Descriptions of timing, motion, expression, and audio cues translate directly into the generated output — reducing the gap between what you write and what you see.

LoRA Customization

Apply up to three LoRA adapters per generation, each with adjustable scale. This lets you:

  • Lock in a visual style — cinematic looks, anime aesthetics, brand color palettes
  • Maintain character consistency — recurring faces, figures, or mascots across clips
  • Train custom motion patterns — signature movements, camera techniques, choreography
  • Combine adapters — layer a character LoRA with a style LoRA and a motion LoRA in a single generation
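As a sketch, the stacking described above maps onto the `loras` request parameter shown in the getting-started snippet later in this post. The adapter URLs below are placeholders, not real trained adapters:

```python
# Sketch of stacking three adapters in one generation request.
# The adapter URLs are placeholders; each entry pairs a `path` with a `scale`
# controlling how strongly that adapter steers the output.
request = {
    "prompt": "A mascot fox skateboards through a neon-lit city at night",
    "loras": [
        {"path": "https://example.com/fox-character.safetensors", "scale": 1.0},   # character
        {"path": "https://example.com/neon-style.safetensors", "scale": 0.7},      # style
        {"path": "https://example.com/tracking-motion.safetensors", "scale": 0.5},  # motion
    ],
    "resolution": "720p",
    "duration": 10,
}

# LTX-2.3 accepts at most three adapters per generation.
assert len(request["loras"]) <= 3
```

Lowering a scale softens that adapter's influence, which is the main lever when two adapters pull the output in different directions.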

Flexible Output Options

  • Resolutions: 480p for fast iteration, 720p for balanced quality, 1080p for final delivery
  • Duration: Generate clips from 5 to 20 seconds
  • Synchronized audio: Sound is generated alongside video in a single model pass, with the ability to guide audio through prompt cues like “rain on a window,” “upbeat jazz,” or “crowd cheering”

Transparent, Predictable Pricing

Every generation has a clear cost based on resolution and duration:

Resolution    5s       10s      15s      20s
480p          $0.15    $0.30    $0.45    $0.60
720p          $0.20    $0.40    $0.60    $0.80
1080p         $0.25    $0.50    $0.75    $1.00

No surprises. No hidden compute charges.
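Because cost scales linearly with duration at each resolution, the whole table reduces to a per-5-second rate. A small convenience helper (a sketch for budgeting, not part of the API):

```python
# Per-5-second rates from the pricing table above (USD).
RATE_PER_5S = {"480p": 0.15, "720p": 0.20, "1080p": 0.25}

def generation_cost(resolution: str, duration_s: int) -> float:
    """Cost in USD for one clip; duration must be 5, 10, 15, or 20 seconds."""
    if duration_s not in (5, 10, 15, 20):
        raise ValueError("duration must be 5, 10, 15, or 20 seconds")
    return round(RATE_PER_5S[resolution] * duration_s / 5, 2)

print(generation_cost("1080p", 20))  # → 1.0
```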

Real-World Use Cases

Brand Content at Scale

Marketing teams can train a LoRA on their brand’s visual identity — logo treatments, color palettes, motion graphics style — and then generate on-brand video content from text descriptions alone. Need 20 variations of a product reveal? Write the prompts, apply the brand LoRA, and generate.

Character-Driven Storytelling

Creators building series or campaigns around a specific character can train a likeness LoRA from reference clips. Every new video maintains the same character appearance, making episodic content and social media series visually consistent without manual editing.

Social Media Content Creation

The 5-to-20-second duration range maps perfectly to short-form content for TikTok, Instagram Reels, and YouTube Shorts. Generate scroll-stopping clips with synchronized audio directly from a creative brief, then iterate at 480p before rendering the final version at 1080p.

Rapid Prototyping and Concept Visualization

Agencies and studios can use text-to-video generation to quickly visualize concepts for client presentations. Describe the scene, apply a cinematic style LoRA, and produce a polished preview in minutes instead of days.

Motion Design and VFX Exploration

Train LoRAs on specific camera movements — tracking shots, dolly zooms, smooth pans — and apply them to any scene. This gives motion designers a starting point that already matches their intended cinematic language.

Getting Started on WaveSpeedAI

Generating your first video takes just a few lines of code:

import wavespeed

# Generate a 10-second 720p clip, steering style with a single LoRA adapter.
output = wavespeed.run(
    "wavespeed-ai/ltx-2.3/text-to-video-lora",
    {
        "prompt": "A lone astronaut walks across a crimson desert under twin suns, wind howling across the dunes, cinematic tracking shot",
        "loras": [
            # Replace with the URL of your trained adapter; scale controls its strength.
            {"path": "your-style-lora-url", "scale": 0.8}
        ],
        "resolution": "720p",
        "duration": 10,
    },
)

# `outputs` holds the generated video result(s).
print(output["outputs"][0])

Running on WaveSpeedAI means no cold starts — your request hits a warm GPU and starts generating immediately. Combined with affordable per-generation pricing and a straightforward REST API, you can integrate video generation into production workflows without infrastructure overhead.

Pro Tips for Best Results

  • Iterate cheaply: Start at 480p to refine your prompt and LoRA combination, then render the final version at 1080p
  • Be specific with audio: Include audio cues in your prompt — “soft piano music,” “waves crashing,” “footsteps on gravel” — for more intentional soundscapes
  • Use fixed seeds: When comparing prompt variations or LoRA scales, lock the seed to isolate what’s actually changing
  • Stack LoRAs strategically: Combine a style adapter with a motion adapter for results that neither could achieve alone, adjusting the scale of each to find the right balance
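The fixed-seed tip above can be sketched as a batch of payloads that hold everything constant except the LoRA scale. Note that a `seed` request field is an assumption here; check the model page for the exact parameter name:

```python
# Hold prompt, resolution, duration, and seed fixed; vary only the LoRA scale.
# (`seed` as a request field is an assumption; the LoRA URL is a placeholder.)
base = {
    "prompt": "A lighthouse in a storm, waves crashing, cinematic wide shot",
    "resolution": "480p",  # iterate cheaply; re-render the winner at 1080p
    "duration": 5,
    "seed": 42,
}

sweep = [
    {**base, "loras": [{"path": "your-style-lora-url", "scale": s}]}
    for s in (0.4, 0.6, 0.8)
]

# Every payload shares the same seed, so differences in output
# come from the LoRA scale alone.
assert all(p["seed"] == 42 for p in sweep)
```

Each payload can then be passed to `wavespeed.run` exactly as in the getting-started snippet.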

The Bigger Picture

AI video generation has crossed a threshold in 2026. What was once a novelty producing blurry, seconds-long clips has matured into a production-ready tool capable of cinematic-quality output with coherent motion and synchronized audio. LTX-2.3 with LoRA support represents the next step in that evolution: not just better base quality, but the ability to make the model yours.

Custom LoRAs turn a general-purpose video model into a specialized creative tool that understands your brand, your characters, and your aesthetic. That’s the difference between generating generic content and generating your content.

Start Creating Today

LTX-2.3 Text-to-Video with LoRA support is available now on WaveSpeedAI. Head to the model page to explore the API, run your first generation, and see what’s possible when you combine state-of-the-art video generation with the precision of custom LoRA adapters.

Your text. Your style. Your video.