Introducing LTX-2 19B ControlNet: Precision Video-to-Video Transformation with Pose, Depth, and Edge Guidance

The landscape of AI video generation has reached a new milestone. LTX-2 19B ControlNet brings the power of structural guidance to video transformation, enabling creators to reshape video content while preserving the motion and dynamics that make footage compelling. Built on Lightricks’ groundbreaking 19-billion parameter Diffusion Transformer architecture, this model represents a significant leap forward in controlled video generation.

What is LTX-2 19B ControlNet?

LTX-2 19B ControlNet is a video-to-video transformation model that uses pose, depth, or canny edge detection to guide the generation of new video content while maintaining the motion structure from your input. The model operates on the same powerful foundation as the LTX-2 family—an asymmetric dual-stream diffusion transformer with 48 layers that processes both video and audio tokens simultaneously.

What sets this model apart is its ability to generate synchronized audio-video content up to 20 seconds in length. The architecture splits its 19 billion parameters strategically: approximately 14 billion for video processing and 5 billion for audio, enabling coherent multimodal output in a single pass.

The ControlNet integration allows you to choose exactly how the model interprets your source video. Whether you want to preserve human motion through pose detection, maintain scene structure through depth mapping, or follow precise edges through canny detection, you have complete control over the transformation process.

Key Features

Three Guidance Modes for Every Use Case

  • Pose Mode: Extracts skeletal and pose information from your input video, ideal for human and character motion transfer. This mode reliably tracks body positioning across frames, making it perfect for dance sequences, athletic movements, or any content where human motion is the focus.

  • Depth Mode: Creates depth maps from your source video to preserve scene structure and spatial relationships. Use this when you want to transform environments, change visual styles, or apply creative effects while maintaining the fundamental geometry of your footage.

  • Canny Edge Mode: Detects edges in your source material to guide generation while preserving shapes and outlines. This mode excels at style transfer applications where you need to maintain precise visual boundaries.
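For a rough intuition of what canny-style guidance consumes, the sketch below computes a simplified Sobel-style edge map for a single frame in NumPy. This is only an illustrative stand-in, not the model's actual preprocessor; production pipelines typically use a full Canny detector.

```python
import numpy as np

def edge_map(frame: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Simplified Sobel-style edge map for one grayscale frame (values in [0, 1]).

    A stand-in for a canny control signal: the idea is the same --
    keep only the strong outlines that define shapes in the frame.
    """
    # Sobel kernels for horizontal and vertical intensity gradients
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T

    # Pad by one pixel so the output matches the input shape
    padded = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]
            gx[i, j] = (window * kx).sum()
            gy[i, j] = (window * ky).sum()

    # Normalize gradient magnitude, then binarize at the threshold
    magnitude = np.hypot(gx, gy)
    if magnitude.max() > 0:
        magnitude /= magnitude.max()
    return (magnitude > threshold).astype(np.uint8)

# A bright square on a dark background: edges appear along its border
frame = np.zeros((16, 16))
frame[4:12, 4:12] = 1.0
edges = edge_map(frame)
```

Applied per frame across a video, a map like this gives the generator hard boundaries to respect while everything inside those boundaries is free to change.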

Flexible Audio Handling

The model offers three audio modes to match your creative needs:

  • Preserve: Keep the original audio track from your input video—essential for lip-sync scenarios
  • Generate: Create new synchronized audio that matches the transformed visuals
  • None: Output silent video for projects where you’ll add audio separately
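A small client-side helper can make the choice between these modes explicit in your own code. The function below is purely illustrative and not part of any official SDK; only the three mode names come from the options above.

```python
# Hypothetical helper -- the mode strings mirror the three options above,
# but this function is not part of any official SDK.
VALID_AUDIO_MODES = {"preserve", "generate", "none"}

def choose_audio_mode(keep_original: bool, want_new_audio: bool) -> str:
    """Map a project's audio needs to one of the three audio modes."""
    if keep_original:
        return "preserve"  # e.g. lip-sync: the source track must survive
    if want_new_audio:
        return "generate"  # fresh synchronized audio for the new visuals
    return "none"          # silent output; audio added in post

mode = choose_audio_mode(keep_original=True, want_new_audio=False)
```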

Reference Image Integration

Upload a reference image to define the appearance of your transformed video. The model will apply the visual characteristics of your reference while the input video controls all motion. This enables powerful character-driven transformations where you can animate any character image with motion from reference footage.

Built-in Prompt Enhancement

The integrated prompt enhancer automatically improves your text descriptions for better results. Combined with the model’s Gemma-3 text encoder, which understands nuanced language cues including character emotions, camera movements, and lighting directions, this feature helps you achieve professional results without extensive prompt engineering.

Real-World Use Cases

Character Animation and Motion Transfer

Transform a static character image into a fully animated video by applying motion from reference footage. Whether you’re working with illustrated characters, photographs, or digital avatars, the pose guidance mode captures movement accurately while the reference image defines the visual output.

Dance Transfer for Social Media

Create engaging content by transferring viral dance moves to any subject. The pose mode tracks body positioning frame by frame, allowing you to transform dance videos into stylized animations—perfect for TikTok, Instagram Reels, and YouTube Shorts content.

Video Style Transfer

Apply dramatic visual transformations to existing footage while preserving the original motion. Use depth mode to maintain scene structure as you change visual styles, or canny edge mode when precise shape preservation matters most.

Character Consistency in Video Production

For creators working on series content or branded videos, the reference image feature ensures consistent character appearance across multiple clips. Motion can come from different source videos while the character appearance remains uniform.

Lip-Sync Video Creation

Preserve original audio while transforming the visual appearance of your subject. This workflow is particularly valuable for creating dubbed content, animated versions of live footage, or privacy-preserving video modifications.

Getting Started on WaveSpeedAI

Using LTX-2 19B ControlNet on WaveSpeedAI is straightforward:

  1. Upload your source video — This provides the motion structure for your output
  2. Add a reference image (optional) — Define the appearance you want in your transformed video
  3. Write your prompt — Describe what you want to create
  4. Select your control mode — Choose pose, depth, or canny based on your needs
  5. Choose audio handling — Preserve original, generate new, or none
  6. Set your resolution — 480p for quick iterations, 720p for balanced quality, 1080p for final renders
  7. Generate — Submit and download your transformed video

The same workflow can be run programmatically through the Python SDK:

import wavespeed

output = wavespeed.run(
    "wavespeed-ai/ltx-2-19b/control",
    {
        "video": "https://example.com/source-video.mp4",
        "image": "https://example.com/reference.jpg",
        "prompt": "A person dancing in a futuristic neon city",
        "mode": "pose",
        "audio_mode": "generate",
        "resolution": "720p"
    },
)

print(output["outputs"][0])
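Before submitting a job, it can help to validate the payload locally. The sketch below is a hypothetical pre-flight check, not part of the official SDK; the accepted values are taken from the parameters described in this article.

```python
# Hypothetical pre-flight validation for the request parameters shown
# above; the accepted values come from this article, and the helper
# itself is not part of the official SDK.
VALID_MODES = {"pose", "depth", "canny"}
VALID_AUDIO = {"preserve", "generate", "none"}
VALID_RESOLUTIONS = {"480p", "720p", "1080p"}

def validate_payload(payload: dict) -> dict:
    """Raise ValueError on an invalid field; return the payload unchanged."""
    if "video" not in payload or "prompt" not in payload:
        raise ValueError("'video' and 'prompt' are required")
    if payload.get("mode") not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if payload.get("audio_mode") not in VALID_AUDIO:
        raise ValueError(f"audio_mode must be one of {sorted(VALID_AUDIO)}")
    if payload.get("resolution") not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(VALID_RESOLUTIONS)}")
    return payload

payload = validate_payload({
    "video": "https://example.com/source-video.mp4",
    "prompt": "A person dancing in a futuristic neon city",
    "mode": "pose",
    "audio_mode": "generate",
    "resolution": "720p",
})
```

Catching a typo in `mode` or `resolution` locally is cheaper than waiting for a failed job on the server side.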

Pricing

The model follows straightforward per-second pricing based on resolution:

Resolution | 5s    | 10s   | 15s   | 20s
480p       | $0.15 | $0.30 | $0.45 | $0.60
720p       | $0.20 | $0.40 | $0.60 | $0.80
1080p      | $0.30 | $0.60 | $0.90 | $1.20
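Because billing is linear per second, clip cost is easy to estimate up front. The per-second rates below are implied by the table (e.g. $0.15 / 5 s = $0.03/s at 480p); the helper itself is an illustration, not a billing API.

```python
# Per-second rates implied by the pricing table above; this helper is
# an illustration only, not part of any official billing API.
RATE_PER_SECOND = {"480p": 0.03, "720p": 0.04, "1080p": 0.06}

def estimate_cost(resolution: str, seconds: float) -> float:
    """Estimated cost in USD for a clip at the given resolution and length."""
    return round(RATE_PER_SECOND[resolution] * seconds, 2)

cost = estimate_cost("720p", 15)  # matches the 720p / 15s cell in the table
```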

Pro Tips for Best Results

  • Match starting poses: Align the subject pose in your reference image with the starting pose in your source video for seamless results
  • Choose the right mode: Use pose for human/character motion, depth for scene structure, canny for edge-based precision
  • Iterate efficiently: Start at 480p to refine your approach, then render final output at 720p or 1080p
  • Audio strategy: Preserve audio for lip-sync projects, generate for fresh content, or use none when you’ll add audio in post

Why WaveSpeedAI?

WaveSpeedAI offers the ideal environment for running LTX-2 19B ControlNet:

  • No cold starts: Your jobs begin processing immediately without infrastructure delays
  • Optimized inference: NVIDIA-optimized deployment ensures you get the fastest possible generation times
  • Transparent pricing: Pay only for what you generate with clear per-second billing
  • Production-ready API: Integrate directly into your applications and workflows

Start Creating Today

LTX-2 19B ControlNet opens new possibilities for video creators, animators, and developers who need precise control over video transformations. The combination of ControlNet guidance modes, flexible audio handling, and the powerful 19B DiT architecture delivers professional-quality results at accessible price points.

Ready to transform your videos with precise structural guidance? Try LTX-2 19B ControlNet on WaveSpeedAI and discover what’s possible when you have full control over AI video generation.