LTX 2.3 GGUF: Local Audio-Video Workflow

Plan a local LTX 2.3 GGUF workflow with ComfyUI-GGUF, Hugging Face, and community quantized models while managing support and license risk.

By Dora 10 min read

I’ve been getting the same DM twice a week: “Where do I download LTX 2.3 GGUF?” People search, find two community pages on Hugging Face, then pause, because neither is from Lightricks. That hesitation is correct. Both pages are real and actively maintained by the community, but the support, licensing, and update cadence aren’t the same as an official release.

LTX 2.3 GGUF is a set of community quantizations of Lightricks’ LTX-2.3 audio-video model. The upstream weights are open but full-precision. The GGUF versions are repackaged for lower-VRAM local inference. This piece documents where the files come from, how to run them in ComfyUI or via a local launcher, and where I’d stop relying on local inference and switch to hosted execution.

Where LTX 2.3 GGUF comes from

The community quantization landscape narrowed to two main maintainers in early 2026. Both publish on the same platform, both follow the same upstream — Lightricks’ LTX-2.3 checkpoints — but they take slightly different approaches.

Community quantization: QuantStack and Unsloth

QuantStack’s LTX-2.3-GGUF Hugging Face page is a direct conversion of the upstream weights. It ships Q2_K through Q8_0 variants of the distilled and full 22B versions. Straightforward. If you want the smallest viable file, this is where you go.

Unsloth’s LTX-2.3-GGUF page uses what they call Dynamic 2.0 methodology — important layers are kept in higher precision, the rest are aggressively quantized. The repo carries both dev and distilled sets, plus their own example workflow files. The model card credits city96’s ComfyUI-GGUF tooling, which is the same node pack you’ll need either way.

I haven’t run a side-by-side long enough to publish numbers on which produces better outputs at a given quant level. That’s a different project.

Official Lightricks weights vs community GGUF builds

Lightricks publishes the original LTX-2.3 weights themselves — full-precision safetensors, official inference pipelines, official ComfyUI nodes (the ComfyUI-LTXVideo pack, separate from GGUF). They also publish camera-control extensions like LTX Director that depend on the original weight format. Those features either don’t work or work imperfectly against GGUF builds. That’s a real loss, depending on what you’re doing.

GGUF versions trade upstream feature parity for VRAM headroom. That’s the entire deal. If you need every feature Lightricks ships, run the full weights. If your machine can’t, GGUF is the trade.

Why unofficial release status matters for support and license review

The thing the search results don’t say clearly: QuantStack and Unsloth are community contributors. They are not Lightricks. If something breaks, you’re filing an issue on a community repo, not getting vendor support. The license on both community pages is ltx-2-community-license-agreement — the same restrictions on commercial use that apply to the original weights still apply to the quantized versions. Quantization doesn’t strip licensing.

Worth slowing down on this one. Treat license review as a real step, not a checkbox.

Local setup paths for builders

There are roughly three ways to run these GGUF builds locally. They’re not equivalent. They’re for different audiences.

Hugging Face model access

Both QuantStack’s and Unsloth’s files live on Hugging Face. You can pull them with git clone (with Git LFS installed; the older git lfs clone command is deprecated) or via huggingface-cli download. If you just want a sensible starting point, grab one of the mid-range variants; names follow standard llama.cpp conventions (Q3_K_M, Q4_K_S, Q4_K_M, etc.). Pick one, download, move on.
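A minimal sketch of the same download step in Python, assuming the huggingface_hub package is installed. The function names are mine, and any repo ID or quant tag you pass in is your own assumption to check against the actual file listing on the page.

```python
def pick_quant_files(filenames, quant_tag):
    """Return .gguf filenames whose name contains the quant tag (case-insensitive)."""
    tag = quant_tag.lower()
    return sorted(
        name for name in filenames
        if name.lower().endswith(".gguf") and tag in name.lower()
    )


def download_quant(repo_id, quant_tag, dest_dir):
    """Download the single GGUF file in `repo_id` that matches `quant_tag`."""
    # Imported lazily so the filename helper works without the package installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    matches = pick_quant_files(list_repo_files(repo_id), quant_tag)
    if len(matches) != 1:
        # A repo can carry several files per quant level (for example dev and
        # distilled sets), so refuse to guess and show the candidates instead.
        raise ValueError(f"Expected exactly one match for {quant_tag!r}, got: {matches}")
    return hf_hub_download(repo_id=repo_id, filename=matches[0], local_dir=dest_dir)
```

If the tag matches more than one file, pass a longer substring of the filename to narrow it down. Note that local_dir keeps any subfolder structure from the repo, so the file may land one level deeper than dest_dir.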

What the platform doesn’t give you is a runtime. Just the files.

ComfyUI + city96 ComfyUI-GGUF node

This is the path I see most people end up on. The node pack at city96’s ComfyUI-GGUF repository extends ComfyUI to load GGUF UNet models. Install it under ComfyUI/custom_nodes, drop the GGUF file into ComfyUI/models/unet, restart ComfyUI, and the GGUF Unet loader appears in the bootleg category. From there you wire it into a video generation workflow the same way you’d wire a normal UNet.

Worth flagging: city96’s node was written before LTX-2 existed. It handles GGUF loading generically. Whether a specific LTX-2.3 GGUF file works end-to-end depends on the workflow and on the text encoder and VAE files it expects alongside the main model. Both community pages publish example workflows for this reason — start from theirs.
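Before opening ComfyUI, a quick file-level check can confirm the two install steps landed where the loader looks. This is a sketch under two assumptions: the node pack folder is named ComfyUI-GGUF (what a default git clone would produce), and the check function itself is hypothetical, not part of ComfyUI. It does not verify the text encoder or VAE files, which vary by workflow.

```python
from pathlib import Path


def check_comfyui_layout(comfy_root):
    """Report whether the GGUF node pack and at least one GGUF model are in place."""
    root = Path(comfy_root)
    # ASSUMPTION: folder name produced by a default clone of city96's repo.
    node_dir = root / "custom_nodes" / "ComfyUI-GGUF"
    unet_dir = root / "models" / "unet"
    gguf_files = sorted(p.name for p in unet_dir.glob("*.gguf")) if unet_dir.is_dir() else []
    return {
        "node_pack_installed": node_dir.is_dir(),
        "gguf_models": gguf_files,
        "ready": node_dir.is_dir() and bool(gguf_files),
    }
```

Calling check_comfyui_layout("ComfyUI") from the parent directory returns a small report. A False under "ready" tells you which of the two steps to redo before blaming the workflow.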

Pinokio or local launcher workflow as a secondary path

Pinokio’s open-source launcher packages AI apps with one-click install, handling Python environments, dependencies, and model downloads behind a graphical interface. It’s not a ComfyUI replacement. It’s a way to skip manual setup if a script for your target app already exists in its directory.

For these quantized models specifically, Pinokio’s value depends on whether there’s a maintained script that targets the current release. Check before assuming. If you’re already in ComfyUI, the launcher doesn’t add much. If you’re starting from zero on a Windows machine without a Python setup, it removes hours of pain.

How to evaluate GGUF variants

Picking a quant level isn’t just “smaller = lower quality.” The trade-offs aren’t linear, and they shift by model.

Quantization choices such as Q4_K_M and similar variants

Q4_K_M is a common starting point because it sits in the middle of the standard llama.cpp range: small enough to fit consumer GPUs, large enough to preserve most of the original behavior. Q3 variants get you into smaller VRAM envelopes, but the quality drop becomes visible in fine detail. Q8 keeps more of the original, but the file gets large enough that you’ve partly defeated the point of running GGUF locally.

I default to Q4_K_M for first runs. If outputs look acceptable, I stay. If they don’t, I go up before going down.

Prompt, seed, and output logging

Test runs without seed control aren’t tests. They’re guesses. Lock the seed, write the prompt to a file, save the output filename with both. When you swap quant levels or workflows, you’ll want to compare like-for-like, and “I think the Q4 version looked worse” doesn’t help if you can’t reproduce the comparison.
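One way to lock the seed is to patch it into the workflow file before queuing it. The sketch below assumes a workflow exported in ComfyUI’s API format (a dict of node IDs, each with an "inputs" dict) and assumes seed inputs are named seed or noise_seed, as on the stock sampler nodes. A custom LTX node may use a different name, so check the returned list of changes.

```python
import copy

# ASSUMPTION: input names used by stock ComfyUI sampler and noise nodes.
SEED_KEYS = ("seed", "noise_seed")


def lock_seed(workflow, seed):
    """Return a copy of an API-format workflow with every literal seed set to `seed`."""
    locked = copy.deepcopy(workflow)
    changed = []
    for node_id, node in locked.items():
        inputs = node.get("inputs", {})
        for key in SEED_KEYS:
            # Linked inputs are [node_id, output_index] lists, not ints; leave them alone.
            if isinstance(inputs.get(key), int):
                inputs[key] = seed
                changed.append((node_id, key))
    return locked, changed
```

If the changes list comes back empty, nothing was locked, and the run is still a guess.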

I keep a flat CSV: prompt, seed, quant level, workflow file, output path, one-line judgment. Boring. Effective.
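A sketch of that log as a small append-only helper. The column names mirror the list above; the function name and file layout are my own choices, not a standard.

```python
import csv
from pathlib import Path

FIELDS = ["prompt", "seed", "quant", "workflow", "output_path", "judgment"]


def log_run(csv_path, **row):
    """Append one run to the CSV, writing the header first if the file is new or empty."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        # DictWriter rejects unknown column names, which catches typos early.
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

A call looks like log_run("runs.csv", prompt="...", seed=42, quant="Q4_K_M", workflow="wf.json", output_path="out.mp4", judgment="soft detail"). Prompts containing commas or quotes are escaped by the csv module, so the file stays readable in a spreadsheet.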

Audio-video sync checks

LTX-2 generates synchronized audio and video in one model — that’s the headline feature. The sync is what GGUF quantization is most likely to degrade subtly, because quantization affects all layers including those handling audio-visual alignment. Watch outputs end-to-end, not just the first 1-2 seconds. Lip flap drifting against the audio track by a fraction of a second is the failure mode I’ve seen most.

This is where my data ends. I haven’t run controlled drift measurements, and I’d be wary of anyone publishing them without showing methodology.
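What can be checked cheaply is the container, not the content. The sketch below assumes ffprobe is on your PATH and compares the start time and duration of the first video and audio streams. It will catch a muxing offset or a truncated track, but it says nothing about lip-sync drift inside the clip, so it complements watching the output rather than replacing it.

```python
import json
import subprocess


def stream_offsets(probe):
    """Compare the first video and audio streams in parsed ffprobe JSON output."""
    by_type = {}
    for stream in probe.get("streams", []):
        by_type.setdefault(stream.get("codec_type"), stream)
    if "video" not in by_type or "audio" not in by_type:
        raise ValueError("expected at least one video and one audio stream")
    video, audio = by_type["video"], by_type["audio"]
    # ffprobe reports these as strings; a missing field raises KeyError here.
    return {
        "start_offset_s": float(audio["start_time"]) - float(video["start_time"]),
        "duration_diff_s": float(audio["duration"]) - float(video["duration"]),
    }


def probe_file(path):
    """Run ffprobe on a media file and return its JSON output as a dict."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=codec_type,start_time,duration",
        "-of", "json", path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Running stream_offsets(probe_file("out.mp4")) on each output and logging the two numbers next to the quant level gives you something reproducible to compare, even if it is not a drift measurement.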

Avoid hardcoding hardware claims without test context

You’ll see Reddit threads claiming “Q4_K_M runs at X seconds per step on a 3090” or “12 GB VRAM is enough.” Don’t take those as portable. They’re a single data point on a single workflow with unstated batch sizes, resolutions, and frame counts. Test on your hardware, with your workflow, and write down what you measured.
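A small sketch for writing down a measurement together with its context, so the number is never separated from the settings that produced it. The context keys are whatever you pass in; nothing here measures VRAM, only wall-clock time.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_run(**context):
    """Yield a record that gains an `elapsed_s` field when the block exits."""
    record = dict(context)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["elapsed_s"] = round(time.perf_counter() - start, 3)


# Usage (placeholder values; run_workflow is hypothetical):
# with timed_run(gpu="my GPU", quant="Q4_K_M", width=768, height=512, frames=97) as rec:
#     run_workflow()
# print(rec)
```

The point of carrying the context in the same record is that a timing without resolution, frame count, and quant level is exactly the kind of claim this section warns against.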

Production trade-offs of local inference

Running these models locally is fine for experimentation. The question is whether it scales to production. The answer is sometimes.

Local control and privacy

The case for local is real. Prompts stay on your machine. Outputs stay on your machine. No usage telemetry, no rate limits, no monthly billing surprise. For workflows involving sensitive client material or pre-release IP, that’s not a small consideration.

Maintenance, driver, and dependency risk

The case against local is also real, and it shows up later. ComfyUI updates can break custom node compatibility. CUDA driver upgrades can break PyTorch. A Windows update can move file paths. The local stack you got working on Tuesday might not work on Friday. That’s not a software quality issue — it’s the cost of running a research-grade stack outside a managed environment.
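One cheap mitigation is to snapshot the environment while it works, so Friday’s breakage can be diffed against Tuesday’s state. A minimal sketch using only the standard library; which packages to track is an assumption you should adjust to your own stack.

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Record Python, OS, and installed package versions (None if not installed)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def save_snapshot(path, packages):
    """Write the snapshot as JSON so two snapshots can be compared later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot_environment(packages), handle, indent=2)
```

For example, save_snapshot("env-working.json", ["torch", "gguf", "numpy"]) run inside ComfyUI’s Python environment. This does not capture GPU driver versions or custom node commits, which are worth noting by hand alongside it.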

For solo work this is annoying. For team production it becomes a part-time job nobody asked for.

When hosted inference is safer

There’s a usage threshold above which running LTX-2.3 — quantized or not — on your own hardware stops making sense. The signals: you’re generating multiple videos per day, you need consistent output across team members on different machines, or you need throughput that doesn’t depend on whether last night’s driver update broke ComfyUI. Past that point, hosted inference — where someone else manages the GPU, the model files, and the dependency stack — usually wins.

Hosted has its own trade-offs: data leaves your machine, cost per generation is metered, model choice is whatever the provider supports. But the maintenance burden shifts to the provider, which for production teams is usually the right trade.

Adjacent GGUF searches to handle carefully

If you’ve been searching for LTX 2.3 GGUF, you’ve probably also seen Sulphur 2 GGUF appear in the same results. They’re not the same thing.

Why Sulphur 2 GGUF is likely a separate intent

Sulphur 2 GGUF is a community fine-tune of LTX-2.3 distributed through Civitai rather than the maintainers above, targeting NSFW content with its own custom node dependency (smthemex/ComfyUI_LTX2_SM, not city96’s pack). Different model, different workflow, different audience. If you arrived here looking for that, you’re in the wrong article.

When to split GGUF comparison into another article

I’d write Sulphur 2 GGUF up separately. The audience, license review, and runtime setup are different enough that mixing the comparison would dilute both pieces. I haven’t tested it personally, so treat the details above as unverified; any future write-up would start with that disclosure.

FAQ

Is the LTX 2.3 GGUF an official Lightricks release?

No. The term refers to community-maintained quantizations published by QuantStack and Unsloth on Hugging Face. They’re quantized from Lightricks’ upstream LTX-2.3 weights, but Lightricks itself publishes only full-precision checkpoints. Please refer to the official Lightricks documentation for the current state of any direct GGUF release.

How do I run LTX 2.3 GGUF in ComfyUI?

Install city96’s ComfyUI-GGUF node into ComfyUI/custom_nodes, drop the GGUF file into ComfyUI/models/unet, restart ComfyUI, and use the GGUF Unet loader in the bootleg category. You’ll also need the matching text encoder and VAE files referenced in whichever community workflow you’re following. The Unsloth and QuantStack pages both ship example workflows worth starting from.

What are the risks of using community quantized models?

Three main ones. No vendor support if something breaks — you’re on community issue trackers. License review stays your responsibility: the LTX-2 community license still applies, and the official license terms are published in Lightricks’ LTX-2 repository. And feature gaps versus the official weights — extensions like LTX Director or new official pipeline updates may not work cleanly with GGUF builds. Please refer to Lightricks’ latest documentation for the current state of official feature parity.

Should I use Pinokio, Hugging Face, ComfyUI, or hosted inference?

Depends on what you’re doing. Pinokio for skipping setup if a script for your target app exists. Hugging Face for grabbing files directly. ComfyUI with the city96 GGUF node for actually running and tuning workflows. Hosted inference when local maintenance overhead exceeds the value of keeping execution on your own machine. The boundary is usually whether you’re shipping outputs to anyone besides yourself.
