
LTX-2.3 is a free, open-source AI video model that generates 4K footage at 50FPS with native audio sync — and runs locally on a consumer GPU. No subscription, no per-clip cost, no cloud dependency. Here's what it can do, how to set it up, and when it beats Kling and Veo.
Why Open-Source AI Video Matters Now
Every major AI video tool — Kling, Veo, Seedance, Runway — operates on a credit system. You pay per clip, per second, per generation. For a small production company or independent creator producing 50+ clips a month, those credits add up fast. At CA$0.40 per Kling 4K clip, 100 clips a month costs CA$40. At Runway Gen-4 rates, it's closer to CA$200.
LTX-2.3 changes that math completely. It's a fully open-source video generation model from Lightricks — the company behind FaceTune and LTX Studio — released under a permissive commercial license. You download the model weights, run it on your own GPU, and generate as many clips as you want. The per-clip cost is electricity.
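That electricity cost is small enough to sketch as a back-of-envelope calculation. The 450 W GPU draw and CA$0.11/kWh rate below are assumptions, not figures from Lightricks; substitute your card's draw and your local utility rate.

```python
# Back-of-envelope electricity cost for one generation. The 450 W draw
# and CA$0.11/kWh rate are assumptions; plug in your own numbers.

def cost_per_clip(gen_seconds: float, gpu_watts: float = 450.0,
                  rate_per_kwh: float = 0.11) -> float:
    """Electricity cost (CA$) of a single generation run."""
    kwh = (gpu_watts / 1000.0) * (gen_seconds / 3600.0)
    return kwh * rate_per_kwh

# A 4K/6s clip at ~3 minutes of generation time costs a fraction of a cent:
print(round(cost_per_clip(180), 4))
```

Even at double the assumed power draw, the per-clip cost stays well under one cent.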
With the latest 2.3 release, LTX has closed most of the quality gap with the paid tools. Native 4K output, 50FPS, synchronized audio generation, and motion quality that's competitive with Kling 3.0 for environmental and product shots. It's not perfect — character consistency is still weaker than Kling — but for a free tool running locally, it's remarkable.
LTX-2.3 Key Specs
Output quality:

- Native resolution: up to 4K (3840×2160)
- Frame rate: up to 50FPS
- Duration: up to 10 seconds per generation (stitchable for longer clips)
- Audio: native audio generation synchronized with video
- Color quality: HDR-capable, strong cinematic color grading out of the box
Hardware requirements:

- Minimum: NVIDIA RTX 3080 (10GB VRAM) — runs at 1080p/24fps
- Recommended: RTX 4090 (24GB VRAM) — full 4K/50fps pipeline
- Apple Silicon: M3 Max and M4 Pro/Max supported via MPS backend (slower, but functional)
- AMD: ROCm support available but less stable than CUDA
License: Apache 2.0 — free for personal and commercial use, including client deliverables.
Generation speed (RTX 4090):

- 1080p, 6 seconds: ~45 seconds
- 4K, 6 seconds: ~3 minutes
- 4K, 10 seconds: ~5 minutes
Compared to cloud tools (Kling API: ~15 seconds, Veo: ~20 seconds), LTX is slower per clip. But when you're running unlimited clips for free, the tradeoff is usually worth it for batch work.
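That tradeoff is easy to put in numbers. A quick sketch using the figures already quoted in this article (~180 seconds per 4K/6s clip on an RTX 4090, CA$0.40 per Kling clip):

```python
# The batch tradeoff in numbers, using this article's own figures:
# ~180 s per 4K/6s clip on a local RTX 4090, vs. CA$0.40/clip on Kling.

def local_batch_hours(n_clips: int, sec_per_clip: float = 180.0) -> float:
    """Unattended wall-clock time to render a batch locally."""
    return n_clips * sec_per_clip / 3600.0

def cloud_batch_cost(n_clips: int, cost_per_clip: float = 0.40) -> float:
    """What the same batch would bill at per-clip cloud pricing."""
    return n_clips * cost_per_clip

# 100 clips: about 5 hours of local rendering vs. a CA$40 cloud bill.
print(local_batch_hours(100), cloud_batch_cost(100))
```

Five hours of unattended rendering overnight costs you nothing; the same batch on a per-clip plan bills every run, including the experiments you throw away.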
How to Set Up LTX-2.3 with ComfyUI
The most practical way to run LTX-2.3 is through ComfyUI — the open-source node-based workflow tool that just hit a $500M valuation and 4 million users. ComfyUI handles model loading, prompt routing, and output management through a visual interface that doesn't require writing code.
Step 1: Install ComfyUI

```
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
```

Or use the desktop installer at comfy.org — one-click setup for Windows and Mac.
Step 2: Download LTX-2.3 model weights

Weights are hosted on Hugging Face. Total download: ~12GB.

```
# Run from ComfyUI/models/video_models/ so the weights land where ComfyUI looks
# (without --local-dir, huggingface-cli saves to the Hugging Face cache instead)
huggingface-cli download Lightricks/LTX-Video ltx-video-2b-v0.9.5.safetensors --local-dir .
```
Step 3: Install the LTX ComfyUI node pack In ComfyUI Manager, search "LTX Video" and install. This adds the LTX generation nodes to your workflow palette.
Step 4: Load the starter workflow Download the official LTX-2.3 workflow JSON from the ComfyUI docs (docs.comfy.org/tutorials/video/ltx/ltx-2-3). Drag it into ComfyUI to get a ready-to-run text-to-video pipeline.
Step 5: Generate your first clip Type a prompt in the positive prompt node, set duration (6 seconds to start), hit Queue. First generation takes longer as the model loads into VRAM; subsequent generations are faster.
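Once the starter workflow runs in the UI, generations can also be queued programmatically through ComfyUI's local HTTP API, which is handy for batch work. A minimal sketch, assuming ComfyUI is listening on its default 127.0.0.1:8188 and that `workflow_api.json` is a workflow you exported from the UI in API format:

```python
# Queue a generation against a running ComfyUI instance via its HTTP API.
# Assumptions: ComfyUI is on the default 127.0.0.1:8188, and
# workflow_api.json is a workflow exported from the UI in API format.
import json
import urllib.request
from pathlib import Path

def build_payload(workflow: dict, client_id: str = "ltx-batch") -> bytes:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's queue; returns the queued prompt info."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    wf_path = Path("workflow_api.json")
    if wf_path.exists():
        print(queue_prompt(json.loads(wf_path.read_text())))
```

Loop `queue_prompt` over a list of prompt variations and you have the unattended batch pipeline the cost math above assumes.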
LTX vs Kling vs Veo: When to Use Each
LTX-2.3 doesn't replace paid tools — it complements them. Here's the practical breakdown:
Use LTX-2.3 for:

- High-volume batch work (20+ clips/day) where per-clip cost matters
- Environmental B-roll: landscapes, cityscapes, abstract motion, product environments
- Experimentation and concept testing before committing to paid generation
- Internal deliverables (training videos, internal comms) where client-facing quality isn't critical
- Any project where you need offline/air-gapped generation (no internet required)
- Looping background video for websites, digital signage, trade show displays
Use Kling 3.0 instead when:

- Human subjects are the hero of the shot — character consistency and face quality are still measurably better in Kling
- Speed matters more than cost — Kling API is 4-6x faster per clip
- You need native 5-language lip sync (Kling's unique feature)
Use Veo 3.1 instead when:

- You need the absolute best atmospheric audio quality
- The clip is a one-off hero shot for a high-value deliverable
- Google Workspace integration matters for your workflow
The hybrid workflow that actually works: Use LTX locally for concept testing and B-roll batch generation. When a concept tests well locally, generate the hero version in Kling or Veo for the final deliverable. You spend credits on proven concepts, not experiments.
Audio Generation in LTX-2.3
Native audio is the feature that separates LTX-2.3 from its predecessor. Earlier versions of LTX produced silent video only — audio had to be added in post. LTX-2.3 generates synchronized audio alongside the video in a single pass.
The audio engine is similar to what Veo 3.1 uses: environmental soundscapes, ambient noise, foley effects for physical interactions. A clip of rain on a city street generates the sound of rain. A product being placed on a table generates the impact sound. An outdoor market scene generates crowd ambience.
Practical quality notes:

- Environmental ambient audio: excellent, competitive with Veo 3.1
- Physical interaction sounds (impacts, mechanical): good
- Music: basic, better to add licensed music in post
- Dialogue/voiceover: not supported — still needs separate recording
For real estate B-roll (establishing shots, neighbourhood walks, amenity showcases), LTX-2.3's audio generation eliminates a meaningful chunk of post-production time. You're not sourcing and syncing ambient audio separately for every clip.
Running LTX on Apple Silicon (M3/M4)
If you're on a MacBook Pro M3 Max or M4 Pro/Max, LTX-2.3 runs via PyTorch's MPS (Metal Performance Shaders) backend. Setup is the same as CUDA — ComfyUI auto-detects Apple Silicon and routes to MPS.
Performance vs NVIDIA RTX 4090:

- M4 Max (128GB unified memory): ~2.5x slower than RTX 4090 for 1080p, ~1.8x slower for 4K (more memory helps)
- M3 Max (96GB): ~3x slower than RTX 4090 for 1080p
- M3 Pro (36GB): limited to 1080p; 4K will OOM on some configurations
The unified memory architecture of Apple Silicon means M4 Max can load the full 4K pipeline into memory without swapping, which partially compensates for the slower GPU throughput. For a creator who already owns a MacBook Pro, running LTX locally is free — no additional hardware investment.
For Windows users with an RTX 4080 or higher, the CUDA path is faster and more stable. NVIDIA's Tensor Core acceleration makes a meaningful difference at 4K.
Is It Worth Setting Up?
The setup time — about 1-2 hours for a clean install — pays back quickly if you're generating more than 20-30 clips per month. At that volume, the savings over Kling Premier (~CA$30/month for 375 clips) are modest, but LTX has no hard clip limit. A video production workflow generating 200+ social variants per month saves real money at scale.
For Vancouver creators specifically: LTX is excellent for the kind of atmospheric B-roll that real estate video and corporate video production requires. Establishing shots of neighbourhoods, product environment footage, abstract brand motion — these are all within LTX-2.3's current capability.
The tool is improving fast. Lightricks ships updates every 6-8 weeks. The gap between LTX and paid tools will continue to close. Getting comfortable with the ComfyUI workflow now means you're ready when the quality crosses the threshold for client-facing hero shots. For the shots that still need a real camera — drone aerials, event coverage, listing walkthroughs — that's where a Vancouver videographer stays essential.
Frequently Asked Questions
Is LTX-2.3 actually free to use commercially?
Yes. LTX-2.3 is released under the Apache 2.0 license, which allows commercial use including client deliverables. You download the model weights and run them locally — there are no usage fees, no per-clip charges, and no subscription required. Your only cost is the electricity to run your GPU.
What GPU do I need to run LTX-2.3?
Minimum: NVIDIA RTX 3080 with 10GB VRAM for 1080p output. For 4K at 50FPS, an RTX 4090 (24GB VRAM) is recommended. Apple Silicon M3 Max and M4 Pro/Max are supported via the MPS backend and can run the full 4K pipeline. AMD GPUs work via ROCm but with less stable support.
How does LTX-2.3 compare to Kling 3.0 in quality?
For environmental B-roll, product shots, and abstract footage, LTX-2.3 is competitive with Kling 3.0 and occasionally better on motion smoothness. For human subjects and face consistency, Kling 3.0 is still noticeably ahead. The practical approach is to use LTX for environments and B-roll, and Kling for any shot featuring people.
Can I use LTX-2.3 without ComfyUI?
Yes. LTX-2.3 can be run via Python scripts directly (Diffusers library), via the official LTX Studio web app (cloud-hosted, paid), or via AUTOMATIC1111/Forge with the appropriate extension. ComfyUI is the most flexible option for custom workflows. LTX Studio is the easiest option for users who don't want to manage local installs.
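As a sketch of the Diffusers route, assuming a recent diffusers release that ships the LTX pipeline and a CUDA GPU with enough VRAM (the repo id matches the Hugging Face weights named earlier; the `frames_for` helper encodes the 8k + 1 frame-count shape the LTX pipeline expects, an assumption worth checking against the pipeline docs for your diffusers version):

```python
# Text-to-video via Hugging Face Diffusers: a sketch, not the only route.
# Assumes a recent diffusers release that ships LTXPipeline, plus a CUDA
# GPU with enough VRAM; the repo id matches the weights named earlier.

def frames_for(seconds: float, fps: int = 24) -> int:
    """LTX expects frame counts of the form 8k + 1 (e.g. 145, 161)."""
    return (round(seconds * fps) // 8) * 8 + 1

def generate(prompt: str, out_path: str, seconds: float = 6.0) -> None:
    # Heavy imports kept local so frames_for stays usable without torch.
    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    ).to("cuda")
    frames = pipe(
        prompt=prompt,
        width=768,
        height=512,
        num_frames=frames_for(seconds),
    ).frames[0]
    export_to_video(frames, out_path, fps=24)

# Usage (first run downloads ~12GB of weights):
# generate("rain falling on a neon-lit city street at night", "rain.mp4")
```

The script route trades ComfyUI's visual graph for something you can drop into an existing Python batch pipeline.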
Does LTX-2.3 work offline without internet?
Yes, once the model weights are downloaded. The entire generation pipeline runs locally on your machine with no internet connection required. This makes it suitable for air-gapped production environments, client sites without reliable internet, or workflows where cloud data dependency is a concern.
How long does it take to generate a clip on a Mac?
On M4 Max (128GB): approximately 4-6 minutes for a 6-second 4K clip. On M3 Max (96GB): approximately 6-8 minutes. On M4 Pro (48GB): approximately 8-12 minutes at 1080p; 4K may cause memory issues. For comparison, an RTX 4090 generates the same clip in about 3 minutes.
Ready to start your project?
Get in touch for a free consultation. I typically respond within a few hours.
