LoRA (Low-Rank Adaptation) is a fine-tuning technique that freezes the base model's weights and trains small low-rank "adapter" matrices alongside them. The adapter is typically 100-1,000× smaller than the full model, so training is faster and cheaper, and it produces a tiny artifact (often under 100 MB) you can swap in and out.
Why it matters in production: you can serve one base model with dozens of LoRA adapters for different tenants or use cases, and loading an adapter at inference time costs almost nothing. It's the standard approach for nearly all open-source fine-tuning today (QLoRA combines it with quantization for even more savings).
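The core trick is easy to see in code. Here's a minimal sketch of the low-rank update, using plain NumPy and made-up toy dimensions (the layer sizes, rank, and `alpha` value are illustrative assumptions, not values from any real model): the frozen weight `W` stays untouched, and only the two small matrices `A` and `B` would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4  # toy dimensions; rank r is much smaller than d
alpha = 8                   # LoRA scaling hyperparameter (illustrative value)

# Frozen pretrained weight -- never updated during fine-tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank adapter: B @ A has the same shape as W, but holds
# only r * (d_in + d_out) parameters instead of d_in * d_out.
A = rng.standard_normal((r, d_in)) * 0.01  # initialized small
B = np.zeros((d_out, r))                   # zero init: adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x)

# With B initialized to zero, the adapter contributes nothing yet,
# so the output matches the frozen base model exactly.
assert np.allclose(y, x @ W.T)

full_params = d_in * d_out           # 4,096 at this toy size
adapter_params = r * (d_in + d_out)  # 512 at this toy size
print(f"adapter is {full_params / adapter_params:.0f}x smaller")
```

At real model scale the same arithmetic is what drives the size savings: the adapter grows linearly with layer width while the full weight grows quadratically, which is why shipped adapters stay so small.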
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.