
Glossary Term

Distillation

Training a smaller, cheaper model to mimic a larger one's outputs.

Distillation means using a large, expensive "teacher" model to generate training data, then fine-tuning a smaller, cheaper "student" model on that data. The student doesn't match the teacher on everything, but on the narrow task it was distilled for it often comes within a few percentage points — at 10× lower cost and 5× lower latency.
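The teacher-generates-data step can be sketched in a few lines. This is an illustrative sketch only: `teacher_model` is a stand-in stub (in practice you'd call a frontier-model API), and the `prompt`/`completion` field names are one common fine-tuning format, not any specific provider's required schema.

```python
import json

# Stand-in for an expensive "teacher" model call -- in practice this
# would hit a large frontier model's API on each input.
def teacher_model(text):
    return "positive" if "great" in text else "negative"

# Unlabeled production inputs for the narrow task (here: sentiment).
unlabeled = ["great product, fast shipping", "arrived broken"]

# Step 1: use the teacher to generate labeled training data.
records = [{"prompt": t, "completion": teacher_model(t)} for t in unlabeled]

# Step 2: write the pairs as JSONL, a common training-file format for
# fine-tuning a smaller "student" model on the teacher's outputs.
with open("distill_train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

print(len(records), "teacher-labeled examples written")
```

From here, the student model is fine-tuned on the JSONL file and evaluated against the teacher on a held-out slice of the same narrow task.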

Where it pays off: high-volume, narrow tasks (classification, routing, extraction, simple summarization). Where it doesn't: open-ended reasoning, anything you'd reach for a frontier model to do. Most teams should start with prompt engineering on a small model before considering distillation — it's a power tool, not a starter project.

Bring this to your business

Knowing the term is one thing. Shipping it is another.

We do two-week AI Sprints — one term, one workflow, into production by Day 10.