Prompt Caching

Reusing cached computation for repeated prompt prefixes, cutting input-token costs by 75-90%.

Prompt caching lets you designate a stable prefix of your prompt (system prompt, static context, documents) as cacheable. The provider processes it once and caches the key-value (KV) activations. Subsequent requests that share the same prefix skip that computation and pay only 10-25% of the normal input-token price for the cached tokens; the variable tokens still bill at full price.
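A toy cost model makes the discount concrete. The $2.50/MTok base price and 10% read fraction below are illustrative (Anthropic, for example, bills cache reads at 10% of base, and cache writes at a premium not modeled here):

```python
def per_call_cost(prefix_toks: int, variable_toks: int,
                  price_per_mtok: float, read_fraction: float = 0.10):
    """Toy model: cached prefix tokens bill at read_fraction of the base price."""
    full = (prefix_toks + variable_toks) * price_per_mtok / 1e6
    cached = (prefix_toks * read_fraction + variable_toks) * price_per_mtok / 1e6
    return full, cached

# 10,000-token stable prefix at an illustrative $2.50/MTok input price:
full, cached = per_call_cost(10_000, 0, 2.50)
print(f"${full:.4f} -> ${cached:.4f} per call")  # $0.0250 -> $0.0025
```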

Anthropic's cache_control feature and OpenAI's automatic prefix caching both do this. The savings are real: a 10,000-token system prompt with shared context, sent on every request, drops from $0.025 per call to $0.003. At 1 million calls per month that's $25,000 versus $3,000, a $22,000 monthly saving from one config change.
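With Anthropic you mark the cacheable prefix explicitly via cache_control on a system content block. A minimal sketch using the Python SDK; the model name and document contents are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_REFERENCE_DOC = "<your stable ~10,000-token context goes here>"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,
            # Everything up to and including this block is cached; later
            # requests with a byte-identical prefix read from the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What does section 3 say?"}],
)
# usage reports cache_creation_input_tokens on the first call,
# cache_read_input_tokens on subsequent hits.
print(response.usage)
```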

The rule: structure your prompt so the stable parts (system prompt, reference docs, few-shot examples) come before the variable parts (user message). The cache matches byte-for-byte, so any change to the prefix, even a timestamp, breaks it.
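With OpenAI the caching is automatic (for prompts past a minimum length, roughly 1,024 tokens per their docs), so this ordering rule is the main thing you control. A sketch with placeholder prompt contents and an illustrative model name; the response's cached_tokens field lets you verify hits:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stable parts: must be byte-identical across requests (placeholder contents).
SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."
REFERENCE_DOCS = "<reference docs and few-shot examples go here>"

def ask(user_message: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            # Stable prefix first so the automatic prefix cache can match it.
            {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + REFERENCE_DOCS},
            # Variable part last; nothing above this line may change.
            {"role": "user", "content": user_message},
        ],
    )
    # 0 on a cache miss; the cached prefix token count on later calls.
    print("cached tokens:", resp.usage.prompt_tokens_details.cached_tokens)
    return resp.choices[0].message.content
```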
