A context window is the maximum number of tokens — input plus output — a model can process in a single request. GPT-4o is 128K. Claude Sonnet 4 is 200K. Gemini 2.5 Pro is 1M. Llama 3.1 is 128K.
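Because the window covers input plus output, a request only succeeds if the prompt and the output budget fit together. A minimal sketch of that check — the window sizes are published figures at time of writing (verify against your provider's docs), and the token counts are assumed inputs:

```python
# A request must fit prompt tokens + requested output tokens in the window.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-sonnet-4": 200_000,
    "gemini-2.5-pro": 1_000_000,
    "llama-3.1": 128_000,
}

def fits(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the output budget fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits("gpt-4o", 120_000, 4_000))  # 124K <= 128K -> True
print(fits("gpt-4o", 126_000, 4_000))  # 130K > 128K -> False
```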
Bigger context windows are useful but not magical. Two common gotchas: (1) Lost in the middle — models pay less attention to the middle of a long context, so structuring your prompt matters. (2) Cost — every token in the context costs money on every request. A 100K-token prompt run a million times per month adds up fast. Caching helps a lot here; check whether your provider supports prompt caching, and put the static parts of your prompt first so the cache can actually hit.
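The cost point is worth doing as arithmetic. A back-of-the-envelope sketch — the per-token price and the cache discount here are illustrative assumptions, not any provider's actual rates:

```python
# Monthly input-token cost of a large prompt re-run at scale.
# price_per_million and cache_discount are assumed values for illustration.

def monthly_prompt_cost(prompt_tokens: int, requests_per_month: int,
                        price_per_million: float,
                        cached_fraction: float = 0.0,
                        cache_discount: float = 0.9) -> float:
    """Input cost per month; cached tokens are billed at a discounted rate."""
    full_tokens = prompt_tokens * (1 - cached_fraction)
    cached_tokens = prompt_tokens * cached_fraction * (1 - cache_discount)
    billable = (full_tokens + cached_tokens) * requests_per_month
    return billable * price_per_million / 1_000_000

# 100K-token prompt, 1M requests/month, assumed $2.50 per 1M input tokens
no_cache = monthly_prompt_cost(100_000, 1_000_000, 2.50)
cached = monthly_prompt_cost(100_000, 1_000_000, 2.50, cached_fraction=0.9)
print(f"no cache:   ${no_cache:,.0f}/month")   # $250,000/month
print(f"90% cached: ${cached:,.0f}/month")     # $47,500/month
```

Even at these assumed rates, caching 90% of the prompt cuts the bill by roughly 80% — which is why putting the static prefix first matters.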
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.