In LLM systems, latency splits into two numbers worth tracking separately. Time to first token (TTFT) is how long until the user sees anything, typically 200ms to 2s on hosted APIs. Time per output token is how fast the rest of the response streams in.
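To make the two numbers concrete, here is a minimal measurement sketch using the OpenAI Python SDK's streaming interface. The model name is a placeholder, and any streaming client can be timed the same way: record when the first content chunk arrives (TTFT), then average the gaps between the chunks that follow.

```python
import time

from openai import OpenAI  # any streaming client can be timed the same way

client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunk_times = []

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; swap in your own
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)

for chunk in stream:
    now = time.perf_counter()
    # Some chunks (e.g. the final one) carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = now  # first visible content: this gap is TTFT
        chunk_times.append(now)

ttft = first_token_at - start
# Average gap between streamed chunks after the first. A chunk can carry
# more than one token, so this approximates time per output token.
tpot = (chunk_times[-1] - chunk_times[0]) / max(len(chunk_times) - 1, 1)
print(f"TTFT: {ttft:.3f}s, time per output token: {tpot * 1000:.1f}ms")
```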
Things that hurt latency: long prompts (more tokens to process before the first output), reasoning models (extra hidden tokens generated before the answer), tool calls (each one adds a full round-trip), and prompt-cache misses (a cold cache means reprocessing the entire prefix). Things that help: streaming, prompt caching, smaller models for simple tasks, and parallelizing independent tool calls, as sketched below.
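On the parallel tool calls point, a minimal sketch with Python's asyncio, using hypothetical tools where asyncio.sleep stands in for real network latency: independent calls dispatched with asyncio.gather finish in roughly the time of the slowest one rather than the sum.

```python
import asyncio
import time

# Hypothetical tools; asyncio.sleep stands in for real network latency.
async def search_docs(query: str) -> str:
    await asyncio.sleep(0.8)
    return f"docs for {query!r}"

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.5)
    return f"weather in {city}"

async def main():
    start = time.perf_counter()
    # Independent tool calls dispatched concurrently: the user waits
    # for the slowest call (~0.8s), not the sum (~1.3s).
    docs, weather = await asyncio.gather(
        search_docs("prompt caching"),
        fetch_weather("Berlin"),
    )
    print(docs, "|", weather, f"| elapsed: {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```

This only works when the calls are genuinely independent; if one tool's input depends on another's output, they have to run in sequence.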
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.