Just Think AI

Glossary

Latency

How long a model takes to respond. Measured as time-to-first-token and total time.

In LLM systems, latency splits into two numbers worth tracking separately. Time to first token (TTFT) is how long until the user sees anything — usually 200ms to 2s on hosted APIs. Time per output token (TPOT) is how fast the rest of the response streams in.
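The split is easy to instrument yourself. A minimal sketch — `fake_stream` below is a stand-in for any streaming LLM API, not a real client:

```python
import time

def fake_stream():
    # Stand-in for a streaming LLM API: a delay before the
    # first token, then a steady per-token stream.
    time.sleep(0.3)
    for tok in ["Hello", ",", " world", "!"]:
        yield tok
        time.sleep(0.05)

def measure_latency(stream):
    start = time.monotonic()
    ttft = None
    tokens = 0
    for _ in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # time to first token
        tokens += 1
    total = time.monotonic() - start
    # average time per output token, excluding the wait for the first
    tpot = (total - ttft) / max(tokens - 1, 1)
    return ttft, tpot, total

ttft, tpot, total = measure_latency(fake_stream())
print(f"TTFT {ttft*1000:.0f}ms | per-token {tpot*1000:.0f}ms | total {total*1000:.0f}ms")
```

Tracking TTFT and TPOT separately matters because they have different fixes: TTFT is dominated by prompt processing and queueing, while TPOT is mostly a function of model size and serving hardware.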

Things that hurt latency: long prompts (more to process before the first token), reasoning models (extra hidden tokens), tool calls (each one adds a round-trip), and misses on prompt caching. Things that help: streaming, prompt caching, smaller models for simple tasks, and parallelizing tool calls when possible.
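Parallelizing tool calls is the cheapest of those wins when the calls are independent. A sketch with `asyncio` — `call_tool` is a hypothetical stand-in for a real tool (HTTP request, database query, etc.):

```python
import asyncio
import time

async def call_tool(name: str, delay: float) -> str:
    # Stand-in for a real tool call with network latency.
    await asyncio.sleep(delay)
    return f"{name} done"

async def sequential(tools):
    # One round-trip after another: latencies add up.
    return [await call_tool(n, d) for n, d in tools]

async def parallel(tools):
    # All round-trips in flight at once: bounded by the slowest call.
    return await asyncio.gather(*(call_tool(n, d) for n, d in tools))

tools = [("search", 0.2), ("weather", 0.2), ("calendar", 0.2)]

start = time.monotonic()
asyncio.run(sequential(tools))
seq_time = time.monotonic() - start  # roughly 0.6s

start = time.monotonic()
asyncio.run(parallel(tools))
par_time = time.monotonic() - start  # roughly 0.2s

print(f"sequential {seq_time:.2f}s vs parallel {par_time:.2f}s")
```

The caveat: this only works when no tool's input depends on another tool's output; dependent calls still have to run in sequence.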

Bring this to your business

Knowing the term is one thing. Shipping it is another.

We do two-week AI Sprints — one term, one workflow, into production by Day 10.