Streaming means the model sends tokens to your client as it generates them rather than completing the full response first. This dramatically improves perceived latency — for a 500-token response at 50 tokens/second, without streaming users wait 10 seconds to see anything; with streaming they see the first word in 200ms.
All major API providers support streaming (OpenAI's stream: true, Anthropic's streaming events, etc.), and most frontend frameworks have off-the-shelf support for rendering server-sent events or streamed fetch responses. The implementation work is minimal; the UX improvement is significant.
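A minimal sketch using the official openai npm package (v4+); the model name and prompt are illustrative, and other providers' SDKs expose a similar async iterator:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model name
    messages: [{ role: "user", content: "Explain streaming in one paragraph." }],
    stream: true,
  });

  // Each chunk carries a small delta; write tokens out as they arrive.
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main().catch(console.error);
```

The loop body is also where you'd push tokens on to a browser, typically over server-sent events or a WebSocket.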
Edge cases to handle:

- Connection drops mid-stream: implement retry with position tracking if determinism matters.
- Cost tracking: billing is based on the tokens actually generated, but usage totals usually arrive only at the end of the stream (if at all), so count streamed tokens yourself for live usage displays.
- Progressive rendering: don't try to parse JSON out of a partial stream. Buffer until the response is complete, or use a streaming-aware JSON parser (see the sketch below).
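A sketch of a hypothetical consumeStream helper covering the last two cases, assuming the stream yields text deltas like the SDK loop above:

```typescript
// Hypothetical helper: progressively renders deltas, keeps a rough live
// token count, and buffers the full text so JSON is parsed only once
// the stream has closed.
async function consumeStream(
  stream: AsyncIterable<string>,
  onDelta: (delta: string, tokensSoFar: number) => void,
): Promise<unknown> {
  let buffer = "";
  let tokens = 0;
  for await (const delta of stream) {
    tokens += 1; // rough count (one delta is roughly one token); use a real tokenizer for billing-grade numbers
    buffer += delta;
    onDelta(delta, tokens); // progressive rendering plus live usage display
  }
  // Only safe once the stream is done: a partial JSON string won't parse.
  return JSON.parse(buffer);
}
```

For the retry case, record how much output you've already rendered; on reconnect, re-request and skip past that prefix. That only reproduces the same text if generation is deterministic (e.g. temperature 0), which is the determinism caveat above.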
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.