Chunking is the process of splitting source documents into smaller segments before embedding them for RAG. You can't embed a 50-page PDF as one unit — retrieval would return the whole document regardless of what the user actually needs. Chunks let you retrieve the specific paragraph that answers the question.
The key variables: size (typically 256–1024 tokens) and overlap (how much adjacent chunks share, usually 10–20%). Bigger chunks preserve more context but reduce retrieval precision; smaller chunks are precise but may split ideas mid-sentence. Overlap keeps a sentence that falls across a chunk boundary from being lost to both sides of the cut.
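The size/overlap mechanics above reduce to a sliding window over the token stream. A minimal sketch, assuming tokens are already produced by your tokenizer of choice (the function name and defaults here are illustrative, not from any particular library):

```python
def chunk_with_overlap(tokens, size=512, overlap=64):
    """Slide a fixed-size window over a token list, stepping by size - overlap
    so each chunk shares `overlap` tokens with its neighbor."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final window already covers the tail
    return chunks
```

With `size=4, overlap=2` on ten tokens, each chunk repeats the last two tokens of the previous one, which is what preserves boundary context at the cost of some index bloat.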
The strategies that matter most: recursive text splitting (split on paragraphs, then sentences, then characters, in that order), semantic chunking (split at meaning breaks rather than at a fixed size), and document-structure-aware chunking (respect headings, tables, and sections). Most teams underinvest in chunking strategy and overinvest in model choice; chunking is usually the bigger lever.
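The recursive-splitting idea can be sketched in a few lines: try the coarsest separator first, and only fall back to finer ones for pieces that are still too long. This is a simplified illustration (production splitters, e.g. LangChain's `RecursiveCharacterTextSplitter`, also merge small pieces back up to the size limit; this sketch omits that):

```python
def recursive_split(text, max_len=500, seps=("\n\n", "\n", ". ", " ")):
    """Split on paragraphs first, then sentences, then words;
    recurse with finer separators only on oversized pieces."""
    if len(text) <= max_len:
        return [text]
    if not seps:
        # Nothing left to split on: hard-cut on characters as a last resort.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    chunks = []
    for part in text.split(sep):
        if not part:
            continue
        if len(part) <= max_len:
            chunks.append(part)
        else:
            chunks.extend(recursive_split(part, max_len, rest))
    return chunks
```

The ordering of `seps` encodes the strategy: a paragraph break is a better cut point than a sentence break, which beats a word break, which beats an arbitrary character position.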
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.