Faithfulness is the eval dimension that asks: did the model accurately represent what the source said, or did it add, omit, or distort key information? It's closely related to groundedness but applies more broadly — even in summarization without retrieval, a summary can be unfaithful by subtly changing the meaning.
Classic faithfulness failures: changing a claim from "up to 50%" to "50%," summarizing a tentative recommendation as a firm conclusion, omitting a critical caveat, or representing a minority view as consensus. These feel subtle in demos but matter enormously in legal, medical, and financial contexts.
Measure faithfulness by having an LLM-as-judge check each claim in the output against the source and flag anything unsupported or distorted. RAGAS ships faithfulness as a built-in metric for RAG pipelines, and the SummEval benchmark measures the same property for summarization under the name "consistency."
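The claim-by-claim check can be sketched in a few lines. This is a minimal illustration, not a production implementation: `judge` stands for any callable that wraps your LLM client and returns the model's text reply, and `stub_judge` below is a hypothetical stand-in so the example runs without an API call.

```python
# Minimal sketch of claim-level faithfulness scoring with an LLM-as-judge.
# `judge` is any callable: prompt string in, model reply string out.

JUDGE_TEMPLATE = (
    "Source:\n{source}\n\n"
    "Claim:\n{claim}\n\n"
    "Does the source fully support the claim, without adding, omitting, "
    "or distorting information? Answer with exactly one word: "
    "SUPPORTED, DISTORTED, or UNSUPPORTED."
)

def judge_claim(source: str, claim: str, judge) -> str:
    """Ask the judge about one claim and normalize its one-word verdict."""
    reply = judge(JUDGE_TEMPLATE.format(source=source, claim=claim))
    verdict = reply.strip().upper().rstrip(".")
    if verdict not in {"SUPPORTED", "DISTORTED", "UNSUPPORTED"}:
        return "UNSUPPORTED"  # fail closed on unparseable replies
    return verdict

def faithfulness_score(source: str, claims: list[str], judge) -> float:
    """Fraction of output claims the judge marks as supported by the source."""
    if not claims:
        return 1.0
    verdicts = [judge_claim(source, c, judge) for c in claims]
    return sum(v == "SUPPORTED" for v in verdicts) / len(verdicts)

# Demo: a stub judge that catches the classic "up to 50%" -> "50%" distortion.
source = "The treatment reduced symptoms by up to 50% in a small pilot study."
claims = [
    "The treatment reduced symptoms by 50%.",  # drops the "up to" hedge
    "The study was a small pilot.",            # faithful to the source
]

def stub_judge(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "DISTORTED" if "by 50%." in prompt else "SUPPORTED"

print(faithfulness_score(source, claims, stub_judge))  # 0.5
```

In practice you would also need a claim-extraction step (itself usually an LLM call) to split the output into atomic claims before scoring; the score above is simply the supported fraction, which is how RAGAS defines its faithfulness metric.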
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.