This is the wrap-up for the series - and the part worth bookmarking, because it's the one you'll return to when scoping a new system.
The decision tree above maps every pattern from the series to the specific query characteristic that justifies it. Work top to bottom - add complexity only where you have evidence of a gap, not anticipation of one.
The decision logic, in plain language
- Single-hop, FAQ-style queries? Start with Naive RAG. If evaluation shows precision gaps, add Advanced RAG (hybrid search + reranking) before reaching for anything more structural.
- Multiple data domains, or mixed structured/unstructured data? Add Modular RAG routing.
- Answers that require traversing entity relationships? Add Graph RAG on top.
- Multi-hop, self-correcting queries? Use Agentic RAG, with hard iteration limits, a cost ceiling, and full tracing from day one.
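The same logic can be written down as a small function, which makes the "add a layer only on evidence" rule explicit. This is a minimal sketch, not a prescribed implementation: `QueryProfile`, its field names, and `recommend_patterns` are hypothetical names introduced here for illustration, and the flags are assumed to come from your own query logs and evaluation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Observed characteristics of a workload (hypothetical flags)."""
    precision_gap: bool = False                 # eval shows irrelevant chunks being retrieved
    multiple_domains: bool = False              # several domains, or mixed structured/unstructured data
    needs_relationship_traversal: bool = False  # answers depend on entity relationships
    multi_hop_self_correcting: bool = False     # queries need several retrieval steps with retries


def recommend_patterns(profile: QueryProfile) -> list[str]:
    """Start from Naive RAG and add one layer per observed gap."""
    patterns = ["Naive RAG"]
    if profile.precision_gap:
        patterns.append("Advanced RAG (hybrid search + reranking)")
    if profile.multiple_domains:
        patterns.append("Modular RAG routing")
    if profile.needs_relationship_traversal:
        patterns.append("Graph RAG")
    if profile.multi_hop_self_correcting:
        patterns.append("Agentic RAG (iteration limit, cost ceiling, tracing)")
    return patterns


if __name__ == "__main__":
    faq_bot = QueryProfile()
    print(recommend_patterns(faq_bot))

    support_desk = QueryProfile(precision_gap=True, multiple_domains=True)
    print(recommend_patterns(support_desk))
```

Every flag defaults to `False`, so a profile with no recorded evidence of a gap falls back to Naive RAG alone.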
Scale tiers
Small (under 50K documents): Naive or lightly-Advanced RAG. pgvector or serverless Pinecone keeps infrastructure overhead minimal. About 50 golden questions is a reasonable starting point for evaluation.
Medium (50K-1M documents): Advanced RAG's hybrid search and reranking earn their cost here. Use dedicated vector infrastructure such as Qdrant, Weaviate, or OpenSearch. Run an automated regression suite with RAGAS-style metrics on every change, not just at launch.
Large (1M+ documents): Modular + Graph + Agentic, applied selectively per domain based on real query patterns - not uniformly. Multi-tenant indexes with access-control-aware retrieval become a requirement.
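As a rough illustration, the tiers above reduce to a lookup on document count. This is a sketch under stated assumptions: `Tier` and `tier_for` are hypothetical names, and the handling of the exact boundaries (50,000 counted as medium, 1,000,000 counted as large) is a choice made here, since the text does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    patterns: str
    infrastructure: str


# (exclusive upper bound on document count, tier); boundary handling is an assumption.
BOUNDED_TIERS = [
    (50_000, Tier("small", "Naive or lightly-Advanced RAG",
                  "pgvector or serverless Pinecone")),
    (1_000_000, Tier("medium", "Advanced RAG (hybrid search + reranking)",
                     "Qdrant, Weaviate, or OpenSearch")),
]
LARGE_TIER = Tier("large", "Modular + Graph + Agentic, applied selectively per domain",
                  "multi-tenant indexes with access-control-aware retrieval")


def tier_for(doc_count: int) -> Tier:
    """Return the scale tier for a corpus of `doc_count` documents."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    for upper_bound, tier in BOUNDED_TIERS:
        if doc_count < upper_bound:
            return tier
    return LARGE_TIER


if __name__ == "__main__":
    for n in (10_000, 250_000, 5_000_000):
        t = tier_for(n)
        print(f"{n:>9,} docs -> {t.name}: {t.patterns} on {t.infrastructure}")
```

The tier is only a starting point. Within the large tier in particular, the patterns are meant to be applied per domain based on real query patterns, not switched on everywhere.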
The four eval metrics that actually matter
- Context precision - what percentage of retrieved chunks are actually relevant? Most directly improved by reranking.
- Context recall - what percentage of the information needed to answer was retrieved at all? A system can have perfect precision and terrible recall.
- Faithfulness - does the answer stay grounded in retrieved context, or does it hallucinate beyond it?
- Answer relevance - does the answer actually address what was asked? An answer can be perfectly faithful while still missing the question.
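The two retrieval metrics can be sketched with plain set arithmetic, assuming each golden question has been labeled with the IDs of the chunks needed to answer it. This is a simplified version for illustration, not how RAGAS or any particular library computes them; the function names and chunk IDs are hypothetical. Faithfulness and answer relevance are not sketched here, because they typically need an LLM or human judge.

```python
def context_precision(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of retrieved chunks that are labeled relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant_ids)
    return hits / len(retrieved_ids)


def context_recall(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of the labeled-relevant chunks that were retrieved."""
    if not relevant_ids:
        return 1.0  # nothing was needed, so nothing was missed
    found = relevant_ids & set(retrieved_ids)
    return len(found) / len(relevant_ids)


if __name__ == "__main__":
    # One golden question that needs four chunks; the retriever returned only one.
    relevant = {"c1", "c2", "c3", "c4"}
    retrieved = ["c1"]
    print(context_precision(retrieved, relevant))  # 1.0
    print(context_recall(retrieved, relevant))     # 0.25
```

The example reproduces the case described above: the single retrieved chunk is relevant, so precision is perfect, yet three of the four needed chunks are missing and recall is only 0.25.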
Closing thought: every pattern in this series is a response to a specific, observable gap. The discipline that separates well-architected RAG from over-engineered systems isn't knowing these patterns - it's having the evaluation infrastructure to know which gap you actually have, and adding exactly the layer that closes it.
RAG Architecture Series - 6 Parts