RAG Architecture Decision Tree: A Practical Guide From Naive to Agentic

The most useful question when choosing a RAG architecture is not "which pattern is the most advanced?" It is "what is my current system unable to do reliably?"

Each pattern in this series addresses a different, observable failure. The decision guide below maps each failure to the capability that addresses it. Add complexity only where the evidence shows a gap, not in anticipation of one.


The decision framework

What you observe	What to consider
Direct questions over one focused corpus	Basic single-pass RAG
Exact terms, names, or identifiers are missed	Hybrid retrieval
Correct evidence ranks too low	Reranking
User questions are ambiguous or compound	Query rewriting or decomposition
Different questions require different systems	Routing
Answers depend on entity relationships	Graph retrieval
One retrieval step depends on a previous result	Iterative or agentic retrieval
Evidence is weak or conflicting	Verification, abstention, or human review

Corpus size affects infrastructure choice - vector database, index partitioning, embedding strategy - but it does not determine the RAG pattern by itself. A large corpus with simple, direct questions may still be best served by basic retrieval. Measure the query type, not the document count.
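
The decision table above can be kept as data so that each observed failure maps to one candidate capability. The following is a minimal sketch, not a prescribed implementation: the symptom keys and the `suggest` function are hypothetical names invented for illustration, while the capability names are copied from the table.

```python
# Hypothetical symptom keys; capability names come from the decision table.
DECISION_TABLE = {
    "direct_questions_single_corpus": "Basic single-pass RAG",
    "exact_terms_missed": "Hybrid retrieval",
    "evidence_ranks_too_low": "Reranking",
    "ambiguous_or_compound_questions": "Query rewriting or decomposition",
    "questions_need_different_systems": "Routing",
    "answers_depend_on_relationships": "Graph retrieval",
    "step_depends_on_previous_result": "Iterative or agentic retrieval",
    "weak_or_conflicting_evidence": "Verification, abstention, or human review",
}


def suggest(observed: list[str]) -> list[str]:
    """Return the capabilities to consider for the observed symptoms, in table order."""
    unknown = set(observed) - set(DECISION_TABLE)
    if unknown:
        raise KeyError(f"unknown symptoms: {sorted(unknown)}")
    return [cap for symptom, cap in DECISION_TABLE.items() if symptom in observed]


# Two gaps observed in evaluation map to two capabilities, nothing more.
candidates = suggest(["evidence_ranks_too_low", "exact_terms_missed"])
```

Note that corpus size is deliberately not a key: the lookup is driven by what the system fails to do, not by how many documents it holds.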

The five questions

Can the system find the evidence? Start with basic retrieval. Add hybrid search, query transformation, or reranking when retrieval metrics expose a gap. This is the foundation - every other question assumes retrieval is already working.
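
Hybrid search needs some way to merge a keyword ranking with a vector ranking. The article does not name a fusion method; reciprocal rank fusion (RRF) is one common choice and is used here purely as an illustration. The document IDs are made up, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc-sku-4471", "doc-a", "doc-b"]  # hypothetical lexical results
vector_hits = ["doc-a", "doc-c", "doc-sku-4471"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, which is the behavior you want when exact identifiers are found by keyword search but missed by embeddings.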

Does it know where to look? Add routing when questions require different domains, databases, APIs, or retrieval strategies. A single retrieval pipeline cannot serve fundamentally different question types with equal effectiveness.
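
A router can start as a plain rule table before it becomes a classifier or an LLM call. The sketch below assumes two hypothetical retrievers (`search_docs`, `query_orders_db`) and a keyword rule set invented for illustration; it shows only the shape of the decision, not a recommended rule set.

```python
import re
from typing import Callable

Retriever = Callable[[str], str]


def search_docs(question: str) -> str:
    """Hypothetical default retriever over a document index."""
    return f"docs:{question}"


def query_orders_db(question: str) -> str:
    """Hypothetical retriever backed by a structured orders database."""
    return f"orders:{question}"


# Each route pairs trigger keywords with the retriever that should handle them.
ROUTES: list[tuple[frozenset[str], Retriever]] = [
    (frozenset({"order", "invoice", "refund"}), query_orders_db),
]


def route(question: str) -> Retriever:
    """Pick the first route whose keywords appear in the question, else the default."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    for keywords, retriever in ROUTES:
        if words & keywords:
            return retriever
    return search_docs
```

Because a router can send a question to the wrong source, it is worth logging the chosen route alongside each answer so that misroutes show up in evaluation.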

Does it understand how information connects? Add graph-based retrieval when relationships between entities are part of the answer. This is distinct from knowing where to look - it is about traversing connections within the evidence.
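
Traversing connections can be as simple as a bounded breadth-first walk over an entity graph. This sketch assumes an in-memory adjacency map with made-up entities and relations; a real system would typically query a graph database instead.

```python
from collections import deque

# Hypothetical entity graph: entity -> list of (relation, neighbor).
GRAPH = {
    "Acme Corp": [("acquired", "Widget Inc")],
    "Widget Inc": [("supplies", "Gadget LLC")],
    "Gadget LLC": [],
}


def neighbors_within(graph, start, max_hops):
    """Return (source, relation, target) triples reachable from start within max_hops."""
    triples = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, target in graph.get(node, []):
            triples.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples


two_hop = neighbors_within(GRAPH, "Acme Corp", max_hops=2)
```

The returned triples can be rendered as text and placed in the prompt next to retrieved passages, so the model sees the relationship chain rather than isolated facts.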

Does it need to search more than once? Add iterative or agentic retrieval when each retrieval step depends on what the previous step discovered. This is the highest-cost pattern, so add it last.
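
The core of iterative retrieval is a loop in which the evidence gathered so far decides the next query, with a hard step budget to contain cost. In the sketch below, `retrieve` and `next_query` are hypothetical callables; in practice `next_query` would usually be an LLM call, and it is stubbed here so the example runs on its own.

```python
from typing import Callable, Optional


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], list[str]],
    next_query: Callable[[str, list[str]], Optional[str]],
    max_steps: int = 3,
) -> list[str]:
    """Retrieve repeatedly, letting each step's evidence shape the next query."""
    evidence: list[str] = []
    query: Optional[str] = question
    for _ in range(max_steps):  # hard cap so the loop cannot run away
        if query is None:
            break
        for passage in retrieve(query):
            if passage not in evidence:
                evidence.append(passage)
        query = next_query(question, evidence)  # None means "enough evidence"
    return evidence


# Toy stand-ins for a real index and a real query planner.
FAKE_INDEX = {
    "who leads project X?": ["Project X is led by Dana."],
    "which team is Dana on?": ["Dana is on the platform team."],
}


def toy_retrieve(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])


def toy_next_query(question: str, evidence: list[str]) -> Optional[str]:
    # The follow-up query only becomes known after the first result arrives.
    return "which team is Dana on?" if len(evidence) == 1 else None


result = iterative_retrieve("who leads project X?", toy_retrieve, toy_next_query)
```

The `max_steps` cap is the simplest guard against agents that loop; a production system would likely also track cost and elapsed time per question.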

Can you prove the answer is supported? Evaluate retrieval quality, answer grounding, citations, abstention accuracy, latency, cost, security, and user-task success. Without measurement, you cannot tell which of the previous four questions your system is actually failing.

What to measure

Four quality questions are sufficient to start.

Did you retrieve the right evidence? Check whether the expected source or passage appears in the candidate results.

Did you keep the best evidence? Check whether relevant passages rank highly and survive context construction.

Did the answer stay faithful to the evidence? Check whether its claims are supported by the provided context.

Did the answer solve the user's task? A grounded answer can still be incomplete, unclear, or unhelpful.
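
The first two quality questions can be approximated with standard retrieval metrics: hit rate at k for "did the expected source appear?" and mean reciprocal rank for "did it rank highly?". This is a minimal sketch over a made-up, hand-labeled evaluation set; the metric choice is an assumption, since the article does not prescribe specific metrics.

```python
def hit_at_k(retrieved: list[str], expected: set[str], k: int) -> bool:
    """True if any expected document appears in the top k results."""
    return any(doc in expected for doc in retrieved[:k])


def reciprocal_rank(retrieved: list[str], expected: set[str]) -> float:
    """1 / rank of the first expected document, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in expected:
            return 1.0 / rank
    return 0.0


# Hypothetical evaluation set: (retrieved IDs in rank order, expected IDs).
EVAL_SET = [
    (["d3", "d1", "d7"], {"d1"}),  # expected document found at rank 2
    (["d2", "d9", "d4"], {"d5"}),  # expected document missing
]

hit_rate = sum(hit_at_k(r, e, k=3) for r, e in EVAL_SET) / len(EVAL_SET)
mrr = sum(reciprocal_rank(r, e) for r, e in EVAL_SET) / len(EVAL_SET)
```

A low hit rate points toward hybrid retrieval or query transformation, while a healthy hit rate with a low reciprocal rank points toward reranking.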

Then add production measures as the system matures: citation accuracy, no-answer and abstention accuracy, retrieval and end-to-end latency, cost per query, index freshness, authorization failures, tool-call success rate, and user-task completion.
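
Abstention accuracy is the least standardized of these measures. One plausible definition, assumed here for illustration, is the share of questions where the system abstained exactly when the corpus could not answer. The record format and function name are hypothetical.

```python
def abstention_accuracy(records: list[tuple[bool, bool]]) -> float:
    """Share of questions handled correctly with respect to abstention.

    Each record is (answerable, abstained). A record is correct when the
    system answers an answerable question or abstains on an unanswerable one.
    """
    if not records:
        raise ValueError("no records to score")
    correct = sum(1 for answerable, abstained in records if answerable != abstained)
    return correct / len(records)


# Hypothetical labeled outcomes from an evaluation run.
outcomes = [
    (True, False),   # answerable, answered: correct
    (False, True),   # unanswerable, abstained: correct
    (False, False),  # unanswerable, answered anyway: wrong
    (True, True),    # answerable, abstained needlessly: wrong
]
score = abstention_accuracy(outcomes)
```

This measures only the decision to answer, not whether the answer itself was right, so it belongs alongside the faithfulness check rather than in place of it.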

The architecture principle: a more advanced pattern is not automatically a better architecture. Every added capability creates a trade-off - query rewriting can change intent, reranking adds latency, routing can send a question to the wrong source, agents can loop or choose unsafe actions. The right architecture is the smallest one that meets the required level of accuracy, security, traceability, and reliability. Start with the core loop. Find the measurable gap. Add only the capability that closes it.

RAG Architecture Series - 6 Parts
