Language models hallucinate. They generate plausible-sounding text without checking it against facts. Retrieval-Augmented Generation (RAG) mitigates this by retrieving relevant documents and conditioning generation on them. External knowledge meets powerful generation.
Why Sutskever Included This
Pure parametric models encode knowledge in weights, fixed at training time and prone to hallucination. RAG augments generation with retrieval, enabling dynamic access to external knowledge. This architecture powers modern question-answering and knowledge-intensive applications.
The Architecture
RAG combines two components:
Retriever: A Dense Passage Retrieval (DPR) bi-encoder embeds the query and every corpus passage into a shared vector space; the top-k passages by inner product are retrieved for each input query.
Generator: A seq2seq model (BART in the original paper) produces the answer conditioned on the query concatenated with each retrieved document.
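A minimal sketch of the retrieval step, assuming the public DPR checkpoints on Hugging Face and a toy two-passage corpus (the corpus and query are stand-ins for illustration):

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

corpus = ["The Eiffel Tower is in Paris.", "BART is a seq2seq model."]

with torch.no_grad():
    # Embed all passages once (in practice: offline, stored in a vector index).
    ctx = c_enc(**c_tok(corpus, return_tensors="pt", padding=True)).pooler_output
    # Embed the query at inference time.
    q = q_enc(**q_tok("Where is the Eiffel Tower?", return_tensors="pt")).pooler_output

scores = q @ ctx.T                        # inner-product relevance scores
top_doc = corpus[int(scores.argmax())]    # highest-scoring passage
```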
P(y|x) ≈ Σ_z P(z|x) · P(y|x,z)

The probability of answer y given query x marginalizes over the top-k retrieved documents z. Each document contributes to the final answer, weighted by its retrieval probability P(z|x).
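For concreteness, here is an end-to-end sketch using the Hugging Face transformers RAG wrapper, which bundles a DPR retriever with a BART generator. The dummy index is an illustrative shortcut so the snippet runs without downloading the full Wikipedia index:

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

inputs = tokenizer("who wrote the origin of species?", return_tensors="pt")
# generate() retrieves the top-k passages, then marginalizes over them as above.
output_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```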
Two Variants
RAG-Sequence: The generator produces a complete answer conditioned on each retrieved document separately; the final distribution mixes these whole-sequence probabilities, weighted by retrieval scores. Best for single-fact questions answerable from one passage.
RAG-Token: The mixture over documents is recomputed at every decoding step, so different documents can influence different output tokens. The generator can synthesize information from multiple sources within a single response, which suits complex questions requiring multiple facts. The toy sketch below contrasts the two marginalizations.
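In this sketch all probabilities are made up purely to show where the mixing over documents happens; nothing here is learned:

```python
import numpy as np

rng = np.random.default_rng(0)
k, steps, vocab = 2, 3, 4            # 2 retrieved docs, 3-token answer, toy vocab

p_z = np.array([0.7, 0.3])           # P(z|x): normalized retrieval scores
p_tok = rng.dirichlet(np.ones(vocab), size=(k, steps))  # P(y_t | x, z, y_<t)
y = [1, 3, 0]                        # token ids of one candidate answer

# RAG-Sequence: each document scores the WHOLE answer; mix at the sequence level.
per_doc = p_tok[:, np.arange(steps), y].prod(axis=1)    # P(y|x,z) for each doc
p_seq = (p_z * per_doc).sum()

# RAG-Token: mix documents at EVERY decoding step, then score the answer.
mixed = (p_z[:, None, None] * p_tok).sum(axis=0)        # P(y_t | x, y_<t)
p_token = mixed[np.arange(steps), y].prod()
```

The only difference is the order of operations: RAG-Sequence multiplies token probabilities per document before mixing, while RAG-Token mixes across documents first, at every step.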
Advantages
Factual grounding: Retrieved documents provide evidence, so the model generates from retrieved text rather than relying solely on patterns memorized in its weights.
Interpretability: Retrieved documents show where the answer came from. Users can verify sources.
Updatability: Knowledge base changes don't require retraining. Update the corpus and re-encode the new passages; retrieval adapts automatically (see the sketch after this list).
Efficiency: Smaller models achieve strong performance by offloading knowledge to retrieval.
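A minimal sketch of that update path, assuming a FAISS inner-product index and random stand-in embeddings (any vector store works the same way; in practice the vectors come from the DPR context encoder shown earlier):

```python
import faiss
import numpy as np

dim = 768                             # DPR embedding dimension
index = faiss.IndexFlatIP(dim)        # exact inner-product search

old_vecs = np.random.rand(1000, dim).astype("float32")  # existing corpus embeddings
index.add(old_vecs)

# A new document arrives: encode it and append. No gradient update anywhere;
# the generator's weights are untouched.
new_vec = np.random.rand(1, dim).astype("float32")
index.add(new_vec)

query = np.random.rand(1, dim).astype("float32")
scores, ids = index.search(query, 5)  # top-5 passages, new document included
```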
Limitations
Retrieval errors propagate to generation: if the wrong documents are retrieved, the answer will be wrong regardless of generator quality. Retrieval also adds latency compared to pure parametric generation.
The "Lost in the Middle" problem (Paper #30) affects RAG: models struggle to use information from middle positions of retrieved context.
Further Reading
Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (NeurIPS 2020), the original RAG paper.
More in This Series
Part of a series on Ilya Sutskever's recommended 30 papers.