Put relevant information at the beginning of the context and models find it. Put it at the end and they find it too. Put it in the middle, and performance drops dramatically. This "lost in the middle" phenomenon reveals a fundamental limitation in how language models process long contexts.
Why Sutskever Included This
Longer context windows are marketed as major advances. But can models actually use all that context? This paper reveals they often can't. Understanding the limitation is essential for designing effective RAG systems and long-context applications.
The U-Shaped Pattern
When relevant information appears at different positions in context:
Beginning: ~90% accuracy
Middle: ~60% accuracy
End: ~85% accuracy
The 30-point drop for middle positions represents substantial degradation. Information in the middle might as well not be there.
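The experimental setup behind these numbers slides the answer-bearing document through a stack of distractors and measures accuracy at each position. A minimal sketch of that positional sweep (function and variable names are hypothetical, and the model call itself is omitted):

```python
def build_contexts(relevant_doc, distractors):
    """Build one prompt context per possible position of the
    relevant document among the distractors. Querying a model
    with each context, then comparing accuracy by position,
    reproduces the U-shaped curve described above."""
    n = len(distractors) + 1
    contexts = []
    for pos in range(n):
        docs = distractors[:pos] + [relevant_doc] + distractors[pos:]
        contexts.append("\n\n".join(docs))
    return contexts

# KEY at position 0, 1, 2, or 3 among three distractors
ctxs = build_contexts("KEY", ["d1", "d2", "d3"])
print(len(ctxs))  # → 4
```

Each context holds identical content; only the position of the relevant document changes, which isolates position as the variable being measured.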
Why It Happens
Several factors contribute:
Attention mechanics: Self-attention naturally prioritizes recent tokens (recency) and early tokens (primacy). Middle positions compete for limited attention.
Training distribution: Most training contexts are short. Models see few examples of genuinely long-range dependencies.
Causal masking: Decoder-only models process text left-to-right, so each token can only attend to what came before it. Representations of mid-context information can be diluted by later processing before the model needs them.
The Problem Gets Worse
As context length increases, the middle penalty grows. At 10 documents, middle accuracy drops ~30%. At 30 documents, it drops ~50%. Longer context doesn't mean better retrieval if the model can't use it.
Implications for RAG
Retrieval-augmented generation suffers directly from this limitation. If the most relevant document ends up in the middle of context, the model may ignore it. Retrieval ranking alone doesn't solve the problem.
Practical solutions:
Reorder documents strategically: put most relevant information at beginning and end.
Limit context length when possible.
Use multiple calls with different document orderings.
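The first mitigation, strategic reordering, can be sketched as a small helper that takes retrieval results ranked most-relevant-first and interleaves them so the best documents land at the edges of the context and the weakest sink to the middle (function name and document values are illustrative):

```python
def reorder_for_edges(docs):
    """Reorder documents so relevance is highest at the beginning
    and end of the context, and lowest in the middle.

    Assumes `docs` is sorted most-relevant-first, as typical
    retrieval output would be.
    """
    front, back = [], []
    for i, doc in enumerate(docs):
        # Alternate: even ranks build the front, odd ranks the back.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so relevance rises again toward the end.
    return front + back[::-1]

docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # ranked best to worst
print(reorder_for_edges(docs))
# → ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The top-ranked document opens the context and the second-ranked closes it, so the positions the model attends to most reliably hold the most relevant material.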
Not Just a Model Problem
The pattern appears across different architectures and model sizes. It's not a specific bug but a systematic limitation of current approaches to long-context processing.
More in This Series
Part of a series on Ilya Sutskever's recommended 30 papers.