Put relevant information at the beginning of the context and models find it. Put it at the end and they find it too. Put it in the middle, and performance drops dramatically. This "lost in the middle" phenomenon reveals a fundamental limitation in how language models process long contexts.
Why Sutskever Included This
Longer context windows are marketed as major advances. But can models actually use all that context? This paper reveals they often can't. Understanding the limitation is essential for designing effective RAG systems and long-context applications.
The U-Shaped Pattern
When relevant information appears at different positions in context:
Beginning: ~90% accuracy
Middle: ~60% accuracy
End: ~85% accuracy
The 30-point drop for middle positions represents substantial degradation. Information in the middle might as well not be there.
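The experimental setup behind these numbers slides the answer-bearing document through a stack of distractors and measures accuracy at each position. A minimal sketch of that positional sweep (function and variable names are hypothetical, and the model call itself is omitted):

```python
def build_contexts(relevant_doc, distractors):
    """Build one prompt context per possible position of the
    relevant document among the distractors. Querying a model
    with each context, then comparing accuracy by position,
    reproduces the U-shaped curve described above."""
    n = len(distractors) + 1
    contexts = []
    for pos in range(n):
        docs = distractors[:pos] + [relevant_doc] + distractors[pos:]
        contexts.append("\n\n".join(docs))
    return contexts

# KEY at position 0, 1, 2, or 3 among three distractors
ctxs = build_contexts("KEY", ["d1", "d2", "d3"])
print(len(ctxs))  # → 4
```

Each context holds identical content; only the position of the relevant document changes, which isolates position as the variable being measured.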
Why It Happens
Several factors contribute:
Attention mechanics: Self-attention naturally prioritizes recent tokens (recency) and early tokens (primacy). Middle positions compete for limited attention.
Training distribution: Most training contexts are short. Models see few examples of genuinely long-range dependencies.
Causal masking: Decoder-only models process text left-to-right, so each token can only attend to what came before it. Representations of mid-context information can be diluted by later processing before the model needs them.
The Problem Gets Worse
As context length increases, the middle penalty grows. At 10 documents, middle accuracy drops ~30%. At 30 documents, it drops ~50%. Longer context doesn't mean better retrieval if the model can't use it.
Implications for RAG
Retrieval-augmented generation suffers directly from this limitation. If the most relevant document ends up in the middle of context, the model may ignore it. Retrieval ranking alone doesn't solve the problem.
Practical solutions:
Reorder documents strategically: put most relevant information at beginning and end.
Limit context length when possible.
Use multiple calls with different document orderings.
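The first mitigation, strategic reordering, can be sketched as a small helper that takes retrieval results ranked most-relevant-first and interleaves them so the best documents land at the edges of the context and the weakest sink to the middle (function name and document values are illustrative):

```python
def reorder_for_edges(docs):
    """Reorder documents so relevance is highest at the beginning
    and end of the context, and lowest in the middle.

    Assumes `docs` is sorted most-relevant-first, as typical
    retrieval output would be.
    """
    front, back = [], []
    for i, doc in enumerate(docs):
        # Alternate: even ranks build the front, odd ranks the back.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so relevance rises again toward the end.
    return front + back[::-1]

docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # ranked best to worst
print(reorder_for_edges(docs))
# → ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The top-ranked document opens the context and the second-ranked closes it, so the positions the model attends to most reliably hold the most relevant material.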
Not Just a Model Problem
The pattern appears across different architectures and model sizes. It's not a specific bug but a systematic limitation of current approaches to long-context processing.
More in This Series
Part of a series on Ilya Sutskever's recommended 30 papers.