LSTMs compress everything into a single hidden state. For tasks requiring comparison between stored items, this bottleneck limits reasoning. Relational RNNs maintain multiple memory slots that interact through attention, enabling explicit relational reasoning within a recurrent framework.
Why Sutskever Included This
This paper bridges recurrent networks and attention mechanisms. Rather than choosing between LSTM memory and transformer-style attention, it combines both. The architecture influenced subsequent work on memory-augmented networks and working memory in neural systems.
The Single-State Bottleneck
Standard LSTMs maintain one hidden state vector. Everything the network remembers must fit there. When a task requires comparing items seen at different timesteps, the network must encode all relevant information and the comparison logic into that single vector.
This works for simple patterns but struggles with relational reasoning: "Is item A larger than item B?" requires maintaining both items distinctly and comparing them.
Multiple Memory Slots
Relational RNNs maintain N separate memory slots instead of one hidden state. Each slot is a vector that persists across timesteps. The slots can store different pieces of information independently.
When new input arrives, it's appended to the memory. Then all slots interact through multi-head self-attention. Each slot can attend to all others, comparing and updating based on relationships.
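The augment-and-attend step can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function name, weight shapes, and toy sizes are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_memory(memory, x, Wq, Wk, Wv, num_heads):
    """One attend step: append the input to memory, then let every
    slot attend to every slot (including the new input row)."""
    m = np.vstack([memory, x[None, :]])        # (N+1, d): augment memory with input
    d = m.shape[1]
    dh = d // num_heads
    q = (m @ Wq).reshape(-1, num_heads, dh)    # per-head queries
    k = (m @ Wk).reshape(-1, num_heads, dh)    # per-head keys
    v = (m @ Wv).reshape(-1, num_heads, dh)    # per-head values
    out = np.empty_like(q)
    for h in range(num_heads):                 # each head: scaled dot-product attention
        scores = q[:, h] @ k[:, h].T / np.sqrt(dh)
        out[:, h] = softmax(scores, axis=-1) @ v[:, h]
    updated = out.reshape(-1, d)               # concatenate heads back to (N+1, d)
    return updated[:memory.shape[0]]           # keep the N slots; drop the input row

# Usage with toy sizes (illustrative only)
rng = np.random.default_rng(0)
N, d, H = 4, 8, 2
memory = rng.normal(size=(N, d))
x = rng.normal(size=d)
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
new_memory = attend_over_memory(memory, x, Wq, Wk, Wv, H)
print(new_memory.shape)  # (4, 8)
```

Because every slot's query attends over every slot's key, a slot can pull in information from any other slot in a single step — this is where the direct comparison between stored items happens.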
The Update Process
1. Augment: Append input to memory slots
2. Attend: Multi-head self-attention across slots
3. Process: MLP with residual connection
4. Gate: LSTM-style gating for updates
The attention step is key. Unlike standard RNNs where memory interacts only through the recurrence, here slots interact directly. Different attention heads can learn different types of relationships.
Gated Updates
The architecture uses LSTM-style input and forget gates to control memory updates. This prevents catastrophic overwriting and allows selective retention of important information.
The combination provides both: attention for relational reasoning and gating for stable long-term memory.
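Steps 3 and 4 of the update can be sketched as follows. This is a simplified sketch, not the paper's exact formulation: here the gates are computed from the input alone, whereas the paper also conditions them on the previous memory, and all names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_memory_update(memory, attended, x, W1, W2, Wi, Wf, bi, bf):
    """Steps 3-4: a per-slot MLP with residual connections, then
    LSTM-style input/forget gates decide how much of the candidate
    replaces each old slot."""
    h = attended + memory                       # residual around the attention output
    mlp = np.tanh(h @ W1) @ W2                  # per-slot two-layer MLP
    candidate = h + mlp                         # residual around the MLP
    i = sigmoid(x @ Wi + bi)                    # input gate: admit new content
    f = sigmoid(x @ Wf + bf)                    # forget gate: retain old content
    return f * memory + i * np.tanh(candidate)

# Usage with toy sizes (illustrative only)
rng = np.random.default_rng(1)
N, d = 4, 8
memory = rng.normal(size=(N, d))
attended = rng.normal(size=(N, d))              # output of the attention step
x = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wi, Wf = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
bi, bf = np.zeros(d), np.ones(d)                # positive forget bias favors retention
new_memory = gated_memory_update(memory, attended, x, W1, W2, Wi, Wf, bi, bf)
print(new_memory.shape)  # (4, 8)
```

The final line is the familiar LSTM cell update applied per slot: the forget gate scales the old slot, the input gate scales the new candidate, so a slot can ignore an irrelevant input and keep its contents intact.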
Tasks and Results
The paper tests on tasks requiring multi-step reasoning: sorting sequences, solving algorithmic problems, and language modeling. Relational RNNs outperform LSTMs on tasks with clear relational structure.
The improvement is largest when the task explicitly requires comparing stored items. For simple sequence prediction, the benefit is smaller.
Connection to Transformers
Transformers use self-attention without recurrence. Relational RNNs use self-attention within recurrence. The approaches are complementary: transformers parallelize better; relational RNNs may handle very long sequences with bounded memory.
More in This Series
Part of a series on Ilya Sutskever's recommended 30 papers.