How to Set Your Default AI Model
Tired of model roulette? Pick your default model and keep it. GPT-4o, Claude, or any of 300+ models.
Notes on building Chat Labs AI. Getting 300+ AI models to work together. What breaks, what works, what I learned.
Language models struggle with information in the middle of long contexts. The U-shaped performance curve. Paper #30.
Combining retrieval with generation. External knowledge grounds language models in facts. Paper #29.
Learning embeddings for questions and passages. Dual encoders and contrastive learning beat BM25. Paper #28.
Predicting multiple tokens at once improves sample efficiency and enables speculative decoding. Paper #27.
Convolutional layers, pooling, ReLU, and backpropagation. Stanford's CS231n course distilled. Paper #26.
The shortest program that outputs a string. Incompressibility equals randomness. Paper #25.
Formal definitions of machine intelligence. Recursive self-improvement and the path to superintelligence. Paper #24.
Minimum Description Length balances model complexity against fit. Occam's razor made mathematical. Paper #23.
Power laws for neural language models. How performance scales with compute, data, and parameters. Paper #22.
Training speech models without frame-level alignment. The blank token and alignment-free learning. Paper #21.
Differentiable computers with external memory banks. Content and location-based addressing. Paper #20.
Why does coffee mix but never unmix? Entropy, coarse-graining, and the arrow of time. Paper #19.
Memory slots that attend to each other enable multi-step reasoning. Paper #18.
Learning to generate data by encoding it into structured latent spaces. ELBO and the reparameterization trick. Paper #17.
Relation Networks compare all object pairs to answer questions about relationships. Paper #16.
Moving activation before convolution enables training 1000-layer networks. Paper #15.
Neural machine translation by jointly learning to align and translate. The original attention mechanism. Paper #14.
The Transformer architecture replaced recurrence with self-attention. Foundation of GPT, BERT, and modern AI. Paper #13.
Message passing on graphs. Nodes aggregate information from neighbors to learn representations. Paper #12.
Exponentially expanding receptive fields without losing resolution. WaveNet's secret for audio generation. Paper #11.
Skip connections let gradients flow through 152 layers. Learning residuals instead of direct mappings. Paper #10.
Pipeline parallelism for training giant neural networks. Micro-batches keep accelerators busy. Paper #9.
Sets have no order, but neural networks need sequences. How to handle permutation invariance. Paper #8.
GPT-5.2 feels less like a tool and more like a patronizing hall monitor. When "just do the task" becomes a premium feature.
Krizhevsky, Sutskever, and Hinton's 2012 ImageNet victory proved deep learning could outperform hand-engineered computer vision. Paper #7.
Vinyals, Fortunato, and Jaitly repurposed attention to point at input positions instead of blending hidden states. Paper #6.
Hinton and van Camp showed that penalizing weight complexity leads to better generalization. Paper #5.
Zaremba, Sutskever, and Vinyals figured out how to apply dropout to LSTMs without breaking them. Paper #4.
Christopher Olah's 2015 post explained LSTM gates with clarity that textbooks lacked. Paper #3.
Karpathy's famous 2015 post showed RNNs could generate Shakespeare and Linux code by predicting one character at a time. Paper #2.
Scott Aaronson's First Law of Complexodynamics explains why complexity rises, peaks, then falls. Paper #1 from Sutskever's 30.
Got tired of copy-pasting AI responses into Slack. Built shareable links instead. One good prompt becomes team knowledge.
Scraped 10,000 custom instructions to see patterns. Set your style once; every model remembers.
I write when I ship something or figure something out. Maybe twice a month. No tracking, no growth hacks.
One-click unsubscribe.