1AI Blog

Notes on building Chat Labs 1AI. Getting 300+ AI models to work together. What breaks, what works, what I learned.

1AI model selection screen
February 4, 2026 Tutorial

How to Set Your Default AI Model

Tired of model roulette? Pick your default model and keep it. GPT-4o, Claude, or any of 300+ models.

Read more
U-shaped attention pattern in long context
January 31, 2026 Sutskever 30

Lost in the Middle

Language models struggle with information in the middle of long contexts. The U-shaped attention pattern. Paper #30.

Read more
RAG architecture combining retrieval with generation
January 31, 2026 Sutskever 30

Retrieval-Augmented Generation

Combining retrieval with generation. External knowledge grounds language models in facts. Paper #29.

Read more
Semantic embeddings for questions and passages
January 31, 2026 Sutskever 30

Dense Passage Retrieval

Learning embeddings for questions and passages. Dual encoders and contrastive learning beat BM25. Paper #28.

Read more
Multiple tokens being predicted in parallel
January 31, 2026 Sutskever 30

Multi-Token Prediction

Predicting multiple tokens at once improves sample efficiency and enables speculative decoding. Paper #27.

Read more
Convolutional filters detecting features
January 31, 2026 Sutskever 30

CNN Fundamentals (CS231n)

Convolutional layers, pooling, ReLU, and backpropagation. Stanford's CS231n course distilled. Paper #26.

Read more
String compression and algorithmic complexity
January 31, 2026 Sutskever 30

Kolmogorov Complexity

The shortest program that outputs a string. Incompressibility equals randomness. Paper #25.

Read more
Intelligence emergence and recursive self-improvement
January 31, 2026 Sutskever 30

Machine Superintelligence

Formal definitions of machine intelligence. Recursive self-improvement and the path to superintelligence. Paper #24.

Read more
Data compression through model simplicity
January 31, 2026 Sutskever 30

The MDL Principle

Minimum Description Length balances model complexity against fit. Occam's razor made mathematical. Paper #23.

Read more
Power law scaling of neural network performance
January 31, 2026 Sutskever 30

Scaling Laws

Power laws for neural language models. How performance scales with compute, data, and parameters. Paper #22.

Read more
Audio waveform being transcribed through CTC
January 31, 2026 Sutskever 30

CTC Loss

Training speech models without frame-level alignment. The blank token and alignment-free learning. Paper #21.

Read more
Neural Turing Machine with memory bank
January 31, 2026 Sutskever 30

Neural Turing Machines

Differentiable computers with external memory banks. Content and location-based addressing. Paper #20.

Read more
Cream diffusing into coffee
January 31, 2026 Sutskever 30

The Coffee Automaton

Why does coffee mix but never unmix? Entropy, coarse-graining, and the arrow of time. Paper #19.

Read more
Relational memory with multiple slots
January 31, 2026 Sutskever 30

Relational Recurrent Neural Networks

Memory slots that attend to each other enable multi-step reasoning. Paper #18.

Read more
VAE latent space visualization
January 31, 2026 Sutskever 30

Variational Autoencoders

Learning to generate data by encoding it into structured latent spaces. ELBO and the reparameterization trick. Paper #17.

Read more
Object pairs being compared by relation network
January 31, 2026 Sutskever 30

Relational Reasoning

Relation Networks compare all object pairs to answer questions about relationships. Paper #16.

Read more
Pre-activation ResNet block diagram
January 31, 2026 Sutskever 30

Identity Mappings in Deep Residual Networks

Moving activation before convolution enables training 1000-layer networks. Paper #15.

Read more
Attention weights between source and target
January 31, 2026 Sutskever 30

Bahdanau Attention

Neural machine translation by jointly learning to align and translate. The original attention mechanism. Paper #14.

Read more
Self-attention mechanism visualization
January 31, 2026 Sutskever 30

Attention Is All You Need

The Transformer architecture replaced recurrence with self-attention. Foundation of GPT, BERT, and modern AI. Paper #13.

Read more
Message passing between graph nodes
January 31, 2026 Sutskever 30

Graph Neural Networks

Message passing on graphs. Nodes aggregate information from neighbors to learn representations. Paper #12.

Read more
Dilated convolutions expanding receptive field
January 31, 2026 Sutskever 30

Dilated Convolutions

Exponentially expanding receptive fields without losing resolution. WaveNet's secret for audio generation. Paper #11.

Read more
ResNet skip connection diagram
January 31, 2026 Sutskever 30

Deep Residual Learning (ResNet)

Skip connections let gradients flow through 152 layers. Learning residuals instead of direct mappings. Paper #10.

Read more
GPipe pipeline parallelism visualization
January 31, 2026 Sutskever 30

GPipe

Pipeline parallelism for training giant neural networks. Micro-batches keep accelerators busy. Paper #9.

Read more
Processing sets with attention
January 31, 2026 Sutskever 30

Order Matters: Seq2Seq for Sets

Sets have no order, but neural networks need sequences. How to handle permutation invariance. Paper #8.

Read more
January 29, 2026 Opinion

GPT-5.2 Feels Like a Patronizing Hall Monitor

Less a tool, more a patronizing hall monitor. When "just do the task" becomes a premium feature.

Read more
Convolutional neural network layers visualization
January 20, 2026 Sutskever 30

AlexNet

Krizhevsky, Sutskever, and Hinton's 2012 ImageNet victory proved deep learning could outperform hand-engineered computer vision. Paper #7.

Read more
Attention mechanism as pointer visualization
January 20, 2026 Sutskever 30

Pointer Networks

Vinyals, Fortunato, and Jaitly repurposed attention to point at input positions instead of blending hidden states. Paper #6.

Read more
Neural network weight compression visualization
January 20, 2026 Sutskever 30

Keeping Neural Networks Simple

Hinton and van Camp showed that penalizing weight complexity leads to better generalization. Paper #5.

Read more
Neural network dropout diagram showing selective application
January 20, 2026 Sutskever 30

Recurrent Neural Network Regularization

Zaremba, Sutskever, and Vinyals figured out how to apply dropout to LSTMs without breaking them. Paper #4.

Read more
LSTM cell state conveyor belt visualization
January 20, 2026 Sutskever 30

Understanding LSTM Networks

Christopher Olah's 2015 post explained LSTM gates with clarity that textbooks lacked. Paper #3.

Read more
Shakespeare rendered in neural network style
January 20, 2026 Sutskever 30

The Unreasonable Effectiveness of RNNs

Karpathy's famous 2015 post showed RNNs could generate Shakespeare and Linux code by predicting one character at a time. Paper #2.

Read more
Coffee mixing visualization - entropy vs complexity
January 20, 2026 Sutskever 30

Why Coffee Mixes But Never Unmixes

Scott Aaronson's First Law of Complexodynamics explains why complexity rises, peaks, then falls. Paper #1 from Sutskever's 30.

Read more
AI productivity tips
July 20, 2025

Shareable conversation links

Got tired of copy-pasting AI responses into Slack. Built shareable links instead. One good prompt becomes team knowledge.

Read more
Custom instructions guide
July 10, 2025

What makes custom instructions work

Scraped 10,000 custom instructions to see patterns. Set your style once; every model remembers.

Read more

Updates

I write when I ship something or figure something out. Maybe twice a month. No tracking, no growth hacks.

One-click unsubscribe.