Seven modules. One tiny codebase. Each checkpoint builds on the last until you can explain every major block, modify the code with confidence, and build your own variant.
Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.
Outcome
Learner understands the project scope, has a working environment, and can articulate what next-token prediction means at a high level.
Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.
Outcome
Learner can explain what the model is predicting and why, trace from raw text to token IDs, and describe the vocabulary.
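The pipeline this module describes can be sketched in a few lines. This is an illustrative toy, not the course's actual code: a character-level vocabulary is built from a tiny corpus, a BOS token marks each document's start, and every training example is an (input so far, next token) pair.

```python
docs = ["hello", "help"]  # toy corpus standing in for the real dataset

# Vocabulary: a BOS token plus every distinct character, in sorted order.
BOS = "<bos>"
chars = sorted(set("".join(docs)))
vocab = [BOS] + chars
stoi = {tok: i for i, tok in enumerate(vocab)}  # string -> token id
itos = {i: tok for tok, i in stoi.items()}      # token id -> string

def encode(doc):
    """Raw text -> token IDs, with BOS marking the document start."""
    return [stoi[BOS]] + [stoi[ch] for ch in doc]

ids = encode("hello")
# Next-token prediction: at position t the model sees ids[:t+1]
# and is trained to predict ids[t+1].
pairs = [(ids[: t + 1], ids[t + 1]) for t in range(len(ids) - 1)]
print(pairs[0])  # from just <bos>, predict the id of 'h'
```

Tracing `encode` by hand, then reading off the (input, target) pairs, is exactly the "raw text to token IDs" exercise the outcome above asks for.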
Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.
Outcome
Learner can explain where learning happens, how parameters update, and what the loss function is measuring.
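Here is a minimal sketch of the learning loop this module covers, with illustrative names rather than the course's actual API. A bigram model holds one logit per (previous token, next token) pair; the loss is the average negative log likelihood of the observed next tokens; one gradient-descent step nudges the logits and the loss drops. For softmax plus negative log likelihood, the gradient of a logit row is simply (probabilities minus the one-hot target), which is what autograd would compute for you.

```python
import math

V = 3  # toy vocabulary size
logits = [[0.0] * V for _ in range(V)]    # parameters: a V x V logit table
data = [(0, 1), (0, 1), (0, 2), (1, 0)]   # observed (prev, next) bigrams

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def loss(params):
    """Average negative log likelihood of the data under the model."""
    total = 0.0
    for prev, nxt in data:
        p = softmax(params[prev])
        total += -math.log(p[nxt])
    return total / len(data)

lr = 1.0
before = loss(logits)  # with all-zero logits this is exactly log(V)
for prev, nxt in data:
    p = softmax(logits[prev])
    for j in range(V):
        # d(loss)/d(logit) = (prob - one_hot(target)) / N for this example
        grad = (p[j] - (1.0 if j == nxt else 0.0)) / len(data)
        logits[prev][j] -= lr * grad
after = loss(logits)
assert after < before  # the update reduced the loss: learning happened
```

Watching `before` and `after` makes "what the loss function is measuring" concrete before autograd hides the gradient arithmetic.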
Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.
Outcome
Learner can trace a complete forward pass and explain every major component of the transformer architecture.
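The heart of this module, a single causal self-attention head, fits in plain Python. Dimensions and weight matrices below are illustrative (identity projections, 2-d vectors) so the arithmetic is easy to follow by hand: each position builds a query, scores it against the keys of positions at or before it, and takes a softmax-weighted average of their values.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention(xs, Wq, Wk, Wv):
    """xs: one vector per position. Returns one output vector per position."""
    d = len(Wq)  # head dimension
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    outs = []
    for t, q in enumerate(qs):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[i])) / math.sqrt(d)
                  for i in range(t + 1)]
        w = softmax(scores)
        out = [sum(w[i] * vs[i][j] for i in range(t + 1)) for j in range(d)]
        outs.append(out)
    return outs

# Two positions, 2-d embeddings, identity Q/K/V for readability.
I = [[1.0, 0.0], [0.0, 1.0]]
ys = attention([[1.0, 0.0], [0.0, 1.0]], I, I, I)
# ys[0] can only see the first token; ys[1] mixes both positions.
```

The real model wraps this in residual connections, layer normalization, and MLP blocks, but the query/key/value dance above is the part worth tracing slowly.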
Adam optimizer, training loops, logits interpretation, sampling strategies, temperature, and inference. From training to generation.
Outcome
Learner can configure training, interpret logits, and generate text with different sampling strategies.
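The temperature knob this module introduces can be sketched directly (toy logits, not real model output): dividing logits by a temperature below 1 sharpens the softmax toward the top token, while a temperature above 1 flattens it toward uniform.

```python
import math
import random

def sample(logits, temperature=1.0, rng=random):
    """Sample one token id from logits scaled by the given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # stabilize the softmax
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: walk the cumulative distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
random.seed(0)
low_t = [sample(logits, temperature=0.1) for _ in range(100)]
high_t = [sample(logits, temperature=10.0) for _ in range(100)]
# Low temperature almost always picks token 0; high temperature spreads
# samples across all three tokens.
```

Comparing the two histograms is a quick way to internalize why greedy-ish decoding is repetitive and high-temperature decoding is erratic.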
Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.
Outcome
Learner ships a working variant and can defend every modification they made.
What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.
Outcome
Learner understands the gap between microGPT and production systems, and knows where to go next.