Your complete trail map. Work through each module in order.
Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.
Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.
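The idea that inputs and targets are the same sequence shifted by one is the heart of next-token prediction. A minimal sketch, assuming a character-level vocabulary and a hypothetical BOS token id of 0 (the course's actual tokenizer may differ):

```python
# Minimal character-level tokenization sketch.
# BOS = 0 is an illustrative choice, not the course's exact convention.
text = "hello"
BOS = 0  # hypothetical beginning-of-sequence token id
chars = sorted(set(text))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}  # id 0 reserved for BOS
itos = {i: ch for ch, i in stoi.items()}

tokens = [BOS] + [stoi[ch] for ch in text]
# Next-token prediction: the input at position t predicts the token at t+1,
# so inputs and targets are the same sequence offset by one.
inputs, targets = tokens[:-1], tokens[1:]
print(inputs)   # e.g. [0, 2, 1, 3, 3]
print(targets)  # e.g. [2, 1, 3, 3, 4]
```

Every training example the model ever sees is built this way: a window of tokens paired with the same window shifted left by one.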
Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.
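Before autograd, the bigram intuition can be made concrete with plain counting: estimate P(next | current) from adjacent-pair counts and score a sequence with average negative log-likelihood, the same loss a GPT minimizes. The corpus, smoothing constant, and vocabulary size below are illustrative assumptions, not the module's exact setup:

```python
import math
from collections import Counter

# Toy count-based bigram model with Laplace smoothing (illustrative values).
corpus = "abababab"
pairs = Counter(zip(corpus, corpus[1:]))  # counts of adjacent pairs
totals = Counter(corpus[:-1])             # counts of first elements

def prob(a, b, alpha=1.0, vocab_size=2):
    # Smoothed estimate of P(b | a)
    return (pairs[(a, b)] + alpha) / (totals[a] + alpha * vocab_size)

def avg_nll(seq):
    # Average negative log-likelihood: the loss next-token models minimize.
    losses = [-math.log(prob(a, b)) for a, b in zip(seq, seq[1:])]
    return sum(losses) / len(losses)

print(avg_nll("abab"))  # low loss: the corpus strictly alternates a/b
print(avg_nll("aabb"))  # higher loss: "aa" and "bb" were never observed
```

A neural bigram model learns the same table of conditional probabilities, but via gradients on a loss instead of explicit counts, which is what lets the approach scale past pairs.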
Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.
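The core of the transformer block is scaled dot-product self-attention with a causal mask. A minimal NumPy sketch, with made-up shapes and random inputs; a real model adds learned projections, multiple heads, residual connections, and layer norm around this:

```python
import numpy as np

# Single-head causal self-attention sketch (shapes and seed are illustrative).
np.random.seed(0)
T, d = 4, 8                       # sequence length, head dimension
Q = np.random.randn(T, d)         # queries
K = np.random.randn(T, d)         # keys
V = np.random.randn(T, d)         # values

scores = Q @ K.T / np.sqrt(d)     # scaled dot-product similarities
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf            # causal mask: no attending to future tokens
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                 # each position: weighted mix of values
print(out.shape)                  # (4, 8)
```

The scaling by sqrt(d) keeps the dot products from growing with dimension, and the mask is what makes the model autoregressive: position t can only look at positions 0..t.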
Adam optimizer, training loops, logits interpretation, sampling strategies, temperature, and inference. From training to generation.
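At inference time, temperature rescales the logits before the softmax: low temperature sharpens the distribution toward the argmax, high temperature flattens it toward uniform. A sketch with made-up logits (the function names here are illustrative, not a specific library's API):

```python
import math
import random

# Temperature-scaled softmax and sampling (logits below are made up).
def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature=1.0, rng=random):
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
print(softmax(logits, temperature=0.5))  # sharper: more mass on token 0
print(softmax(logits, temperature=2.0))  # flatter: closer to uniform
```

Temperature 1.0 reproduces the model's raw distribution; in the limit of temperature approaching 0, sampling becomes greedy decoding.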
Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.
What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.