Seven modules. One tiny codebase. Each checkpoint builds on the last until you can explain every major block, modify the code with confidence, and build your own variant.
Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.
Outcome
Learner understands the project scope, has a working environment, and can articulate what next-token prediction means at a high level.
Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.
Outcome
Learner can explain what the model is predicting and why, trace from raw text to token IDs, and describe the vocabulary.
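The pipeline this module describes can be sketched in a few lines. This is an illustrative toy, not the course's actual code: a character-level vocabulary is built from a tiny corpus, a BOS token marks each document's start, and every training example is an (input so far, next token) pair.

```python
docs = ["hello", "help"]  # toy corpus standing in for the real dataset

# Vocabulary: a BOS token plus every distinct character, in sorted order.
BOS = "<bos>"
chars = sorted(set("".join(docs)))
vocab = [BOS] + chars
stoi = {tok: i for i, tok in enumerate(vocab)}  # string -> token id
itos = {i: tok for tok, i in stoi.items()}      # token id -> string

def encode(doc):
    """Raw text -> token IDs, with BOS marking the document start."""
    return [stoi[BOS]] + [stoi[ch] for ch in doc]

ids = encode("hello")
# Next-token prediction: at position t the model sees ids[:t+1]
# and is trained to predict ids[t+1].
pairs = [(ids[: t + 1], ids[t + 1]) for t in range(len(ids) - 1)]
print(pairs[0])  # from just <bos>, predict the id of 'h'
```

Tracing `encode` by hand, then reading off the (input, target) pairs, is exactly the "raw text to token IDs" exercise the outcome above asks for.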
Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.
Outcome
Learner can explain where learning happens, how parameters update, and what the loss function is measuring.
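Here is a minimal sketch of the learning loop this module covers, with illustrative names rather than the course's actual API. A bigram model holds one logit per (previous token, next token) pair; the loss is the average negative log likelihood of the observed next tokens; one gradient-descent step nudges the logits and the loss drops. For softmax plus negative log likelihood, the gradient of a logit row is simply (probabilities minus the one-hot target), which is what autograd would compute for you.

```python
import math

V = 3  # toy vocabulary size
logits = [[0.0] * V for _ in range(V)]    # parameters: a V x V logit table
data = [(0, 1), (0, 1), (0, 2), (1, 0)]   # observed (prev, next) bigrams

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def loss(params):
    """Average negative log likelihood of the data under the model."""
    total = 0.0
    for prev, nxt in data:
        p = softmax(params[prev])
        total += -math.log(p[nxt])
    return total / len(data)

lr = 1.0
before = loss(logits)  # with all-zero logits this is exactly log(V)
for prev, nxt in data:
    p = softmax(logits[prev])
    for j in range(V):
        # d(loss)/d(logit) = (prob - one_hot(target)) / N for this example
        grad = (p[j] - (1.0 if j == nxt else 0.0)) / len(data)
        logits[prev][j] -= lr * grad
after = loss(logits)
assert after < before  # the update reduced the loss: learning happened
```

Watching `before` and `after` makes "what the loss function is measuring" concrete before autograd hides the gradient arithmetic.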
Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.
Outcome
Learner can trace a complete forward pass and explain every major component of the transformer architecture.
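The heart of this module, a single causal self-attention head, fits in plain Python. Dimensions and weight matrices below are illustrative (identity projections, 2-d vectors) so the arithmetic is easy to follow by hand: each position builds a query, scores it against the keys of positions at or before it, and takes a softmax-weighted average of their values.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention(xs, Wq, Wk, Wv):
    """xs: one vector per position. Returns one output vector per position."""
    d = len(Wq)  # head dimension
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    outs = []
    for t, q in enumerate(qs):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[i])) / math.sqrt(d)
                  for i in range(t + 1)]
        w = softmax(scores)
        out = [sum(w[i] * vs[i][j] for i in range(t + 1)) for j in range(d)]
        outs.append(out)
    return outs

# Two positions, 2-d embeddings, identity Q/K/V for readability.
I = [[1.0, 0.0], [0.0, 1.0]]
ys = attention([[1.0, 0.0], [0.0, 1.0]], I, I, I)
# ys[0] can only see the first token; ys[1] mixes both positions.
```

The real model wraps this in residual connections, layer normalization, and MLP blocks, but the query/key/value dance above is the part worth tracing slowly.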
Adam optimizer, training loops, logits interpretation, sampling strategies, temperature, and inference. From training to generation.
Outcome
Learner can configure training, interpret logits, and generate text with different sampling strategies.
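The temperature knob this module introduces can be sketched directly (toy logits, not real model output): dividing logits by a temperature below 1 sharpens the softmax toward the top token, while a temperature above 1 flattens it toward uniform.

```python
import math
import random

def sample(logits, temperature=1.0, rng=random):
    """Sample one token id from logits scaled by the given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # stabilize the softmax
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: walk the cumulative distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
random.seed(0)
low_t = [sample(logits, temperature=0.1) for _ in range(100)]
high_t = [sample(logits, temperature=10.0) for _ in range(100)]
# Low temperature almost always picks token 0; high temperature spreads
# samples across all three tokens.
```

Comparing the two histograms is a quick way to internalize why greedy-ish decoding is repetitive and high-temperature decoding is erratic.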
Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.
Outcome
Learner ships a working variant and can defend every modification they made.
What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.
Outcome
Learner understands the gap between microGPT and production systems, and knows where to go next.