Founding Cohort — 150 seats

Full Curriculum

The Trail Map

Seven modules. One tiny codebase. Each checkpoint builds on the last until you can explain every major block, modify the code with confidence, and build your own variant.

MODULE 0

Trailhead

Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.
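
A taste of the destination before the first step: the sketch below uses a toy vocabulary and made-up scores (not the course's code) to show what next-token prediction boils down to. The model emits one score per vocabulary entry, and softmax turns those scores into a probability distribution over what comes next.

```python
# A minimal sketch of next-token prediction (illustrative only:
# the vocabulary and logits below are made up, not taken from
# the microGPT codebase).
import math

vocab = ["<bos>", "a", "b", "c"]
logits = [0.1, 2.0, 0.5, -1.0]   # hypothetical scores for "what comes next?"

# Softmax: exponentiate, then normalize into probabilities.
exps = [math.exp(l) for l in logits]
probs = [e / sum(exps) for e in exps]

for tok, p in zip(vocab, probs):
    print(f"P(next = {tok!r}) = {p:.3f}")
```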

Outcome

Learner understands the project scope, has a working environment, and can articulate what next-token prediction means at a high level.

Lessons

Welcome to the Trail
What is a GPT, really?
Environment Setup
The microGPT Codebase Tour
Your First Forward Pass
Trailhead Complete

MODULE 1

Token Trail

Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.
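
As a preview of the terrain, here is a minimal character-level tokenizer sketch. It is an illustration under simple assumptions (the course's actual tokenizer may differ): build a vocabulary from the corpus, reserve a BOS token, and map text to integer IDs and back.

```python
# A minimal character-level tokenizer (illustrative; the course's
# tokenizer may differ).
docs = ["hello", "help"]

chars = sorted(set("".join(docs)))      # unique characters, deterministic order
vocab = ["<bos>"] + chars               # BOS gets ID 0 by convention here
stoi = {tok: i for i, tok in enumerate(vocab)}
itos = {i: tok for tok, i in stoi.items()}

def encode(text):
    # Every document starts with BOS so the model knows where text begins.
    return [stoi["<bos>"]] + [stoi[ch] for ch in text]

ids = encode("hello")
print(vocab)                              # ['<bos>', 'e', 'h', 'l', 'o', 'p']
print(ids)                                # [0, 2, 1, 3, 3, 4]
print("".join(itos[i] for i in ids[1:]))  # round-trips to 'hello'
```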

Outcome

Learner can explain what the model is predicting and why, trace a document from raw text to token IDs, and describe how the vocabulary is built.

Lessons

From Text to Tokens
Building a Vocabulary
BOS and Special Tokens
Next-Token Prediction
What Goes In, What Comes Out
Token Trail Cleared

MODULE 2

Gradient Gorge

Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.
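
To make the gorge less steep on arrival, here is a toy bigram model with cross-entropy loss in plain Python (toy text and an add-one smoothing choice, not the course's exact code). The loss is the average negative log-probability the model assigns to each token that actually came next; learning means pushing that number down.

```python
# A bigram baseline plus cross-entropy, sketched from scratch
# (illustrative; not the course's implementation).
import math
from collections import Counter

text = "abab"
pairs = Counter(zip(text, text[1:]))   # ('a','b'): 2, ('b','a'): 1
vocab = sorted(set(text))

def prob(nxt, prev, smoothing=1.0):
    # P(next | prev) with add-one smoothing so unseen pairs aren't zero.
    num = pairs[(prev, nxt)] + smoothing
    den = sum(pairs[(prev, v)] for v in vocab) + smoothing * len(vocab)
    return num / den

# Cross-entropy: average negative log-probability of each token
# that actually followed. Lower is better.
nll = -sum(math.log(prob(n, p)) for p, n in zip(text, text[1:]))
print("mean cross-entropy:", nll / (len(text) - 1))
```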

Outcome

Learner can explain where learning happens, how parameters update, and what the loss function is measuring.

Lessons

The Bigram Baseline
Cross-Entropy Loss
Autograd Under the Hood
Backpropagation Step by Step
Learning Rate and Dynamics
Survived Gradient Gorge

MODULE 3

Attention Pass

Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.
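
Here is the heart of the pass in miniature: single-head causal self-attention, written with NumPy for readability (an assumption; the course may build it with a different stack). Each position scores every earlier position, softmaxes the scores, and takes a weighted mix of their values.

```python
# Single-head causal self-attention from scratch (a NumPy sketch;
# illustrative shapes and random weights).
import numpy as np

rng = np.random.default_rng(0)
T, C = 4, 8                      # sequence length, embedding size
x = rng.standard_normal((T, C))  # token embeddings for one sequence

Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv

scores = q @ k.T / np.sqrt(C)             # how much each query matches each key
mask = np.triu(np.ones((T, T), bool), 1)  # True above the diagonal
scores[mask] = -np.inf                    # causal: no peeking at the future

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row

out = weights @ v   # each position: weighted mix of earlier values
print(out.shape)    # (4, 8)
```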

Outcome

Learner can trace a complete forward pass and explain every major component of the transformer architecture.

Lessons

Token and Position Embeddings
Self-Attention from Scratch
Multi-Head Attention
Residual Connections
MLP and Layer Norm
Reached Attention Pass

MODULE 4

Optimizer Ridge

Adam optimizer, training loops, logit interpretation, sampling strategies, temperature, and inference. From training to generation.
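
A preview of the sampling half of the ridge: temperature scaling in plain Python (toy logits and a hypothetical `sample` helper, not the course's code). Dividing logits by a temperature below 1 sharpens the distribution toward the most likely token; above 1 flattens it toward variety.

```python
# Temperature sampling sketch (illustrative; toy logits).
import math
import random

def sample(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
random.seed(0)
print([sample(logits, 0.5) for _ in range(10)])  # low T: mostly token 0
print([sample(logits, 2.0) for _ in range(10)])  # high T: more variety
```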

Outcome

Learner can configure training, interpret logits, and generate text with different sampling strategies.

Lessons

The Training Loop
Adam Optimizer
Interpreting Logits
Temperature and Sampling
From Training to Inference
Optimizer Ridge Conquered

MODULE 5

Summit Project

Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.

Outcome

Learner ships a working variant and can defend every modification they made.

Lessons

Choosing Your Variant
Dataset Preparation
Architecture Modifications
Training Your Variant
Capstone Defense
Summit Unlocked

BONUS

Beyond the Map

What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.

Outcome

Learner understands the gap between microGPT and production systems, and knows where to go next.

Lessons

From Tiny to Large
What Changes at Scale
Production Considerations
Where to Go Next
I Followed the Karpath