134 changes: 77 additions & 57 deletions 04_2026/demo_minimal.ipynb
@@ -4,19 +4,61 @@
"cell_type": "markdown",
"id": "dt50gf9tfxg",
"metadata": {},
"source": "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n\n# Monthly Algorithmic Challenge \u2014 April 2026: Max of List\n\n*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n\n## Overview\n\nWe've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n\nBoth models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n\n### The two models\n\n| | Model 1 (easier) | Model 2 (harder) |\n|---|---|---|\n| **Numbers** | 0\u20139 | 0\u201399 |\n| **Tokenization** | One token per number | Two digit tokens per number (e.g. 42 \u2192 `4`, `2`) |\n| **Layers** | 1 | 2 |\n| **Heads** | 4 | 4 |\n| **`d_model`** | 64 | 64 |\n| **Parameters** | 18,944 | 35,712 |\n| **Vocab** | 14 tokens (0-9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0-9 + BOS, SEP, ANS, EOS) |\n\nBoth models are trained and evaluated on lists of length 5.\n\n### Architecture\n\nBoth models are **attention-only** causal transformers \u2014 no MLPs, no LayerNorm. The architecture is:\n\n```\ntoken_embedding + positional_embedding\n \u2192 attention layer(s) with residual connections\n \u2192 linear unembed \u2192 logits\n```\n\nPositional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. 
The model returns both logits and per-layer attention patterns.\n\n### What constitutes a good solution?\n\n- **Describe the mechanism** the model uses to find the max.\n- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n- **Present your solution clearly** by utilizing the markdown cells, and presenting clean figures.\n\nWe recommend starting with Model 1 \u2014 it's simple enough that you should be able to fully explain every weight matrix.",
"outputs": [],
"execution_count": null
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n",
"\n",
"# Monthly Algorithmic Challenge — April 2026: Max of List\n",
"\n",
"*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n",
"\n",
"## Overview\n",
"\n",
"We've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n",
"\n",
"Both models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n",
"\n",
"### The two models\n",
"\n",
"| | Model 1 (easier) | Model 2 (harder) |\n",
"|---|---|---|\n",
"| **Numbers** | 0–9 | 0–99 |\n",
"| **Tokenization** | One token per number | Two digit tokens per number (e.g. 42 → `4`, `2`) |\n",
"| **Layers** | 1 | 2 |\n",
"| **Heads** | 4 | 4 |\n",
"| **`d_model`** | 64 | 64 |\n",
"| **Parameters** | 18,944 | 35,712 |\n",
"| **Vocab** | 14 tokens (0-9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0-9 + BOS, SEP, ANS, EOS) |\n",
"\n",
"Both models are trained and evaluated on lists of length 5.\n",
"\n",
"### Architecture\n",
"\n",
"Both models are **attention-only** causal transformers — no MLPs, no LayerNorm. The architecture is:\n",
"\n",
"```\n",
"token_embedding + positional_embedding\n",
" → attention layer(s) with residual connections\n",
" → linear unembed → logits\n",
"```\n",
"\n",
"Positional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. The model returns both logits and per-layer attention patterns.\n",
"\n",
"### What constitutes a good solution?\n",
"\n",
"- **Describe the mechanism** the model uses to find the max.\n",
"- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n",
"- **Present your solution clearly** by utilizing the markdown cells, and presenting clean figures.\n",
"\n",
"We recommend starting with Model 1 — it's simple enough that you should be able to fully explain every weight matrix."
]
},
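The parameter counts in the table can be reproduced from the stated architecture. The sketch below is a sanity check, not the notebook's code: it assumes no bias terms, untied embedding and unembedding matrices, and learned positional tables of 12 and 18 positions (the input length plus the answer tokens). These assumptions are inferred rather than stated in the notebook, and `count_params` is a hypothetical helper.

```python
# Parameter-count check for the two models in the table.
# Assumptions (not stated in the notebook): no bias terms, untied
# embedding/unembedding, and context lengths n_ctx = 12 and 18.

def count_params(d_model: int, vocab: int, n_ctx: int, n_layers: int) -> int:
    token_embed = vocab * d_model
    pos_embed = n_ctx * d_model
    # W_Q, W_K, W_V and W_O are each d_model x d_model when summed over heads
    attn_per_layer = 4 * d_model * d_model
    unembed = d_model * vocab
    return token_embed + pos_embed + n_layers * attn_per_layer + unembed

model_1_params = count_params(d_model=64, vocab=14, n_ctx=12, n_layers=1)
model_2_params = count_params(d_model=64, vocab=14, n_ctx=18, n_layers=2)
print(model_1_params)  # 18944
print(model_2_params)  # 35712
```

Under these assumptions both totals match the table, which suggests that every learned weight sits in the embeddings, the attention projections, or the unembed.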
{
"cell_type": "markdown",
"id": "7r1ybe7gomm",
"metadata": {},
"source": [
"## Setup"
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -70,9 +112,7 @@
"## Helper functions\n",
"\n",
"Tokenization helpers and a utility to visualize attention patterns. You'll use these throughout."
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -81,7 +121,7 @@
"metadata": {},
"outputs": [],
"source": [
"# \u2500\u2500 Model 1 vocab (numbers 0-9, each is its own token) \u2500\u2500\n",
"# ── Model 1 vocab (numbers 0-9, each is its own token) ──\n",
"NUM_RANGE_1 = 10\n",
"BOS_1, SEP_1, ANS_1, EOS_1 = 10, 11, 12, 13\n",
"VOCAB_SIZE_1 = 14\n",
@@ -102,7 +142,7 @@
" return [TOKEN_NAMES_1.get(t, str(t)) for t in tokens]\n",
"\n",
"\n",
"# \u2500\u2500 Model 2 vocab (digits 0-9, two per number) \u2500\u2500\n",
"# ── Model 2 vocab (digits 0-9, two per number) ──\n",
"BOS_2, SEP_2, ANS_2, EOS_2 = 10, 11, 12, 13\n",
"VOCAB_SIZE_2 = 14\n",
"TOKEN_NAMES_2 = {10: \"BOS\", 11: \"SEP\", 12: \"ANS\", 13: \"EOS\"}\n",
@@ -123,7 +163,7 @@
" return [TOKEN_NAMES_2.get(t, str(t)) for t in tokens]\n",
"\n",
"\n",
"# \u2500\u2500 Attention visualization \u2500\u2500\n",
"# ── Attention visualization ──\n",
"def _format_attn_ax(ax, attn_matrix, token_labels, title):\n",
" \"\"\"Format a single attention heatmap axis.\"\"\"\n",
" ax.imshow(attn_matrix, cmap=\"Blues\", vmin=0, vmax=1)\n",
@@ -177,20 +217,18 @@
"metadata": {},
"source": [
"---\n",
"# Model 1: Max of list (0\u20139), 1-layer attention-only\n",
"# Model 1: Max of list (0–9), 1-layer attention-only\n",
"\n",
"**Task**: Given 5 numbers from 0\u20139, predict the maximum.\n",
"**Task**: Given 5 numbers from 0–9, predict the maximum.\n",
"\n",
"**Input format**: `[BOS] n1 [SEP] n2 [SEP] n3 [SEP] n4 [SEP] n5 [ANS]`\n",
"\n",
"**Output**: At the `[ANS]` position, the model should predict the max value. Then it should predict `[EOS]`.\n",
"\n",
"**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` \u2192 model predicts `7`\n",
"**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` model predicts `7`\n",
"\n",
"This model has a single attention layer with 4 heads and no MLPs. The entire computation is: embed \u2192 one multi-head attention layer (with residual) \u2192 unembed."
],
"outputs": [],
"execution_count": null
"This model has a single attention layer with 4 heads and no MLPs. The entire computation is: embed → one multi-head attention layer (with residual) → unembed."
]
},
{
"cell_type": "code",
@@ -218,9 +256,7 @@
"### Verifying Model 1 works\n",
"\n",
"Let's run a few examples and check the model gets them right. We show the full output distribution at the ANS position."
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -242,10 +278,7 @@
" tokens = tokenize_1(nums)\n",
" x = torch.tensor([tokens], device=device)\n",
"\n",
" # NOTE: nnsight requires accessing modules in forward execution order\n",
" # (layers before unembed)\n",
" with model_1.trace(x):\n",
" logits = model_1.unembed.output.save()\n",
" logits = model_1(x)[0]\n",
"\n",
" # Softmax over number tokens at the ANS position (last token)\n",
" probs = torch.softmax(logits[0, -1, :NUM_RANGE_1], dim=-1).detach().cpu()\n",
@@ -275,9 +308,7 @@
"### Example: attention patterns for Model 1\n",
"\n",
"Here's what the 4 attention heads look like on a single input. The ANS row (bottom) is where the model reads from the input to make its prediction."
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -311,32 +342,28 @@
"Your goal is to try and understand how the model works.\n",
"\n",
"Good luck!"
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "markdown",
"id": "2v1h3znl2op",
"metadata": {},
"source": [
"---\n",
"# Model 2: Max of list (0\u201399), 2-layer attention-only, digit tokenization\n",
"# Model 2: Max of list (0–99), 2-layer attention-only, digit tokenization\n",
"\n",
"**Task**: Given 5 numbers from 0\u201399, predict the maximum.\n",
"**Task**: Given 5 numbers from 0–99, predict the maximum.\n",
"\n",
"**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` \u2192 tokens `4`, `2` and `7` \u2192 tokens `0`, `7`.\n",
"**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` tokens `4`, `2` and `7` tokens `0`, `7`.\n",
"\n",
"**Input format**: `[BOS] d1t d1o [SEP] d2t d2o [SEP] ... d5t d5o [ANS]`\n",
"\n",
"**Output**: At `[ANS]`, model predicts the tens digit of the max. Then the ones digit. Then `[EOS]`.\n",
"\n",
"**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` \u2192 model predicts `8` then `5`\n",
"**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` model predicts `8` then `5`\n",
"\n",
"This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work \u2014 a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit."
],
"outputs": [],
"execution_count": null
"This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work — a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit."
]
},
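The two input formats described for Model 1 and Model 2 can be sketched in plain Python. This is a hypothetical re-implementation for illustration only: `sketch_tokenize_1` and `sketch_tokenize_2` are made-up names, and the notebook's own `tokenize_1` / `tokenize_2` helpers (not shown in this diff) may differ in detail. The special-token IDs are taken from the vocab definitions in the helper cell.

```python
# Hypothetical re-implementation of the two input formats; the notebook's
# own tokenize_1 / tokenize_2 helpers may differ in detail.
BOS, SEP, ANS, EOS = 10, 11, 12, 13  # special-token IDs from the vocab cell

def sketch_tokenize_1(nums):
    """[BOS] n1 [SEP] n2 ... n5 [ANS], one token per number (0-9)."""
    tokens = [BOS]
    for i, n in enumerate(nums):
        if i > 0:
            tokens.append(SEP)
        tokens.append(n)
    tokens.append(ANS)
    return tokens

def sketch_tokenize_2(nums):
    """[BOS] d1t d1o [SEP] ... d5t d5o [ANS], two zero-padded digits per number."""
    tokens = [BOS]
    for i, n in enumerate(nums):
        if i > 0:
            tokens.append(SEP)
        tokens.extend(divmod(n, 10))  # (tens digit, ones digit)
    tokens.append(ANS)
    return tokens

example_1 = sketch_tokenize_1([3, 7, 2, 5, 1])
example_2 = sketch_tokenize_2([42, 17, 85, 3, 61])
print(example_1)
print(example_2)
```

With five numbers, the Model 1 sequence has 11 tokens and the Model 2 sequence has 16, before any answer tokens are appended.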
{
"cell_type": "code",
@@ -364,9 +391,7 @@
"### Verifying Model 2 works\n",
"\n",
"For Model 2, the output is two tokens: tens digit then ones digit. We feed the input up to `[ANS]`, get the tens prediction, then feed that back to get the ones prediction."
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -387,21 +412,19 @@
" x = torch.tensor([tokens], device=device)\n",
"\n",
" # Get tens digit prediction (at ANS position)\n",
" with model_2.trace(x):\n",
" logits_tens = model_2.unembed.output.save()\n",
" logits_tens = model_2(x)[0]\n",
" pred_tens = logits_tens[0, -1, :10].argmax().item()\n",
"\n",
" # Feed tens digit back, get ones digit prediction\n",
" tokens_ext = tokens + [pred_tens]\n",
" x_ext = torch.tensor([tokens_ext], device=device)\n",
" with model_2.trace(x_ext):\n",
" logits_ones = model_2.unembed.output.save()\n",
" logits_ones = model_2(x)[0]\n",
" pred_ones = logits_ones[0, -1, :10].argmax().item()\n",
"\n",
" pred_num = pred_tens * 10 + pred_ones\n",
" true_max = max(nums)\n",
" status = \"correct\" if pred_num == true_max else \"WRONG\"\n",
" print(f\" {nums} \u2192 predicted {pred_num:2d}, true max {true_max:2d} [{status}]\")"
" print(f\" {nums} predicted {pred_num:2d}, true max {true_max:2d} [{status}]\")"
]
},
{
@@ -412,9 +435,7 @@
"### Example: attention patterns for Model 2\n",
"\n",
"With 2 layers, we can see how the model builds up its computation across layers."
],
"outputs": [],
"execution_count": null
]
},
{
"cell_type": "code",
@@ -427,6 +448,7 @@
"tokens = tokenize_2(nums)\n",
"x = torch.tensor([tokens], device=device)\n",
"\n",
"# NOTE: nnsight requires accessing modules in forward execution order\n",
"with model_2.trace(x):\n",
" l0_out = model_2.layers[0].output.save()\n",
" l1_out = model_2.layers[1].output.save()\n",
@@ -452,14 +474,12 @@
"Your goal is to try and understand how the model works.\n",
"\n",
"Good luck!"
],
"outputs": [],
"execution_count": null
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "test",
"language": "python",
"name": "python3"
},
@@ -473,9 +493,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.14"
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
}