andyrdt · AdamBelfki3 · Mar 31, 2026
diff --git a/04_2026/demo_minimal.ipynb b/04_2026/demo_minimal.ipynb
@@ -4,19 +4,61 @@
    "cell_type": "markdown",
    "id": "dt50gf9tfxg",
    "metadata": {},
-   "source": "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n\n# Monthly Algorithmic Challenge \u2014 April 2026: Max of List\n\n*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n\n## Overview\n\nWe've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n\nBoth models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n\n### The two models\n\n| | Model 1 (easier) | Model 2 (harder) |\n|---|---|---|\n| **Numbers** | 0\u20139 | 0\u201399 |\n| **Tokenization** | One token per number | Two digit tokens per number (e.g. 42 \u2192 `4`, `2`) |\n| **Layers** | 1 | 2 |\n| **Heads** | 4 | 4 |\n| **`d_model`** | 64 | 64 |\n| **Parameters** | 18,944 | 35,712 |\n| **Vocab** | 14 tokens (0-9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0-9 + BOS, SEP, ANS, EOS) |\n\nBoth models are trained and evaluated on lists of length 5.\n\n### Architecture\n\nBoth models are **attention-only** causal transformers \u2014 no MLPs, no LayerNorm. The architecture is:\n\n```\ntoken_embedding + positional_embedding\n    \u2192 attention layer(s) with residual connections\n    \u2192 linear unembed \u2192 logits\n```\n\nPositional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. The model returns both logits and per-layer attention patterns.\n\n### What constitutes a good solution?\n\n- **Describe the mechanism** the model uses to find the max.\n- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n- **Present your solution clearly** by utilizing the markdown cells, and presenting clean figures.\n\nWe recommend starting with Model 1 \u2014 it's simple enough that you should be able to fully explain every weight matrix.",
-   "outputs": [],
-   "execution_count": null
+   "source": [
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n",
+    "\n",
+    "# Monthly Algorithmic Challenge — April 2026: Max of List\n",
+    "\n",
+    "*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n",
+    "\n",
+    "## Overview\n",
+    "\n",
+    "We've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n",
+    "\n",
+    "Both models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n",
+    "\n",
+    "### The two models\n",
+    "\n",
+    "| | Model 1 (easier) | Model 2 (harder) |\n",
+    "|---|---|---|\n",
+    "| **Numbers** | 0–9 | 0–99 |\n",
+    "| **Tokenization** | One token per number | Two digit tokens per number (e.g. 42 → `4`, `2`) |\n",
+    "| **Layers** | 1 | 2 |\n",
+    "| **Heads** | 4 | 4 |\n",
+    "| **`d_model`** | 64 | 64 |\n",
+    "| **Parameters** | 18,944 | 35,712 |\n",
+    "| **Vocab** | 14 tokens (0–9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0–9 + BOS, SEP, ANS, EOS) |\n",
+    "\n",
+    "Both models are trained and evaluated on lists of length 5.\n",
+    "\n",
+    "### Architecture\n",
+    "\n",
+    "Both models are **attention-only** causal transformers — no MLPs, no LayerNorm. The architecture is:\n",
+    "\n",
+    "```\n",
+    "token_embedding + positional_embedding\n",
+    "    → attention layer(s) with residual connections\n",
+    "    → linear unembed → logits\n",
+    "```\n",
+    "\n",
+    "Positional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. The model returns both logits and per-layer attention patterns.\n",
+    "\n",
+    "### What constitutes a good solution?\n",
+    "\n",
+    "- **Describe the mechanism** the model uses to find the max.\n",
+    "- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n",
+    "- **Present your solution clearly** by using markdown cells and clean figures.\n",
+    "\n",
+    "We recommend starting with Model 1 — it's simple enough that you should be able to fully explain every weight matrix."
+   ]
   },
   {
    "cell_type": "markdown",
    "id": "7r1ybe7gomm",
    "metadata": {},
    "source": [
     "## Setup"
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -70,9 +112,7 @@
     "## Helper functions\n",
     "\n",
     "Tokenization helpers and a utility to visualize attention patterns. You'll use these throughout."
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -81,7 +121,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# \u2500\u2500 Model 1 vocab (numbers 0-9, each is its own token) \u2500\u2500\n",
+    "# ── Model 1 vocab (numbers 0-9, each is its own token) ──\n",
     "NUM_RANGE_1 = 10\n",
     "BOS_1, SEP_1, ANS_1, EOS_1 = 10, 11, 12, 13\n",
     "VOCAB_SIZE_1 = 14\n",
@@ -102,7 +142,7 @@
     "    return [TOKEN_NAMES_1.get(t, str(t)) for t in tokens]\n",
     "\n",
     "\n",
-    "# \u2500\u2500 Model 2 vocab (digits 0-9, two per number) \u2500\u2500\n",
+    "# ── Model 2 vocab (digits 0-9, two per number) ──\n",
     "BOS_2, SEP_2, ANS_2, EOS_2 = 10, 11, 12, 13\n",
     "VOCAB_SIZE_2 = 14\n",
     "TOKEN_NAMES_2 = {10: \"BOS\", 11: \"SEP\", 12: \"ANS\", 13: \"EOS\"}\n",
@@ -123,7 +163,7 @@
     "    return [TOKEN_NAMES_2.get(t, str(t)) for t in tokens]\n",
     "\n",
     "\n",
-    "# \u2500\u2500 Attention visualization \u2500\u2500\n",
+    "# ── Attention visualization ──\n",
     "def _format_attn_ax(ax, attn_matrix, token_labels, title):\n",
     "    \"\"\"Format a single attention heatmap axis.\"\"\"\n",
     "    ax.imshow(attn_matrix, cmap=\"Blues\", vmin=0, vmax=1)\n",
@@ -177,20 +217,18 @@
    "metadata": {},
    "source": [
     "---\n",
-    "# Model 1: Max of list (0\u20139), 1-layer attention-only\n",
+    "# Model 1: Max of list (0–9), 1-layer attention-only\n",
     "\n",
-    "**Task**: Given 5 numbers from 0\u20139, predict the maximum.\n",
+    "**Task**: Given 5 numbers from 0–9, predict the maximum.\n",
     "\n",
     "**Input format**: `[BOS] n1 [SEP] n2 [SEP] n3 [SEP] n4 [SEP] n5 [ANS]`\n",
     "\n",
     "**Output**: At the `[ANS]` position, the model should predict the max value. Then it should predict `[EOS]`.\n",
     "\n",
-    "**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` \u2192 model predicts `7`\n",
+    "**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` → model predicts `7`\n",
     "\n",
-    "This model has a single attention layer with 4 heads and no MLPs. The entire computation is: embed \u2192 one multi-head attention layer (with residual) \u2192 unembed."
-   ],
-   "outputs": [],
-   "execution_count": null
+    "This model has a single attention layer with 4 heads and no MLPs. The entire computation is: embed → one multi-head attention layer (with residual) → unembed."
+   ]
   },
   {
    "cell_type": "code",
@@ -218,9 +256,7 @@
     "### Verifying Model 1 works\n",
     "\n",
     "Let's run a few examples and check the model gets them right. We show the full output distribution at the ANS position."
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -242,10 +278,7 @@
     "    tokens = tokenize_1(nums)\n",
     "    x = torch.tensor([tokens], device=device)\n",
     "\n",
-    "    # NOTE: nnsight requires accessing modules in forward execution order\n",
-    "    # (layers before unembed)\n",
-    "    with model_1.trace(x):\n",
-    "        logits = model_1.unembed.output.save()\n",
+    "    logits = model_1(x)[0]\n",
     "\n",
     "    # Softmax over number tokens at the ANS position (last token)\n",
     "    probs = torch.softmax(logits[0, -1, :NUM_RANGE_1], dim=-1).detach().cpu()\n",
@@ -275,9 +308,7 @@
     "### Example: attention patterns for Model 1\n",
     "\n",
     "Here's what the 4 attention heads look like on a single input. The ANS row (bottom) is where the model reads from the input to make its prediction."
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -311,32 +342,28 @@
     "Your goal is to try and understand how the model works.\n",
     "\n",
     "Good luck!"
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "markdown",
    "id": "2v1h3znl2op",
    "metadata": {},
    "source": [
     "---\n",
-    "# Model 2: Max of list (0\u201399), 2-layer attention-only, digit tokenization\n",
+    "# Model 2: Max of list (0–99), 2-layer attention-only, digit tokenization\n",
     "\n",
-    "**Task**: Given 5 numbers from 0\u201399, predict the maximum.\n",
+    "**Task**: Given 5 numbers from 0–99, predict the maximum.\n",
     "\n",
-    "**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` \u2192 tokens `4`, `2` and `7` \u2192 tokens `0`, `7`.\n",
+    "**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` → tokens `4`, `2` and `7` → tokens `0`, `7`.\n",
     "\n",
     "**Input format**: `[BOS] d1t d1o [SEP] d2t d2o [SEP] ... d5t d5o [ANS]`\n",
     "\n",
     "**Output**: At `[ANS]`, model predicts the tens digit of the max. Then the ones digit. Then `[EOS]`.\n",
     "\n",
-    "**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` \u2192 model predicts `8` then `5`\n",
+    "**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` → model predicts `8` then `5`\n",
     "\n",
-    "This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work \u2014 a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit."
-   ],
-   "outputs": [],
-   "execution_count": null
+    "This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work — a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit."
+   ]
   },
   {
    "cell_type": "code",
@@ -364,9 +391,7 @@
     "### Verifying Model 2 works\n",
     "\n",
     "For Model 2, the output is two tokens: tens digit then ones digit. We feed the input up to `[ANS]`, get the tens prediction, then feed that back to get the ones prediction."
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -387,21 +412,19 @@
     "    x = torch.tensor([tokens], device=device)\n",
     "\n",
     "    # Get tens digit prediction (at ANS position)\n",
-    "    with model_2.trace(x):\n",
-    "        logits_tens = model_2.unembed.output.save()\n",
+    "    logits_tens = model_2(x)[0]\n",
     "    pred_tens = logits_tens[0, -1, :10].argmax().item()\n",
     "\n",
     "    # Feed tens digit back, get ones digit prediction\n",
     "    tokens_ext = tokens + [pred_tens]\n",
     "    x_ext = torch.tensor([tokens_ext], device=device)\n",
-    "    with model_2.trace(x_ext):\n",
-    "        logits_ones = model_2.unembed.output.save()\n",
+    "logits_ones = model_2(x_ext)[0]\n",
     "    pred_ones = logits_ones[0, -1, :10].argmax().item()\n",
     "\n",
     "    pred_num = pred_tens * 10 + pred_ones\n",
     "    true_max = max(nums)\n",
     "    status = \"correct\" if pred_num == true_max else \"WRONG\"\n",
-    "    print(f\"  {nums} \u2192 predicted {pred_num:2d}, true max {true_max:2d}  [{status}]\")"
+    "    print(f\"  {nums} → predicted {pred_num:2d}, true max {true_max:2d}  [{status}]\")"
    ]
   },
   {
@@ -412,9 +435,7 @@
     "### Example: attention patterns for Model 2\n",
     "\n",
     "With 2 layers, we can see how the model builds up its computation across layers."
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   },
   {
    "cell_type": "code",
@@ -427,6 +448,7 @@
     "tokens = tokenize_2(nums)\n",
     "x = torch.tensor([tokens], device=device)\n",
     "\n",
+    "# NOTE: nnsight requires accessing modules in forward execution order (layers[0] before layers[1])\n",
     "with model_2.trace(x):\n",
     "    l0_out = model_2.layers[0].output.save()\n",
     "    l1_out = model_2.layers[1].output.save()\n",
@@ -452,14 +474,12 @@
     "Your goal is to try and understand how the model works.\n",
     "\n",
     "Good luck!"
-   ],
-   "outputs": [],
-   "execution_count": null
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": ".venv",
+   "display_name": "test",
    "language": "python",
    "name": "python3"
   },
@@ -473,9 +493,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.14"
+   "version": "3.12.12"
   }
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}