diff --git a/04_2026/demo_minimal.ipynb b/04_2026/demo_minimal.ipynb index 9c47f69..f588aae 100644 --- a/04_2026/demo_minimal.ipynb +++ b/04_2026/demo_minimal.ipynb @@ -4,9 +4,53 @@ "cell_type": "markdown", "id": "dt50gf9tfxg", "metadata": {}, - "source": "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n\n# Monthly Algorithmic Challenge \u2014 April 2026: Max of List\n\n*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n\n## Overview\n\nWe've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n\nBoth models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n\n### The two models\n\n| | Model 1 (easier) | Model 2 (harder) |\n|---|---|---|\n| **Numbers** | 0\u20139 | 0\u201399 |\n| **Tokenization** | One token per number | Two digit tokens per number (e.g. 42 \u2192 `4`, `2`) |\n| **Layers** | 1 | 2 |\n| **Heads** | 4 | 4 |\n| **`d_model`** | 64 | 64 |\n| **Parameters** | 18,944 | 35,712 |\n| **Vocab** | 14 tokens (0-9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0-9 + BOS, SEP, ANS, EOS) |\n\nBoth models are trained and evaluated on lists of length 5.\n\n### Architecture\n\nBoth models are **attention-only** causal transformers \u2014 no MLPs, no LayerNorm. The architecture is:\n\n```\ntoken_embedding + positional_embedding\n \u2192 attention layer(s) with residual connections\n \u2192 linear unembed \u2192 logits\n```\n\nPositional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. 
The model returns both logits and per-layer attention patterns.\n\n### What constitutes a good solution?\n\n- **Describe the mechanism** the model uses to find the max.\n- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n- **Present your solution clearly** by utilizing the markdown cells, and presenting clean figures.\n\nWe recommend starting with Model 1 \u2014 it's simple enough that you should be able to fully explain every weight matrix.", - "outputs": [], - "execution_count": null + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/andyrdt/puzzles/blob/main/04_2026/demo_minimal.ipynb)\n", + "\n", + "# Monthly Algorithmic Challenge — April 2026: Max of List\n", + "\n", + "*Inspired by [Callum McDougall](https://www.callumcdougall.net/)'s [ARENA Monthly Algorithmic Challenges](https://arena-chapter1-transformer-interp.streamlit.app/Monthly_Algorithmic_Challenge).*\n", + "\n", + "## Overview\n", + "\n", + "We've trained two small attention-only transformers to solve the same task: **given a list of 5 numbers, predict the maximum.**\n", + "\n", + "Both models achieve 100% test accuracy. Your challenge: **reverse-engineer the algorithm each model has learned.**\n", + "\n", + "### The two models\n", + "\n", + "| | Model 1 (easier) | Model 2 (harder) |\n", + "|---|---|---|\n", + "| **Numbers** | 0–9 | 0–99 |\n", + "| **Tokenization** | One token per number | Two digit tokens per number (e.g. 
42 → `4`, `2`) |\n", + "| **Layers** | 1 | 2 |\n", + "| **Heads** | 4 | 4 |\n", + "| **`d_model`** | 64 | 64 |\n", + "| **Parameters** | 18,944 | 35,712 |\n", + "| **Vocab** | 14 tokens (0-9 + BOS, SEP, ANS, EOS) | 14 tokens (digits 0-9 + BOS, SEP, ANS, EOS) |\n", + "\n", + "Both models are trained and evaluated on lists of length 5.\n", + "\n", + "### Architecture\n", + "\n", + "Both models are **attention-only** causal transformers — no MLPs, no LayerNorm. The architecture is:\n", + "\n", + "```\n", + "token_embedding + positional_embedding\n", + " → attention layer(s) with residual connections\n", + " → linear unembed → logits\n", + "```\n", + "\n", + "Positional embeddings are learned (`nn.Embedding`). Each attention layer has `n_heads` independent heads, each computing Q, K, V projections, with a shared W_O output projection. The model returns both logits and per-layer attention patterns.\n", + "\n", + "### What constitutes a good solution?\n", + "\n", + "- **Describe the mechanism** the model uses to find the max.\n", + "- **Provide evidence** via attention pattern visualizations, ablation experiments, activation patching, direct logit attribution, or other relevant techniques.\n", + "- **Present your solution clearly** by utilizing the markdown cells, and presenting clean figures.\n", + "\n", + "We recommend starting with Model 1 — it's simple enough that you should be able to fully explain every weight matrix." + ] }, { "cell_type": "markdown", @@ -14,9 +58,7 @@ "metadata": {}, "source": [ "## Setup" - ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -70,9 +112,7 @@ "## Helper functions\n", "\n", "Tokenization helpers and a utility to visualize attention patterns. You'll use these throughout." 
- ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -81,7 +121,7 @@ "metadata": {}, "outputs": [], "source": [ - "# \u2500\u2500 Model 1 vocab (numbers 0-9, each is its own token) \u2500\u2500\n", + "# ── Model 1 vocab (numbers 0-9, each is its own token) ──\n", "NUM_RANGE_1 = 10\n", "BOS_1, SEP_1, ANS_1, EOS_1 = 10, 11, 12, 13\n", "VOCAB_SIZE_1 = 14\n", @@ -102,7 +142,7 @@ " return [TOKEN_NAMES_1.get(t, str(t)) for t in tokens]\n", "\n", "\n", - "# \u2500\u2500 Model 2 vocab (digits 0-9, two per number) \u2500\u2500\n", + "# ── Model 2 vocab (digits 0-9, two per number) ──\n", "BOS_2, SEP_2, ANS_2, EOS_2 = 10, 11, 12, 13\n", "VOCAB_SIZE_2 = 14\n", "TOKEN_NAMES_2 = {10: \"BOS\", 11: \"SEP\", 12: \"ANS\", 13: \"EOS\"}\n", @@ -123,7 +163,7 @@ " return [TOKEN_NAMES_2.get(t, str(t)) for t in tokens]\n", "\n", "\n", - "# \u2500\u2500 Attention visualization \u2500\u2500\n", + "# ── Attention visualization ──\n", "def _format_attn_ax(ax, attn_matrix, token_labels, title):\n", " \"\"\"Format a single attention heatmap axis.\"\"\"\n", " ax.imshow(attn_matrix, cmap=\"Blues\", vmin=0, vmax=1)\n", @@ -177,20 +217,18 @@ "metadata": {}, "source": [ "---\n", - "# Model 1: Max of list (0\u20139), 1-layer attention-only\n", + "# Model 1: Max of list (0–9), 1-layer attention-only\n", "\n", - "**Task**: Given 5 numbers from 0\u20139, predict the maximum.\n", + "**Task**: Given 5 numbers from 0–9, predict the maximum.\n", "\n", "**Input format**: `[BOS] n1 [SEP] n2 [SEP] n3 [SEP] n4 [SEP] n5 [ANS]`\n", "\n", "**Output**: At the `[ANS]` position, the model should predict the max value. Then it should predict `[EOS]`.\n", "\n", - "**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` \u2192 model predicts `7`\n", + "**Example**: `[BOS] 3 [SEP] 7 [SEP] 2 [SEP] 5 [SEP] 1 [ANS]` → model predicts `7`\n", "\n", - "This model has a single attention layer with 4 heads and no MLPs. 
The entire computation is: embed \u2192 one multi-head attention layer (with residual) \u2192 unembed." - ], - "outputs": [], - "execution_count": null + "This model has a single attention layer with 4 heads and no MLPs. The entire computation is: embed → one multi-head attention layer (with residual) → unembed." + ] }, { "cell_type": "code", @@ -218,9 +256,7 @@ "### Verifying Model 1 works\n", "\n", "Let's run a few examples and check the model gets them right. We show the full output distribution at the ANS position." - ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -242,10 +278,7 @@ " tokens = tokenize_1(nums)\n", " x = torch.tensor([tokens], device=device)\n", "\n", - " # NOTE: nnsight requires accessing modules in forward execution order\n", - " # (layers before unembed)\n", - " with model_1.trace(x):\n", - " logits = model_1.unembed.output.save()\n", + " logits = model_1(x)[0]\n", "\n", " # Softmax over number tokens at the ANS position (last token)\n", " probs = torch.softmax(logits[0, -1, :NUM_RANGE_1], dim=-1).detach().cpu()\n", @@ -275,9 +308,7 @@ "### Example: attention patterns for Model 1\n", "\n", "Here's what the 4 attention heads look like on a single input. The ANS row (bottom) is where the model reads from the input to make its prediction." - ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -311,9 +342,7 @@ "Your goal is to try and understand how the model works.\n", "\n", "Good luck!" 
- ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "markdown", @@ -321,22 +350,20 @@ "metadata": {}, "source": [ "---\n", - "# Model 2: Max of list (0\u201399), 2-layer attention-only, digit tokenization\n", + "# Model 2: Max of list (0–99), 2-layer attention-only, digit tokenization\n", "\n", - "**Task**: Given 5 numbers from 0\u201399, predict the maximum.\n", + "**Task**: Given 5 numbers from 0–99, predict the maximum.\n", "\n", - "**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` \u2192 tokens `4`, `2` and `7` \u2192 tokens `0`, `7`.\n", + "**Tokenization**: Each number is split into two digit tokens (tens, ones), always zero-padded. So `42` → tokens `4`, `2` and `7` → tokens `0`, `7`.\n", "\n", "**Input format**: `[BOS] d1t d1o [SEP] d2t d2o [SEP] ... d5t d5o [ANS]`\n", "\n", "**Output**: At `[ANS]`, model predicts the tens digit of the max. Then the ones digit. Then `[EOS]`.\n", "\n", - "**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` \u2192 model predicts `8` then `5`\n", + "**Example**: `[BOS] 4 2 [SEP] 1 7 [SEP] 8 5 [SEP] 0 3 [SEP] 6 1 [ANS]` → model predicts `8` then `5`\n", "\n", - "This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work \u2014 a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit." - ], - "outputs": [], - "execution_count": null + "This model has **2 attention layers** with 4 heads each. The interesting question is how the layers divide the work — a 1-layer model can learn the tens digit (100% accuracy) but plateaus at ~40% for the ones digit." + ] }, { "cell_type": "code", @@ -364,9 +391,7 @@ "### Verifying Model 2 works\n", "\n", "For Model 2, the output is two tokens: tens digit then ones digit. We feed the input up to `[ANS]`, get the tens prediction, then feed that back to get the ones prediction." 
- ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -387,21 +412,19 @@ " x = torch.tensor([tokens], device=device)\n", "\n", " # Get tens digit prediction (at ANS position)\n", - " with model_2.trace(x):\n", - " logits_tens = model_2.unembed.output.save()\n", + " logits_tens = model_2(x)[0]\n", " pred_tens = logits_tens[0, -1, :10].argmax().item()\n", "\n", " # Feed tens digit back, get ones digit prediction\n", " tokens_ext = tokens + [pred_tens]\n", " x_ext = torch.tensor([tokens_ext], device=device)\n", - " with model_2.trace(x_ext):\n", - " logits_ones = model_2.unembed.output.save()\n", + " logits_ones = model_2(x_ext)[0]\n", " pred_ones = logits_ones[0, -1, :10].argmax().item()\n", "\n", " pred_num = pred_tens * 10 + pred_ones\n", " true_max = max(nums)\n", " status = \"correct\" if pred_num == true_max else \"WRONG\"\n", - " print(f\" {nums} \u2192 predicted {pred_num:2d}, true max {true_max:2d} [{status}]\")" + " print(f\" {nums} → predicted {pred_num:2d}, true max {true_max:2d} [{status}]\")" ] }, { @@ -412,9 +435,7 @@ "### Example: attention patterns for Model 2\n", "\n", "With 2 layers, we can see how the model builds up its computation across layers." - ], - "outputs": [], - "execution_count": null + ] }, { "cell_type": "code", @@ -427,6 +448,7 @@ "tokens = tokenize_2(nums)\n", "x = torch.tensor([tokens], device=device)\n", "\n", + "# NOTE: nnsight requires accessing modules in forward execution order\n", "with model_2.trace(x):\n", " l0_out = model_2.layers[0].output.save()\n", " l1_out = model_2.layers[1].output.save()\n", @@ -452,14 +474,12 @@ "Your goal is to try and understand how the model works.\n", "\n", "Good luck!" 
- ], - "outputs": [], - "execution_count": null + ] } ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "test", "language": "python", "name": "python3" }, @@ -473,9 +493,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.14" + "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +}