This repository contains minimal reproduction scripts for the blog post:
In-context learning of representations can be explained by induction circuits
Andy Arditi (Northeastern University)
The blog post responds to Park et al., 2025, who find that when LLMs process random walks on a graph in-context, their token representations come to mirror the graph's connectivity structure. We offer a simpler mechanistic explanation: the task can be solved by induction circuits, and the geometric structure of representations is a byproduct of previous-token mixing within those circuits.
Requirements: Python 3.10+ and uv. Scripts 01 and 02 require a CUDA-capable GPU with at least 48 GB of memory (for Llama-3.1-8B). 03_neighbor_mixing.py runs on CPU only and has no GPU requirement.
# 1. Create and activate a virtual environment
uv venv
source .venv/bin/activate
# 2. Install dependencies
uv pip install -r requirements.txt
# 3. Log in to Hugging Face (required for Llama-3.1-8B access)
huggingface-cli login

Llama-3.1-8B is a gated model, so you need to accept the license on Hugging Face before running scripts 01 and 02.
Reproduces the core results from Park et al.: a language model performing in-context learning on a grid random walk task.
python 01_reproduce.py

Outputs (in results/reproduce/plots/):

| File | Description |
|---|---|
| accuracy_curve.{pdf,png,html} | Fig 2 left. Accuracy (probability on valid next tokens) as a function of context length. |
| pca_class_means.{pdf,png,html} | Fig 2 right. PCA of the 16 class-mean activations at layer 26 (last 200 positions). |
| bigram_pca.{pdf,png,html} | Fig 6. Individual activations projected onto the same PCA directions. Fill color = current token, border color = previous token. |
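
As a rough illustration of the class-mean PCA behind pca_class_means, the sketch below averages activations per grid word and projects the 16 means onto their top two principal components. It is a minimal NumPy sketch under stated assumptions, not the code in 01_reproduce.py: the function name `class_mean_pca`, the synthetic activations, and the small hidden size are all hypothetical.

```python
import numpy as np

def class_mean_pca(acts, labels, n_classes=16, n_components=2):
    """Project per-class mean activations onto their top principal components.

    acts:   (n_positions, d_model) activations, e.g. from the last 200 positions.
    labels: (n_positions,) integer class (grid word) of the token at each position.
    """
    # Mean activation for each class (every class must appear at least once).
    means = np.stack([acts[labels == c].mean(axis=0) for c in range(n_classes)])
    # Center the class means, then take principal directions via SVD.
    centered = means - means.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # (n_components, d_model)
    return centered @ components.T, components

# Synthetic stand-in for real activations (hypothetical sizes).
rng = np.random.default_rng(0)
labels = np.tile(np.arange(16), 13)[:200]    # 200 positions, every class present
acts = rng.normal(size=(200, 64))
proj, components = class_mean_pca(acts, labels)
print(proj.shape)  # (16, 2)
```

Returning the principal directions alongside the projection makes it possible to project other vectors onto the same axes, which is the idea behind the bigram plot.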
Tests the induction circuit hypothesis by ablating attention heads.
python 02_ablation.py

This script first identifies induction heads and previous-token heads using repeated random token sequences (Appendix A), then ablates the top-k heads of each type and measures the effect on accuracy and representations.
Outputs (in results/ablation/plots/):
| File | Description |
|---|---|
| ablation_induction.{pdf,png,html} | Fig 3 left. Accuracy curves when ablating top-k induction heads (k = 1, 2, 4, 8, 16, 32). |
| ablation_prev_token.{pdf,png,html} | Fig 3 right. Accuracy curves when ablating top-k previous-token heads. |
| pca_baseline.{pdf,png,html} | Fig 4 left. Class-mean PCA with no ablation (baseline). |
| pca_induction_ablated.{pdf,png,html} | Fig 4 center. Class-mean PCA with top-32 induction heads ablated. |
| pca_prev_token_ablated.{pdf,png,html} | Fig 4 right. Class-mean PCA with top-32 previous-token heads ablated. |
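
To make the head-identification step more concrete, here is one common way to score attention heads on a sequence consisting of a random block of tokens repeated twice. This is a sketch under assumptions: the exact scoring used in 02_ablation.py and Appendix A may differ, and `head_scores` and the idealized attention patterns are hypothetical names and inputs chosen for illustration.

```python
import numpy as np

def head_scores(pattern, block_len):
    """Score one head's attention pattern on a sequence of two repeated blocks.

    pattern:   (seq_len, seq_len) attention weights, rows = query positions.
    block_len: length T of the random block; seq_len is assumed to be 2 * T.
    """
    seq_len = pattern.shape[0]
    queries = np.arange(block_len, seq_len)   # positions in the second copy
    # Induction: attend to the token *after* the previous occurrence of the current token.
    induction = float(pattern[queries, queries - block_len + 1].mean())
    # Previous-token: attend to the position immediately before the query.
    prev_token = float(pattern[queries, queries - 1].mean())
    return induction, prev_token

T = 8
q = np.arange(T, 2 * T)
# Idealized induction head: each query in the second copy attends fully to i - T + 1.
ind_head = np.zeros((2 * T, 2 * T))
ind_head[q, q - T + 1] = 1.0
# Idealized previous-token head: each query (except position 0) attends fully to i - 1.
prev_head = np.zeros((2 * T, 2 * T))
prev_head[np.arange(1, 2 * T), np.arange(0, 2 * T - 1)] = 1.0

print(head_scores(ind_head, T))   # (1.0, 0.0)
print(head_scores(prev_head, T))  # (0.0, 1.0)
```

Given such scores for every head, the top-k heads of each type would be the k highest-scoring ones, for example via `np.argsort(scores)[::-1][:k]`.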
Demonstrates that a single round of previous-token (neighbor) mixing can explain the emergent grid structure in representations. No model or GPU required.
python 03_neighbor_mixing.py

Outputs (in results/neighbor_mixing/plots/):

| File | Description |
|---|---|
| before_mixing.{pdf,png,html} | Fig 5 left. PCA of 16 random Gaussian vectors in R^4096. |
| after_mixing.{pdf,png,html} | Fig 5 right. PCA after applying one round of neighbor mixing: each embedding is updated by adding the mean of its grid neighbors' embeddings. |
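
The mixing rule described above is simple enough to sketch in a few lines of NumPy. The following is an illustrative reimplementation rather than the contents of 03_neighbor_mixing.py: the helper names `grid_neighbors`, `mix_once`, and `pca_2d` are hypothetical, and the PCA is done with a plain SVD.

```python
import numpy as np

def grid_neighbors(rows=4, cols=4):
    """Map each cell index (row-major) to its up/down/left/right neighbor indices."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            nbrs[r * cols + c] = cell
    return nbrs

def mix_once(emb, nbrs):
    """One round of mixing: add the mean of each node's neighbor embeddings."""
    mixed = emb.copy()
    for i, js in nbrs.items():
        mixed[i] = emb[i] + emb[js].mean(axis=0)
    return mixed

def pca_2d(x):
    """Project the rows of x onto their top two principal components."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(42)
emb = rng.normal(size=(16, 4096))   # 16 random Gaussian vectors in R^4096
nbrs = grid_neighbors()
mixed = mix_once(emb, nbrs)
before = pca_2d(emb)
after = pca_2d(mixed)
print(before.shape, after.shape)    # (16, 2) (16, 2)
```

Because random vectors in high dimensions are nearly orthogonal, after mixing each embedding overlaps mainly with those of its grid neighbors, which is why the 2D projection can pick up the grid layout.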
The task uses a 4×4 grid of common English words:
apple bird car egg
house milk plane opera
box sand sun mango
rock math code phone
A random walk produces a sequence of words by moving between adjacent cells (up, down, left, or right). The model receives this sequence and, at each position, must predict a valid next word. Accuracy at a position is the probability the model places on tokens corresponding to grid neighbors of the current word.
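
The task setup can be sketched in plain Python as follows. This is a minimal illustration, not the repository's `Grid` utility: the function names are hypothetical, whole words stand in for model tokens, and choosing uniformly among adjacent cells at each step is an assumption.

```python
import random

GRID = [
    ["apple", "bird", "car", "egg"],
    ["house", "milk", "plane", "opera"],
    ["box", "sand", "sun", "mango"],
    ["rock", "math", "code", "phone"],
]

def neighbors(r, c):
    """Return the (row, col) cells adjacent to (r, c) within the 4x4 grid."""
    cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(rr, cc) for rr, cc in cells if 0 <= rr < 4 and 0 <= cc < 4]

def random_walk(start, length, rng):
    """Generate `length` words by stepping to a uniformly random adjacent cell."""
    r, c = start
    words = [GRID[r][c]]
    for _ in range(length - 1):
        r, c = rng.choice(neighbors(r, c))
        words.append(GRID[r][c])
    return words

def accuracy(probs, current):
    """Probability mass on words that are grid neighbors of `current`.

    probs: dict mapping word -> predicted next-word probability.
    """
    pos = {GRID[r][c]: (r, c) for r in range(4) for c in range(4)}
    valid = {GRID[r][c] for r, c in neighbors(*pos[current])}
    return sum(p for word, p in probs.items() if word in valid)

rng = random.Random(42)
walk = random_walk((0, 0), 10, rng)
uniform = {w: 1 / 16 for row in GRID for w in row}
print(walk[0])                     # apple
print(accuracy(uniform, "apple"))  # 0.125
```

A predictor that spreads probability uniformly over all 16 words scores 2/16 at a corner such as "apple", which has only two neighbors ("bird" and "house").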
.
├── README.md
├── requirements.txt
├── utils.py                  # Shared utilities (Grid, model loading, PCA, plotting)
├── 01_reproduce.py           # Fig 2 (accuracy + PCA) and Fig 6 (bigram PCA)
├── 02_ablation.py            # Fig 3 (ablation accuracy) and Fig 4 (ablation PCA)
├── 03_neighbor_mixing.py     # Fig 5 (toy model of neighbor mixing)
└── results/
    ├── reproduce/
    │   ├── data/             # Cached activations, accuracies, sequences
    │   └── plots/            # PDF, PNG, and interactive HTML figures
    ├── ablation/
    │   ├── data/             # Cached head scores, ablation accuracies, PCA data
    │   └── plots/
    └── neighbor_mixing/
        ├── data/             # Cached mixing projections
        └── plots/
All scripts use set_seed(42) for deterministic results. The 16 accuracy curves use uniform initialization (one sequence starting at each grid position) to ensure all positions are represented.
- Model: meta-llama/Llama-3.1-8B via TransformerLens
- Analysis layer: 26 (residual stream pre-attention)
- Sequence length: 1,400 tokens
- PCA lookback: last 200 positions
If you use this code, please cite:
TODO

License: MIT