Exp 1+2: per-token gradient structure probes (compression + correct-vs-incorrect direction) by RiddleHe · Pull Request #6 · RiddleHe/nanochat

RiddleHe · 2026-04-23T23:43:40Z

Summary

Two structural gradient-probe experiments on top of the 500-step RL checkpoint
(follow-up to the 80/20 paper, 2506.01939). Both build on the same artifact: a
dump of per-token ∇log π_t vectors on one target weight row, for the base
Qwen2.5-1.5B-Instruct and the post-RL checkpoint.

Exp 1 — how many independent directions do per-bucket gradients span,
and do buckets specialize or share directions? Answered with per-bucket SVD
(participation ratio, rank95) and cross-bucket top-1 singular-vector cosine.
Exp 2 — on top-20%-entropy tokens, are correct and incorrect rollouts
pushing ∇log π in the same direction (would self-cancel under GRPO's ±1
advantage) or unrelated directions? Answered with per-prompt
cos(mean_grad_correct, mean_grad_incorrect) and a within-group half-split
control as noise floor.
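The Exp 1 metrics can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `analyze_exp1_compression.py`: it assumes participation ratio is defined on squared singular values as (Σλ)² / Σλ², that rank95 is the smallest number of directions reaching 95% cumulative energy, and that no centering is applied before the SVD. The function names `spectrum_stats` and `top1_abs_cos` are hypothetical.

```python
import numpy as np

def spectrum_stats(grads: np.ndarray, energy: float = 0.95):
    """Return (participation_ratio, rank_at_energy, top-1 right singular vector).

    grads has shape (N_positions, row_dim), one gradient vector per row.
    Assumption: energy is measured with squared singular values, no centering.
    """
    _, s, vt = np.linalg.svd(grads, full_matrices=False)
    lam = s ** 2                                  # energy per direction
    pr = lam.sum() ** 2 / (lam ** 2).sum()        # participation ratio
    cum = np.cumsum(lam) / lam.sum()              # cumulative energy curve
    rank = int(np.searchsorted(cum, energy) + 1)  # smallest k with cum[k-1] >= energy
    return float(pr), rank, vt[0]

def top1_abs_cos(v_a: np.ndarray, v_b: np.ndarray) -> float:
    """|cos| between two unit-norm singular vectors (their sign is arbitrary)."""
    return abs(float(v_a @ v_b))

# Toy bucket with three equal-energy directions: PR = 3 and rank95 = 3.
toy = np.zeros((6, 8))
toy[0, 0] = toy[1, 1] = toy[2, 2] = 1.0
pr, rank95, _ = spectrum_stats(toy)

# Two toy buckets whose dominant directions are orthogonal axes.
bucket_a = np.zeros((4, 8)); bucket_a[:, 0] = 1.0
bucket_b = np.zeros((4, 8)); bucket_b[:, 1] = 1.0
cross_cos = top1_abs_cos(spectrum_stats(bucket_a)[2], spectrum_stats(bucket_b)[2])
print(pr, rank95, cross_cos)
```

A participation ratio near the number of rows would mean the gradients spread evenly over many directions; a small value means a few directions carry most of the energy.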

Headline findings on Qwen2.5-1.5B:

Exp 1: per-bucket participation ratio drops ~2.5× after RL (100→40);
bin-4 (top-20% entropy) becomes ~orthogonal to all other buckets' top-1
directions (|cos| ≤ 0.07 vs 0.7–0.9 at base).
Exp 2: mean cos(correct, incorrect) = −0.001 across 35 prompts, matching
the within-group control (−0.010). No detectable directional alignment or
anti-alignment — consistent with Stage 3 (top-20% mask DAPO ≈ full-gradient
baseline).
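The Exp 2 comparison can be illustrated on synthetic data. The sketch below assumes arrays `grads`, `prompt_idx`, and `correct` have already been filtered to bin-4 positions; the helper names and the choice to run the half-split control on the correct group are assumptions for illustration, and the actual logic in `analyze_exp2_direction.py` may differ.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def half_split_cos(g: np.ndarray, rng: np.random.Generator) -> float:
    """Control: cosine between the mean gradients of two random halves of one group."""
    perm = rng.permutation(len(g))
    half = len(g) // 2
    return cosine(g[perm[:half]].mean(axis=0), g[perm[half:]].mean(axis=0))

def per_prompt_direction(grads, prompt_idx, correct, rng):
    """Map prompt -> (cos(mean correct, mean incorrect), half-split control cosine).

    Prompts without at least two positions in each group are skipped.
    """
    out = {}
    for p in np.unique(prompt_idx):
        g_c = grads[(prompt_idx == p) & correct]
        g_i = grads[(prompt_idx == p) & ~correct]
        if len(g_c) < 2 or len(g_i) < 2:
            continue
        out[int(p)] = (cosine(g_c.mean(axis=0), g_i.mean(axis=0)),
                       half_split_cos(g_c, rng))
    return out

# Synthetic case where correct and incorrect rollouts share one direction.
rng = np.random.default_rng(0)
shared = rng.normal(size=32)
grads = shared + 0.1 * rng.normal(size=(40, 32))
prompt_idx = np.repeat([0, 1], 20)
correct = np.tile([True, False], 20)
result = per_prompt_direction(grads, prompt_idx, correct, rng)
print(result)
```

In this synthetic shared-direction case the cosine is close to +1, which is the self-cancelling regime under a ±1 advantage. The findings above report the opposite situation: a cosine near 0 that is indistinguishable from the control.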

Files

All under nanorl/scripts/:

| file | role |
| --- | --- |
| `probe_grad_vectors.py` | GPU: sample rollouts from a given model, bucket response tokens by that model's own entropy, pick N positions per bucket per rollout, and save `∇log π_t` on ONE target weight row as `grads.npy` (`float32`, shape `(N_positions, row_dim)`) plus `meta.jsonl` + `manifest.json`. |
| `merge_grad_vector_shards.py` | CPU: concatenate multiple shard directories; recompute entropy-quantile edges over the combined distribution so bin labels are globally consistent. |
| `analyze_exp1_compression.py` | CPU: per-bucket SVD → participation ratio, rank95, cumulative energy curves. Cross-bucket 5×5 cosine heatmap of top-1 right singular vectors. Writes the two Exp 1 analysis figures + `exp1_summary.json`. |
| `analyze_exp2_direction.py` | CPU: filter to bin-4 positions, group by `(prompt_idx, correct)`. For prompts with both correct and incorrect rollouts, compute `cos(mean_g_correct, mean_g_incorrect)`. Also within-group half-split control. Writes the Exp 2 histogram + `exp2_summary.json`. |
| `plot_exp12_intuitive.py` | CPU: more readable headline plots: PR/rank95 bar charts, 2-D PCA projections, bin-4 cumulative energy (base vs trained), Exp 2 histogram + per-prompt sorted bars. |
| `run_exp1_exp2.sh` | Bash launcher: runs 4 probe shards in parallel (2 per model), merges each model's shards, then runs the three analysis scripts end-to-end. |
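The global re-bucketing step performed when merging shards can be sketched as follows. This is an illustration under stated assumptions, not the contents of `merge_grad_vector_shards.py`: it assumes five equal-mass entropy buckets (bin 0 lowest, bin 4 the top 20%) and that each shard exposes a 1-D array of per-position entropies. The function name `rebucket` is hypothetical.

```python
import numpy as np

def rebucket(entropies_per_shard, n_bins: int = 5):
    """Concatenate shard entropies and assign bins from global quantile edges.

    Returns (edges, bins): edges are the n_bins - 1 interior quantiles,
    bins are integer labels in 0 .. n_bins - 1 (highest = highest entropy).
    """
    ent = np.concatenate(entropies_per_shard)
    # Interior quantile levels, e.g. 0.2, 0.4, 0.6, 0.8 for five bins.
    levels = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(ent, levels)
    bins = np.digitize(ent, edges)
    return edges, bins

# Two toy shards with very different entropy ranges. Per-shard quantiles
# would label each shard's own top 20% as bin 4; global edges do not.
shard_a = np.arange(0, 50, dtype=float)
shard_b = np.arange(50, 100, dtype=float)
edges, bins = rebucket([shard_a, shard_b])
counts = np.bincount(bins, minlength=5)
print(edges, counts)
```

With global edges, every position labelled bin 4 in this toy example comes from the second shard, which is why the merge recomputes edges instead of trusting per-shard labels.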

Usage

End-to-end on 4 GPUs (defaults assume GPUs 4–7 free):

cd nanochat
export PYTHONPATH=$PWD
bash nanorl/scripts/run_exp1_exp2.sh

For a single probe:

python nanorl/scripts/probe_grad_vectors.py \
    --model-path /path/to/checkpoint \
    --output-dir /tmp/probe_out \
    --num-prompts 64 --num-samples 8 \
    --layer 14 --param-suffix mlp.down_proj.weight --row-idx 0

Caveats

Paths in run_exp1_exp2.sh and the OUT/BASE_D/TRAINED_D constants at
the top of the analysis scripts are hardcoded to /hdd/mh3897/cc/nanochat/...
for the machine the experiments were run on. They need editing for other
setups — left as-is to match the exact paths where the artifacts on disk
still live.
The probe targets one row of one weight matrix
(model.layers.14.mlp.down_proj.weight row 0, dim = 8960). The results are
expected to replicate on other rows and layers, but this has not been tested here.
Sampling uses HuggingFace generate (not vLLM), so it is slow but still
tractable (~22 min for 4 parallel shards at 64 prompts × 8 rollouts).

Test plan

bash nanorl/scripts/run_exp1_exp2.sh end-to-end completes on a machine
with 4 free GPUs and writes all figures under .nanochat/probe/exp12_figures/.
Sanity-check manifest.json in each output dir: num_positions,
num_correct_rollouts, and entropy_edges look reasonable.
Visual check on exp1_headline.png (bar heights) and exp2_headline.png
(histogram centered on 0).
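The manifest sanity check could be automated along these lines. The three field names come from the test plan above; their types (positive integer, non-negative integer, non-decreasing list of numbers) are assumptions about what "look reasonable" means, and `check_manifest` is a hypothetical helper, not part of the PR.

```python
import json
from pathlib import Path

def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = []
    n_pos = manifest.get("num_positions")
    if not (isinstance(n_pos, int) and n_pos > 0):
        problems.append("num_positions should be a positive integer")
    n_correct = manifest.get("num_correct_rollouts")
    if not (isinstance(n_correct, int) and n_correct >= 0):
        problems.append("num_correct_rollouts should be a non-negative integer")
    edges = manifest.get("entropy_edges")
    ok_edges = (
        isinstance(edges, list)
        and len(edges) > 0
        and all(b >= a for a, b in zip(edges, edges[1:]))
    )
    if not ok_edges:
        problems.append("entropy_edges should be a non-decreasing, non-empty list")
    return problems

def check_manifest_file(output_dir: str) -> list[str]:
    """Load manifest.json from one output directory and check it."""
    path = Path(output_dir) / "manifest.json"
    return check_manifest(json.loads(path.read_text()))

# Toy manifests with made-up values, for illustration only.
good = {"num_positions": 1000, "num_correct_rollouts": 12,
        "entropy_edges": [0.01, 0.1, 0.4, 1.2]}
bad = {"num_positions": 0, "entropy_edges": [0.5, 0.1]}
print(check_manifest(good), check_manifest(bad))
```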

🤖 Generated with Claude Code

…1, Exp 2)

Motivated by Wang et al. 2506.01939 ("80/20 rule"). Goes beyond the paper's aggregate norm observation to ask two structural questions about per-token gradients on a target weight row:

Exp 1. How many independent directions do gradients in each entropy-percentile bucket actually span, and do different buckets share or specialize those directions?
Exp 2. At top-20%-entropy tokens, do correct and incorrect rollouts push gradients in the same direction (+1 cos, would self-cancel under GRPO's ±1 advantage) or unrelated directions (~0 cos, no self-cancellation)?

Six files:

probe_grad_vectors.py: collect per-token d(log pi_t) vectors on one target row; saves grads.npy + meta.jsonl + manifest.json
merge_grad_vector_shards.py: concatenate multi-GPU shards; rebucket entropy globally over the combined distribution
analyze_exp1_compression.py: per-bucket SVD -> participation ratio, rank95, cumulative energy; cross-bucket top-1 singular-vector cosine heatmap
analyze_exp2_direction.py: at bin-4 tokens, per-prompt cos(mean_grad_correct, mean_grad_incorrect); within-group half-split control
plot_exp12_intuitive.py: headline plots (PR bars, rank95 bars, 2-D PCA, bin-4 cumulative energy, cosine histogram + per-prompt sorted bars)
run_exp1_exp2.sh: launcher: 4 probe shards in parallel (2 per model) -> merge -> analyses -> plots

Scripts use hardcoded machine-specific paths under /hdd/mh3897/cc/nanochat. These are top-of-file constants and need editing for other setups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RiddleHe wants to merge 1 commit into master from exp1-exp2-gradient-analysis
