Exp 1+2: per-token gradient structure probes (compression + correct-vs-incorrect direction) #6
Open
RiddleHe wants to merge 1 commit into
…1, Exp 2)
Motivated by Wang et al. 2506.01939 ("80/20 rule"). Goes beyond the paper's
aggregate norm observation to ask two structural questions about per-token
gradients on a target weight row:
Exp 1. How many independent directions do gradients in each entropy-
percentile bucket actually span, and do different buckets share or
specialize those directions?
Exp 2. At top-20%-entropy tokens, do correct and incorrect rollouts push
gradients in the same direction (+1 cos, would self-cancel under
GRPO's ±1 advantage) or unrelated directions (~0 cos, no
self-cancellation)?
Six files:
  probe_grad_vectors.py          collect per-token d(log pi_t) vectors on one
                                 target row; saves grads.npy + meta.jsonl
                                 + manifest.json
  merge_grad_vector_shards.py    concatenate multi-GPU shards; rebucket
                                 entropy globally over the combined
                                 distribution
  analyze_exp1_compression.py    per-bucket SVD -> participation ratio,
                                 rank95, cumulative energy; cross-bucket
                                 top-1 singular-vector cosine heatmap
  analyze_exp2_direction.py      at bin-4 tokens, per-prompt
                                 cos(mean_grad_correct, mean_grad_incorrect);
                                 within-group half-split control
  plot_exp12_intuitive.py        headline plots (PR bars, rank95 bars,
                                 2-D PCA, bin-4 cumulative energy, cosine
                                 histogram + per-prompt sorted bars)
  run_exp1_exp2.sh               launcher: 4 probe shards in parallel
                                 (2 per model) -> merge -> analyses -> plots
Scripts use hardcoded machine-specific paths under /hdd/mh3897/cc/nanochat.
These are top-of-file constants and need editing for other setups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two structural gradient-probe experiments on top of the 500-step RL checkpoint
(follow-up to the 80/20 paper, 2506.01939). Both build on the same artifact: a
dump of per-token ∇log π_t vectors on one target weight row, for the base
Qwen2.5-1.5B-Instruct and the post-RL checkpoint.
Exp 1. How many independent directions do gradients in each entropy-percentile
bucket span, and do buckets specialize or share directions? Answered with
per-bucket SVD (participation ratio, rank95) and cross-bucket top-1
singular-vector cosine.
Exp 2. At top-20%-entropy tokens, are correct and incorrect rollouts pushing
∇log π in the same direction (would self-cancel under GRPO's ±1 advantage) or
unrelated directions? Answered with per-prompt
cos(mean_grad_correct, mean_grad_incorrect) and a within-group half-split
control as noise floor.
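For readers unfamiliar with the Exp 1 metrics, here is a minimal NumPy sketch of how they could be computed from one bucket's (N_positions, row_dim) gradient matrix. It is not the code in analyze_exp1_compression.py. The function names, the choice of squared singular values as "energy", the absence of mean-centering, and the definition of rank95 as the smallest rank holding 95% of that energy are all assumptions made for illustration.

```python
import numpy as np


def spectrum_stats(grads: np.ndarray) -> dict:
    """Summarize how many directions an (N, D) gradient matrix spans."""
    # Singular values only. No mean-centering here (an assumption).
    s = np.linalg.svd(grads, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    # Participation ratio: (sum of energies)^2 / sum of squared energies.
    pr = total ** 2 / (energy ** 2).sum()
    cum = np.cumsum(energy) / total
    # Assumed rank95: smallest k whose top-k directions hold >= 95% of energy.
    rank95 = int(np.searchsorted(cum, 0.95) + 1)
    return {
        "participation_ratio": float(pr),
        "rank95": rank95,
        "cumulative_energy": cum,
    }


def top1_direction(grads: np.ndarray) -> np.ndarray:
    """Top right-singular vector (unit norm, sign is arbitrary)."""
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[0]


def cross_bucket_abs_cosine(buckets: list) -> np.ndarray:
    """|cos| between the top-1 directions of every pair of buckets."""
    tops = np.stack([top1_direction(g) for g in buckets])
    # Absolute value because the sign of a singular vector is not meaningful.
    return np.abs(tops @ tops.T)
```

A participation ratio near 1 means the bucket's gradients lie almost entirely along one direction. A value near row_dim means the energy is spread evenly across directions.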
Headline findings on Qwen2.5-1.5B:
bin-4 (top-20% entropy) becomes ~orthogonal to all other buckets' top-1
directions (|cos| ≤ 0.07 vs 0.7–0.9 at base).
the within-group control (−0.010). No detectable directional alignment or
anti-alignment — consistent with Stage 3 (top-20% mask DAPO ≈ full-gradient
baseline).
Files
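All under nanorl/scripts/:
probe_grad_vectors.py: collects per-token ∇log π_t on ONE target weight row as grads.npy (float32, shape (N_positions, row_dim)) plus meta.jsonl + manifest.json.
merge_grad_vector_shards.py: concatenates multi-GPU shards and rebuckets entropy globally over the combined distribution.
analyze_exp1_compression.py: per-bucket SVD (participation ratio, rank95, cumulative energy) and cross-bucket top-1 singular-vector cosine heatmap. Writes exp1_summary.json.
analyze_exp2_direction.py: at bin-4 tokens, groups gradients by (prompt_idx, correct). For prompts with both correct and incorrect rollouts, computes cos(mean_g_correct, mean_g_incorrect). Also runs the within-group half-split control. Writes the Exp 2 histogram + exp2_summary.json.
plot_exp12_intuitive.py: headline plots (PR bars, rank95 bars, 2-D PCA, bin-4 cumulative energy, cosine histogram + per-prompt sorted bars).
run_exp1_exp2.sh: launcher that runs 4 probe shards in parallel (2 per model), then merge, analyses, and plots.
Usage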
End-to-end on 4 GPUs (defaults assume GPUs 4–7 free):
    bash nanorl/scripts/run_exp1_exp2.sh
For a single probe:
    python nanorl/scripts/probe_grad_vectors.py \
        --model-path /path/to/checkpoint \
        --output-dir /tmp/probe_out \
        --num-prompts 64 --num-samples 8 \
        --layer 14 --param-suffix mlp.down_proj.weight --row-idx 0
Caveats
run_exp1_exp2.sh and the OUT / BASE_D / TRAINED_D constants at the top of the
analysis scripts are hardcoded to /hdd/mh3897/cc/nanochat/... for the machine
the experiments were run on. They need editing for other setups; they are left
as-is to match the exact paths where the artifacts on disk still live.
Only one target row is probed (model.layers.14.mlp.down_proj.weight row 0,
dim = 8960). Results are expected to replicate on other rows / layers, but no
evidence is offered here that they do.
Sampling uses generate (not vLLM), so it is not fast, but it is tractable
(~22 min for 4 parallel shards at 64 prompts × 8 rollouts).
Test plan
bash nanorl/scripts/run_exp1_exp2.sh completes end-to-end on a machine with 4 free GPUs and writes all figures under .nanochat/probe/exp12_figures/.
Check manifest.json in each output dir: num_positions, num_correct_rollouts, and entropy_edges look reasonable.
Check exp1_headline.png (bar heights) and exp2_headline.png (histogram centered on 0).
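The manifest check in the test plan could be partly automated. The sketch below is a hypothetical helper, not part of this PR. Only the three key names come from the description above. The value types (numbers and a list of numbers) and the meaning of "reasonable" (positive position count, non-negative correct-rollout count, non-decreasing entropy edges) are assumptions.

```python
import json
from pathlib import Path


def manifest_problems(manifest: dict) -> list[str]:
    """Return human-readable problems found in a parsed manifest (empty if none)."""
    problems = []
    # Assumed: num_positions is the number of gradient rows and must be > 0.
    if manifest.get("num_positions", 0) <= 0:
        problems.append("num_positions should be positive")
    # Assumed: a missing count is as suspicious as a negative one.
    if manifest.get("num_correct_rollouts", -1) < 0:
        problems.append("num_correct_rollouts missing or negative")
    edges = manifest.get("entropy_edges") or []
    if not edges:
        problems.append("entropy_edges missing or empty")
    elif any(b < a for a, b in zip(edges, edges[1:])):
        problems.append("entropy_edges should be non-decreasing")
    return problems


def check_output_dir(output_dir: str) -> list[str]:
    """Load manifest.json from a probe output dir and check it."""
    path = Path(output_dir) / "manifest.json"
    return manifest_problems(json.loads(path.read_text()))
```

Calling check_output_dir("/tmp/probe_out") after a single-probe run would return an empty list when nothing looks off under these assumptions.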
🤖 Generated with Claude Code