Exp 1+2: per-token gradient structure probes (compression + correct-vs-incorrect direction) #6

Open
RiddleHe wants to merge 1 commit into master from exp1-exp2-gradient-analysis

@RiddleHe

Owner

Summary

Two structural gradient-probe experiments on top of the 500-step RL checkpoint
(follow-up to the 80/20 paper, 2506.01939). Both build on the same artifact: a
dump of per-token ∇log π_t vectors on one target weight row, for the base
Qwen2.5-1.5B-Instruct and the post-RL checkpoint.

  • Exp 1 — how many independent directions do per-bucket gradients span,
    and do buckets specialize or share directions? Answered with per-bucket SVD
    (participation ratio, rank95) and cross-bucket top-1 singular-vector cosine.
  • Exp 2 — on top-20%-entropy tokens, do correct and incorrect rollouts push
    ∇log π in the same direction (which would self-cancel under GRPO's ±1
    advantage) or in unrelated directions? Answered with per-prompt
    cos(mean_grad_correct, mean_grad_incorrect) and a within-group half-split
    control as the noise floor.
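
For readers unfamiliar with the Exp 1 metrics, here is a minimal NumPy sketch of how they can be computed for one bucket. It is an illustration, not the code in analyze_exp1_compression.py. It assumes the participation ratio is taken over squared singular values, that rank95 is the smallest number of directions reaching 95% of cumulative energy, and that gradients are not mean-centered; the function names are hypothetical.

```python
import numpy as np

def spectrum_stats(grads: np.ndarray):
    """Return (participation_ratio, rank95, top1_direction) for one bucket.

    grads: (n_positions, row_dim) matrix of per-token gradient vectors.
    Assumed definitions (the actual script may differ):
      PR     = (sum_i e_i)^2 / sum_i e_i^2, with e_i = s_i^2
      rank95 = smallest k whose top-k energy reaches 95% of the total
    """
    _, s, vt = np.linalg.svd(grads, full_matrices=False)
    energy = s ** 2
    pr = energy.sum() ** 2 / (energy ** 2).sum()
    cum = np.cumsum(energy) / energy.sum()
    rank95 = int(np.searchsorted(cum, 0.95) + 1)
    return float(pr), rank95, vt[0]

def top1_abs_cos(grads_a: np.ndarray, grads_b: np.ndarray) -> float:
    """|cos| between the top-1 right singular vectors of two buckets.

    The sign of a singular vector is arbitrary, so only |cos| is meaningful.
    """
    _, _, va = spectrum_stats(grads_a)
    _, _, vb = spectrum_stats(grads_b)
    return float(abs(va @ vb))
```

A PR near 1 means a single direction carries almost all of the energy; a PR near min(n_positions, row_dim) means the energy is spread evenly.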

Headline findings on Qwen2.5-1.5B:

  • Exp 1: per-bucket participation ratio drops ~2.5× after RL (100→40);
    bin-4 (top-20% entropy) becomes ~orthogonal to all other buckets' top-1
    directions (|cos| ≤ 0.07 vs 0.7–0.9 at base).
  • Exp 2: mean cos(correct, incorrect) = −0.001 across 35 prompts, matching
    the within-group control (−0.010). No detectable directional alignment or
    anti-alignment — consistent with Stage 3 (top-20% mask DAPO ≈ full-gradient
    baseline).
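
The Exp 2 comparison and its control can be sketched as below. This is a hedged illustration rather than the code in analyze_exp2_direction.py: the function names are hypothetical, and the control is assumed to be a random split of a single group (all correct or all incorrect) into two halves, whose cosine estimates the noise floor.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def correct_vs_incorrect_cos(grads: np.ndarray, is_correct):
    """cos(mean_grad_correct, mean_grad_incorrect) for one prompt.

    grads: (n_positions, row_dim) bin-4 gradients for this prompt.
    is_correct: per-position flag from the rollout the position came from.
    Returns None when the prompt lacks either correct or incorrect rollouts.
    """
    mask = np.asarray(is_correct, dtype=bool)
    if mask.all() or not mask.any():
        return None
    return cosine(grads[mask].mean(axis=0), grads[~mask].mean(axis=0))

def half_split_cos(grads: np.ndarray, rng: np.random.Generator):
    """Noise floor: cosine between the means of two random halves of one group."""
    if len(grads) < 2:
        return None
    perm = rng.permutation(len(grads))
    half = len(grads) // 2
    return cosine(grads[perm[:half]].mean(axis=0), grads[perm[half:]].mean(axis=0))
```

If the correct-vs-incorrect cosine is indistinguishable from the half-split cosine, the two groups are no more aligned (or anti-aligned) than sampling noise would predict.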

Files

All under nanorl/scripts/:

| file | role |
| --- | --- |
| probe_grad_vectors.py | GPU: sample rollouts from a given model, bucket response tokens by that model's own entropy, pick N positions per bucket per rollout, and save ∇log π_t on ONE target weight row as grads.npy (float32, shape (N_positions, row_dim)) plus meta.jsonl + manifest.json. |
| merge_grad_vector_shards.py | CPU: concatenate multiple shard directories; recompute entropy-quantile edges over the combined distribution so bin labels are globally consistent. |
| analyze_exp1_compression.py | CPU: per-bucket SVD → participation ratio, rank95, cumulative energy curves. Cross-bucket 5×5 cosine heatmap of top-1 right singular vectors. Writes the two Exp 1 analysis figures + exp1_summary.json. |
| analyze_exp2_direction.py | CPU: filter to bin-4 positions, group by (prompt_idx, correct). For prompts with both correct and incorrect rollouts, compute cos(mean_g_correct, mean_g_incorrect). Also runs the within-group half-split control. Writes the Exp 2 histogram + exp2_summary.json. |
| plot_exp12_intuitive.py | CPU: more readable headline plots: PR/rank95 bar charts, 2-D PCA projections, bin-4 cumulative energy (base vs trained), Exp 2 histogram + per-prompt sorted bars. |
| run_exp1_exp2.sh | Bash launcher: runs 4 probe shards in parallel (2 per model), merges each model's shards, then runs the three analysis scripts end-to-end. |
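
The reason the merge step recomputes bucket edges is that each shard derives its quantiles from its own rollouts, so the same entropy value can land in different bins on different shards. A minimal sketch of global rebucketing follows. It assumes five equal-mass buckets (matching the 5×5 heatmap, with bin 4 as the top 20%) and in-memory arrays; the function name and signature are hypothetical and do not mirror merge_grad_vector_shards.py.

```python
import numpy as np

def merge_and_rebucket(shard_grads, shard_entropies, n_bins: int = 5):
    """Concatenate shards and relabel entropy bins over the combined data.

    shard_grads: list of (n_i, row_dim) arrays.
    shard_entropies: list of (n_i,) arrays of per-position entropies.
    Returns (grads, entropy, edges, bins), where bins take values 0..n_bins-1.
    """
    grads = np.concatenate(shard_grads, axis=0)
    entropy = np.concatenate(shard_entropies)
    # Interior quantile edges, e.g. 20/40/60/80% for five buckets.
    edges = np.quantile(entropy, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # digitize maps each value to the index of the interval it falls in.
    bins = np.digitize(entropy, edges)
    return grads, entropy, edges, bins
```

With the default of five buckets, positions labelled 4 are the top 20% by entropy across all shards combined.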

Usage

End-to-end on 4 GPUs (defaults assume GPUs 4–7 free):

cd nanochat
export PYTHONPATH=$PWD
bash nanorl/scripts/run_exp1_exp2.sh

For a single probe:

python nanorl/scripts/probe_grad_vectors.py \
    --model-path /path/to/checkpoint \
    --output-dir /tmp/probe_out \
    --num-prompts 64 --num-samples 8 \
    --layer 14 --param-suffix mlp.down_proj.weight --row-idx 0

Caveats

  • Paths in run_exp1_exp2.sh and the OUT/BASE_D/TRAINED_D constants at
    the top of the analysis scripts are hardcoded to /hdd/mh3897/cc/nanochat/...
    for the machine the experiments were run on. They need editing for other
    setups — left as-is to match the exact paths where the artifacts on disk
    still live.
  • The probe targets one row of one weight matrix
    (model.layers.14.mlp.down_proj.weight row 0, dim = 8960). Whether the
    results replicate on other rows / layers is untested; no evidence is
    offered here either way.
  • Uses HuggingFace generate (not vLLM) for sampling, so not fast, but
    tractable (~22 min for 4 parallel shards at 64 prompts × 8 rollouts).

Test plan

  • bash nanorl/scripts/run_exp1_exp2.sh end-to-end completes on a machine
    with 4 free GPUs and writes all figures under .nanochat/probe/exp12_figures/.
  • Sanity-check manifest.json in each output dir: num_positions,
    num_correct_rollouts, and entropy_edges look reasonable.
  • Visual check on exp1_headline.png (bar heights) and exp2_headline.png
    (histogram centered on 0).
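
The manifest sanity check can be partly automated. The sketch below uses the three field names listed above, but their types (integer counts, a list of numeric edges) are assumptions about the manifest schema, and check_manifest is a hypothetical helper that is not part of this PR.

```python
import json
from pathlib import Path

def check_manifest(path) -> list:
    """Return a list of human-readable problems found in a manifest.json.

    Assumed schema: num_positions and num_correct_rollouts are integers,
    entropy_edges is a list of numbers in ascending order.
    """
    manifest = json.loads(Path(path).read_text())
    problems = []
    if manifest.get("num_positions", 0) <= 0:
        problems.append("num_positions is missing or not positive")
    if manifest.get("num_correct_rollouts", -1) < 0:
        problems.append("num_correct_rollouts is missing or negative")
    edges = manifest.get("entropy_edges")
    if not edges:
        problems.append("entropy_edges is missing or empty")
    elif any(b < a for a, b in zip(edges, edges[1:])):
        problems.append("entropy_edges is not sorted ascending")
    return problems
```

An empty list means none of these basic checks failed; it does not replace looking at the values themselves.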

🤖 Generated with Claude Code

…1, Exp 2)

Motivated by Wang et al. 2506.01939 ("80/20 rule"). Goes beyond the paper's
aggregate norm observation to ask two structural questions about per-token
gradients on a target weight row:

  Exp 1. How many independent directions do gradients in each entropy-
         percentile bucket actually span, and do different buckets share or
         specialize those directions?

  Exp 2. At top-20%-entropy tokens, do correct and incorrect rollouts push
         gradients in the same direction (+1 cos, would self-cancel under
         GRPO's ±1 advantage) or unrelated directions (~0 cos, no
         self-cancellation)?

Six files:

  probe_grad_vectors.py           collect per-token d(log pi_t) vectors on one
                                  target row; saves grads.npy + meta.jsonl
                                  + manifest.json

  merge_grad_vector_shards.py     concatenate multi-GPU shards; rebucket
                                  entropy globally over the combined
                                  distribution

  analyze_exp1_compression.py     per-bucket SVD -> participation ratio,
                                  rank95, cumulative energy; cross-bucket
                                  top-1 singular-vector cosine heatmap

  analyze_exp2_direction.py       at bin-4 tokens, per-prompt
                                  cos(mean_grad_correct, mean_grad_incorrect);
                                  within-group half-split control

  plot_exp12_intuitive.py         headline plots (PR bars, rank95 bars,
                                  2-D PCA, bin-4 cumulative energy, cosine
                                  histogram + per-prompt sorted bars)

  run_exp1_exp2.sh                launcher: 4 probe shards in parallel
                                  (2 per model) -> merge -> analyses -> plots

Scripts use hardcoded machine-specific paths under /hdd/mh3897/cc/nanochat.
These are top-of-file constants and need editing for other setups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>