This repository contains code for the paper "Fine-Grained Analysis of Shared Syntactic Mechanisms in Language Models".
The dataset used for the analysis is in kumoryo9/shared-mech.
Set up the environment with uv (uv sync, etc.).
To log results with wandb, set the following environment variable.
export WANDB_API_KEY=<your_api_key>
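Before launching a long run, it can help to confirm that the key is visible to the current shell. The following is a minimal sketch; check_wandb_key is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): verify that
# WANDB_API_KEY is exported before starting a run.
check_wandb_key() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    # Report on stderr and signal failure to the caller.
    echo "WANDB_API_KEY is not set." >&2
    return 1
  fi
  echo "WANDB_API_KEY is set."
}
```

Chaining it in front of a command with && runs that command only when the key is present.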
- Activation patching on the residual stream, attention output, and MLP output in filler-gap dependencies. Change --categories to control for the control pattern and to npi for NPI licensing. For NPI licensing, add --src_base_pattern label2 as well.
uv run python main.py \
    --patch_type activation \
    --model_name pythia \
    --num_param 1b \
    --num_steps 143000 \
    --log_dir /path/to/dir/for/logging \
    --layer_idx 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 \
    --categories filler_gap \
    --proj_method vanilla \
    --interv_component resid attn mlp \
    --do_leave_one_out \
    --batch_size 10
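The three category variants described above can be sketched as a dry run that prints one command per category without executing anything. This is only an illustration: build_patching_cmd, LOG_DIR, and LAYERS are hypothetical names introduced here, and the flags are copied from the command above.

```shell
# Dry-run sketch: print (do not execute) one activation-patching command per category.
LOG_DIR="/path/to/dir/for/logging"   # placeholder path, replace with a real directory
LAYERS="$(echo $(seq 0 15))"         # joins the numbers 0..15 with single spaces

# Hypothetical helper: build the command line for a single category.
build_patching_cmd() {
  category="$1"
  extra=""
  # NPI licensing additionally takes --src_base_pattern label2.
  if [ "$category" = "npi" ]; then
    extra=" --src_base_pattern label2"
  fi
  echo "uv run python main.py --patch_type activation --model_name pythia --num_param 1b --num_steps 143000 --log_dir ${LOG_DIR} --layer_idx ${LAYERS} --categories ${category} --proj_method vanilla --interv_component resid attn mlp --do_leave_one_out --batch_size 10${extra}"
}

for category in filler_gap control npi; do
  build_patching_cmd "$category"
done
```

Printing the commands first makes it easy to inspect them; each printed line can then be pasted into the shell to run it.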
- Activation patching on the attention heads at the last token for filler-gap dependencies. Analysis on other categories can be done in a similar manner to the above.
uv run python main.py \
    --patch_type activation \
    --model_name pythia \
    --num_param 1b \
    --num_steps 143000 \
    --log_dir /path/to/dir/for/logging \
    --layer_idx 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 \
    --categories filler_gap \
    --proj_method vanilla \
    --interv_component attn-head \
    --do_leave_one_out \
    --batch_size 10 \
    --only_last_token
- DAS on the residual stream, attention output, and MLP output in filler-gap dependencies. Analysis on other categories or attention heads can be done in a similar manner to the above.
uv run python main.py \
    --patch_type activation \
    --model_name pythia \
    --num_param 1b \
    --num_steps 143000 \
    --log_dir /path/to/dir/for/logging \
    --layer_idx 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 \
    --categories filler_gap \
    --proj_method das \
    --label_methods None label-ood \
    --das_lr 5e-3 \
    --das_steps 100 \
    --interv_component resid attn mlp \
    --do_leave_one_out \
    --batch_size 10
- Steer attention heads and evaluate performance on BLiMP.
uv run python blimp.py \
    --model_name pythia \
    --num_param 1b \
    --num_steps 143000 \
    --log_dir /path/to/dir/for/logging \
    --steer_heads 7.5 7.6 9.2 \
    --steer_strength 0.8 1.0 1.2 1.5 \
    --categories all \
    --batch_size 10
- Steer attention heads and evaluate performance on SyntaxGym. Data can be downloaded from here. You may have to modify the predictions/formula attribute in the JSON for it to parse correctly.
uv run python syntaxgym.py \
    --model_name pythia \
    --num_param 1b \
    --num_steps 143000 \
    --log_dir /path/to/dir/for/logging \
    --steer_heads 7.5 7.6 9.2 \
    --steer_strength 0.8 1.0 1.2 1.5 \
    --categories all \
    --batch_size 10
- Steer attention heads and evaluate performance on HANS. Data can be downloaded from here.
uv run python nli.py \
    --steer_strength 1.0 \
    --train_path /path/to/heuristics_train_set.txt \
    --test_path /path/to/heuristics_test_set.txt
Generate data for patterns pattern_A and pattern_B in data/patterns/.
cd data
uv run python builder.py --patterns pattern_A pattern_B
We used code from CausalGym and Data Generation.