MaxwellNi/mechinterp

mechinterp Stage 1: OR Stimulus Design

This repository implements the Stage 1 stimulus-design module of a mechanistic interpretability pipeline for Opportunity Recognition (OR). It builds matched OR_PRESENT / OR_ABSENT vignette pairs, audits them for shortcut cues, runs shallow artifact diagnostics, filters out high-bias pairs, creates held-out splits, and writes a dataset report.

The default provider is deterministic and offline. Optional real providers read API keys only from environment variables and are not used by smoke runs.
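
To illustrate the provider rule above, here is a minimal sketch of resolving a key from the environment only. The function name `resolve_api_key` and the `<PROVIDER>_API_KEY` naming scheme are assumptions made for illustration; the repository's actual variable names and lookup code may differ.

```python
import os


def resolve_api_key(provider, env=None):
    """Return the API key for a provider, or None for the offline mock provider.

    Hypothetical helper: the environment variable naming below is an
    assumption for illustration, not the repository's confirmed scheme.
    """
    env = os.environ if env is None else env
    if provider == "mock":
        # The mock provider is deterministic and offline, so it needs no key.
        return None
    var_name = f"{provider.upper()}_API_KEY"  # assumed naming convention
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} in the environment to use provider {provider!r}.")
    return key
```

Keeping the lookup in one place means a key never has to appear in a config file or on the command line, and a smoke run with `--provider mock` never touches the environment.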

Quick Smoke Run

In this workspace, run Python through the insider micromamba environment:

micromamba run -n insider python scripts/generate.py --config configs/stage1_or.yaml --provider mock --n-pairs 50 --seed 20260512 --output data/stage1/raw/pairs.jsonl
micromamba run -n insider python scripts/audit.py --config configs/stage1_or.yaml --input data/stage1/raw/pairs.jsonl --output data/stage1/audited/audit_results.jsonl
micromamba run -n insider python scripts/diagnose.py --config configs/stage1_or.yaml --input data/stage1/raw/pairs.jsonl --audit data/stage1/audited/audit_results.jsonl --output-dir data/stage1/diagnostics
micromamba run -n insider python scripts/filter.py --config configs/stage1_or.yaml --input data/stage1/raw/pairs.jsonl --audit data/stage1/audited/audit_results.jsonl --bias data/stage1/diagnostics/pair_bias_scores.jsonl --method drop --output data/stage1/filtered/pairs_filtered.jsonl
micromamba run -n insider python scripts/split.py --config configs/stage1_or.yaml --input data/stage1/filtered/pairs_filtered.jsonl --output-dir data/stage1/splits
micromamba run -n insider python scripts/report.py --config configs/stage1_or.yaml
micromamba run -n insider pytest -q

Outputs

  • Raw pairs: data/stage1/raw/pairs.jsonl
  • Audit results: data/stage1/audited/audit_results.jsonl
  • Diagnostic summary and bias scores: data/stage1/diagnostics/
  • Filtered pairs and filter report: data/stage1/filtered/
  • Random and held-out splits: data/stage1/splits/
  • Final report and run log: data/stage1/reports/
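
After a smoke run, a quick check that these locations were populated can catch a silently skipped step. The sketch below uses only the paths listed above; the helper names `missing_outputs` and `count_jsonl_records` are hypothetical, and nothing is assumed about the record schema beyond one JSON value per line.

```python
import json
from pathlib import Path

# Paths relative to data/stage1, taken from the list above. Entries with
# no file name given are checked at the directory level only.
EXPECTED_OUTPUTS = [
    "raw/pairs.jsonl",
    "audited/audit_results.jsonl",
    "diagnostics",
    "filtered",
    "splits",
    "reports",
]


def missing_outputs(root="data/stage1"):
    """Return the expected outputs that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_OUTPUTS if not (base / rel).exists()]


def count_jsonl_records(path):
    """Count non-blank lines in a JSONL file, raising if a line is not valid JSON."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)
                count += 1
    return count
```

An empty list from `missing_outputs()` means every listed location exists; `count_jsonl_records("data/stage1/raw/pairs.jsonl")` can then be compared against the `--n-pairs` value passed to the generate step.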

Scope

This package implements Stage 1 only. It intentionally does not extract model activations, estimate MI directions, map SAE features, or steer a model. The Stage 1 outputs preserve metadata needed by those later stages.
