Cross-patching diagnostic showing that instruction-tuned late effects depend on upstream state.
reproducibility large-language-models instruction-tuning mechanistic-interpretability activation-patching llm-interpretability model-diffing cross-patching
Updated May 8, 2026 - Python