Learning How Much to Trust Domain Priors for Causal Structure Discovery
NeurIPS 2026 submission (anonymous).
Existing causal discovery methods either ignore priors or impose them globally, but real priors have spatially varying reliability (physical laws give high-confidence edges, LLM suggestions are speculative). PRCD-MAP is the first framework with structure-aware trust calibration:
- Reliable prior → $+0.158$ AUROC over best baseline in low-data regimes
- Mediocre prior → graceful fallback to no-prior ($\leq -0.038$)
- Fixed-trust alternatives → collapse ($-0.156$)
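To make "spatially varying reliability" concrete, here is a minimal sketch that builds a prior matrix in the format the usage snippet further down expects (values in [0,1], zero diagonal), mixing near-oracle confidences for some nodes with noisy, speculative ones for the rest. The helper name `make_mixed_prior`, the confidence values (0.95 / 0.05, and 0.5 with noise scale 0.2), and the choice of which nodes count as reliable are illustrative assumptions, not part of the PRCD-MAP code.

```python
import numpy as np

def make_mixed_prior(true_adj, reliable_nodes, seed=0):
    """Hypothetical helper: prior in [0,1]^{d x d} with row-dependent reliability.

    Rows listed in `reliable_nodes` get near-oracle confidences (as if derived
    from physical laws); all other rows get noisy values around 0.5 (as if
    suggested by an LLM).
    """
    rng = np.random.default_rng(seed)
    d = true_adj.shape[0]
    # Speculative default: uninformative confidences scattered around 0.5.
    prior = np.clip(0.5 + 0.2 * rng.standard_normal((d, d)), 0.0, 1.0)
    # Reliable rows: confidence 0.95 on true edges, 0.05 on non-edges.
    reliable = np.asarray(reliable_nodes)
    prior[reliable, :] = np.where(true_adj[reliable, :] > 0, 0.95, 0.05)
    np.fill_diagonal(prior, 0.0)  # no self-loops
    return prior

# Toy ground truth (assumed for illustration): a chain 0 -> 1 -> ... -> 5.
d = 6
true_adj = np.zeros((d, d))
true_adj[np.arange(d - 1), np.arange(1, d)] = 1.0
P_prior = make_mixed_prior(true_adj, reliable_nodes=[0, 1, 2])
```

A single global trust level has to treat both kinds of rows alike, which is the failure mode the fixed-trust comparison above refers to.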
The core mechanism is structure-aware trust propagation (per-edge
PRCD-MAP/
├── src/ # Core model implementations
│ ├── model_linear.py # Linear SVAR + per-group τ (baseline)
│ ├── model_nam.py # Neural Additive Model variant
│ ├── trust_propagation.py # Structure-aware trust (GAT + Lite)
│ ├── model_linear_trust.py # Linear + trust propagation
│ ├── model_nam_trust.py # NAM + trust propagation
│ ├── utils.py # Data gen, baselines, metrics
│ └── utils_trust.py # Trust-propagation wrappers
│
├── experiments/ # 18 main experiments + 10 verification scripts
│ ├── exp1_synthetic_benchmark.py # Synthetic SVAR (Table 1)
│ ├── exp2_real_benchmarks_original.py # CausalTime + electricity
│ ├── exp3_ablation.py # Ablation (Table 4)
│ ├── exp4_scalability.py # Scalability
│ ├── exp5_cross_sectional.py # Cross-sectional SEM (App K)
│ ├── exp6_trust_validation.py # Trust vs per-group
│ ├── exp7_real_benchmarks_trust.py # Trust on real data (Table 2)
│ ├── exp8_scalability_trust.py # Trust scalability (App G)
│ ├── exp9_llm_prior_pipeline.py # LLM prior end-to-end (App B)
│ ├── exp10_community_mixing.py # Designed validation (Table 3/7)
│ ├── exp11_significance_test.py # 10-seed paired test (App L)
│ ├── exp12_theory_verification.py # Numerical theorem check
│ ├── exp13_table1_t50_10seeds.py # Table 1 row T=50, 10-seed CI tightening
│ ├── exp14_bayesdag_baseline.py # BayesDAG (Annadani et al., NeurIPS 2023) baseline
│ ├── exp15_table1_extended_seeds.py # Table 1 T={100,200,500} 10-seed extension
│ ├── exp16_lambda_sensitivity.py # λ1 warmup-factor sensitivity (W7)
│ ├── exp17_contemporaneous_dominant.py # Contemporaneous-dominant ablation (W8)
│ ├── exp18_llm_variance_decomp.py # LLM × prompt-style variance decomposition (Q10)
│ ├── _run_priormode.py # Wrapper: prior-mode override (systematic / adversarial)
│ ├── gen_correlation_priors.py # Statistical priors for CausalTime (no LLM domain knowledge)
│ ├── verify_*.py # 10 reviewer-response verification scripts (see below)
│ └── llm_prior_cache/ # Cached LLM-derived prior matrices (.npy + JSON manifests)
│
├── data_loaders/ # Data prep + baseline runners
│ ├── generate_llm_priors.py
│ ├── baseline_dycast.py
│ └── baseline_rhino.py
│
├── tools/
│ └── merge_priors.py # Merge multi-LLM prior caches into a single style index
│
├── scripts/run_all.sh # One-click reproduction
├── results/ # Pre-computed result CSVs (see results/README.md)
│ ├── causaltime_10seed/ # 10-seed CausalTime trust validation
│ ├── nonlinear/ # Nonlinear regime characterization
│ ├── scale/ # d=20/50/100 with baselines
│ ├── cross_sectional/ # NOTEARS/DAGMA comparison
│ ├── ablation/, community_mixing/, significance/
├── assets/ # Figures for README
├── data/ # Dataset directory (README.md inside)
├── requirements.txt
├── LICENSE
└── README.md
Python 3.10+ with PyTorch:
pip install -r requirements.txt
# Optional baselines:
pip install tigramite # PCMCI+
pip install lingam # VARLiNGAM
pip install anthropic # For exp9 LLM pipeline (live API calls; not required if using cached priors)

import sys, numpy as np
sys.path.insert(0, "src")
from model_linear_trust import PRCD_MAP_Trust, train_prcd_trust_alm
from utils_trust import run_prcd_trust
# Your time series: (T, d) standardized; prior matrix P_prior in [0,1]^{d×d}
X = np.random.randn(500, 20)
P_prior = np.random.uniform(0, 1, (20, 20))
np.fill_diagonal(P_prior, 0.0)
W0, Wk, tau = run_prcd_trust(
    X, P_prior, d=20, K=1,
    lambda1=0.001, lambda2=0.01,
    max_iter=35, inner_iter=400, lr=8e-3, seed=0)
# W0: (d, d) instantaneous graph; Wk: list of lag matrices; tau: mean learned trust

cd experiments/
python exp1_synthetic_benchmark.py --sub sample_size --seeds 0 1 2 # Table 1
python exp7_real_benchmarks_trust.py --bench causaltime --seeds 0 1 2 # Table 2
python exp3_ablation.py --seeds 0 1 2 # Table 4

python exp10_community_mixing.py --variant v1 --seeds 0 1 2 # BA d=20, main designed validation
python exp10_community_mixing.py --variant v2 --seeds 0 1 2 # BA d=30, scale
python exp10_community_mixing.py --variant v3 --seeds 0 1 2 # ER negative control
python exp10_community_mixing.py --variant v4 --seeds 0 1 2 # Extreme heterogeneity

python exp6_trust_validation.py --sub prior --seeds 0 1 2 # Table 8
python exp6_trust_validation.py --sub nonlinear --seeds 0 1 2 # Nonlinear validation
python exp8_scalability_trust.py --sub scale --seeds 0 1 2 # Scalability (App G)
python exp5_cross_sectional.py # Cross-sectional (App K)
python exp11_significance_test.py --seeds 0 1 2 3 4 5 6 7 8 9 # 10-seed paired test
python exp12_theory_verification.py # Numerical theorem check

# 1. Generate cached priors (uses domain templates; no API key required)
python ../data_loaders/generate_llm_priors.py
# 2. Run end-to-end pipeline
python exp9_llm_prior_pipeline.py --dataset AQI --seeds 0 1 2

python exp13_table1_t50_10seeds.py --seeds 0 1 2 3 4 5 6 7 8 9 # Table 1 T=50 with 10 seeds
python exp15_table1_extended_seeds.py # Table 1 T={100,200,500} with 10 seeds
python exp14_bayesdag_baseline.py --bench medical --seeds 10 # BayesDAG side-by-side
python exp16_lambda_sensitivity.py # λ1 warmup-factor sensitivity (W7)
python exp17_contemporaneous_dominant.py # Contemporaneous-dominant ablation (W8)
python exp18_llm_variance_decomp.py # LLM × prompt-style variance decomposition (Q10)

Self-contained scripts that directly check claims flagged in review:
- verify_realised_constants.py: measure c_min/c_max, λ_min(Σ̂), realised C_1
- verify_bilevel_stabilization.py: active-set Hamming distance across ALM iterations (Asm 2)
- verify_cor4_proxy_grid.py: Δ_proxy realised across the acc grid (Cor 4 / T-3)
- verify_d_sweep_full.py: full d-sweep with ALM early-termination disabled (W6)
- verify_w3_weak_data.py: EB behavior in T≪d regime (W3)
- verify_w4_lag_resolved.py: lag-resolved prior empirical check (W4)
- verify_w6a_d100.py: d=100 with tol=0 (W6a)
- verify_e4_m2_causaltime.py: M2 ablation directly on CausalTime (E-4)
- verify_e5_noprior_canonical.py: unified "no prior" baseline across Tables 5/9/12/15 (E-5)
- verify_method3_threshold_labels.py: alternative threshold-based EB soft labels
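
One of the checks above tracks the active-set Hamming distance across ALM iterations. As a rough illustration of that quantity (not the code in verify_bilevel_stabilization.py), the sketch below thresholds successive weight-matrix iterates into supports and counts the entries that change between consecutive iterates. The function name `active_set_hamming`, the threshold 0.1, and the toy iterates are assumptions made for this example.

```python
import numpy as np

def active_set_hamming(iterates, threshold=0.1):
    """Hamming distance between supports of consecutive weight matrices.

    `iterates` is a sequence of (d, d) arrays; an entry counts as "active"
    when its absolute value exceeds `threshold` (an assumed cut-off).
    """
    supports = [np.abs(W) > threshold for W in iterates]
    return [int(np.sum(a != b)) for a, b in zip(supports[:-1], supports[1:])]

# Toy iterates: the support changes once, then stays fixed.
W_a = np.array([[0.0, 0.8, 0.3], [0.0, 0.0, 0.05], [0.0, 0.0, 0.0]])
W_b = np.array([[0.0, 0.9, 0.02], [0.0, 0.0, 0.4], [0.0, 0.0, 0.0]])
W_c = np.array([[0.0, 0.9, 0.01], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]])
distances = active_set_hamming([W_a, W_b, W_c])
print(distances)  # [2, 0]
```

A distance that drops to zero and stays there indicates that the set of selected edges has stopped changing, even if the weights themselves are still being refined.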
- Synthetic data: Generated on the fly (ER/BA graphs, SVAR simulation).
- CausalTime: Download from the public CausalTime benchmark (MIT license); place at data/causaltime/{AQI,Traffic,Medical}/.
- Electricity: Sector-level monthly consumption data from a national electricity council's statistical yearbook; subject to data-sharing policy, available upon request for review purposes.
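
Before running the CausalTime experiments, it can help to confirm the directory layout described above. The snippet below is a small convenience sketch and not part of the repository; the function name `missing_causaltime_dirs` is hypothetical, and it checks only that the three subdirectories exist, not what they contain.

```python
from pathlib import Path

def missing_causaltime_dirs(root="data/causaltime"):
    """Return the expected CausalTime subdirectories that are absent under `root`."""
    expected = ["AQI", "Traffic", "Medical"]
    return [name for name in expected if not (Path(root) / name).is_dir()]

if __name__ == "__main__":
    missing = missing_causaltime_dirs()
    if missing:
        print("Missing CausalTime directories:", ", ".join(missing))
    else:
        print("CausalTime layout looks complete.")
```

Run it from the repository root so that the relative path data/causaltime resolves correctly.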
| Paper | Code | Pre-computed CSV |
|---|---|---|
| §3.2 Eq. (1)–(7) MAP objective | src/model_linear.py | n/a |
| §3.2 Eq. (8) Trust propagation | src/trust_propagation.py, src/model_linear_trust.py | n/a |
| §3.1 NAM extension (App F) | src/model_nam.py, src/model_nam_trust.py | n/a |
| §3.3 Empirical Bayes | train_prcd_alm (linear) / train_prcd_trust_alm (trust) | n/a |
| §4.2 Asymmetric robustness (Table 1) | experiments/exp1_synthetic_benchmark.py | n/a |
| §4.3 CausalTime (Table 2) | experiments/exp7_real_benchmarks_trust.py | results/causaltime_10seed/ |
| §4.4 Ablation (Table 4) | experiments/exp3_ablation.py | results/ablation/ |
| §4.4 Community Mixing (Table 3) | experiments/exp10_community_mixing.py | results/community_mixing/ |
| Sec. 4 nonlinear PCMCI+ trade-off | experiments/exp1_synthetic_benchmark.py --sub nonlinear | results/nonlinear/ |
| App "Main-text Scalability" ( | experiments/exp1_synthetic_benchmark.py --sub scale | results/scale/ |
| App "Cross-Sectional Structure Learning" | experiments/exp5_cross_sectional.py | results/cross_sectional/ |
| App "10-seed Trust Validation" on CausalTime | experiments/exp7_real_benchmarks_trust.py | results/causaltime_10seed/ |
| App G Scalability | experiments/exp8_scalability_trust.py | n/a |
| App L Significance test | experiments/exp11_significance_test.py | results/significance/ |
| Numerical theorem verification | experiments/exp12_theory_verification.py | n/a |
Tested on NVIDIA RTX 2080 Ti (11 GB). PRCD-MAP with trust propagation completes
@inproceedings{anon2026prcd,
title={Learning How Much to Trust Domain Priors for Causal Structure Discovery},
author={Anonymous},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2026}
}

MIT License. See LICENSE.
