Add geo-experiment design workflow and sensitivity check plotting for Synthetic Control #819
drbenvincent wants to merge 17 commits into
Add prospective design capabilities so practitioners can assess whether a geo-experiment will work before committing budget:
- `SyntheticControl.from_pre_period()`: classmethod that fits SC on pre-period data only, enabling prospective design assessment without requiring post-period observations
- `validate_design()`: dress rehearsal that injects a known effect and checks if the model recovers it
- `power_analysis()`: simulation-based Bayesian power curve across candidate effect sizes
- `donor_pool_quality()`: composite quality score aggregating donor correlations, convex hull coverage, and weight concentration
- `DressRehearsalCheck`: pipeline-compatible Check wrapper for sensitivity analysis integration
- Result classes with `plot()` and `summary()` methods
- 27 integration tests covering both prospective and retrospective workflows
- Demo sections in sc_pymc.ipynb showing the real workflow: design assessment before analysis

Made-with: Cursor
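The dress-rehearsal idea can be illustrated without the library. The sketch below is not the PR's implementation: it swaps the Bayesian synthetic control for a donor average plus a fitted offset, assumes the injected effect is a relative lift, runs on simulated data, and uses hypothetical names (`dress_rehearsal`, `holdout`).

```python
import random


def dress_rehearsal(treated, donors, holdout, injected_effect):
    """Inject a known relative lift into the last `holdout` pre-periods
    and return the lift recovered by a simple counterfactual model."""
    n = len(treated)
    split = n - holdout  # training window: [0, split); pseudo-post: [split, n)

    # Stand-in for the synthetic control (an assumption of this sketch):
    # donor average plus a constant offset fitted on the training window only.
    donor_mean = [sum(d[t] for d in donors) / len(donors) for t in range(n)]
    offset = sum(treated[t] - donor_mean[t] for t in range(split)) / split
    counterfactual = [donor_mean[t] + offset for t in range(split, n)]

    # Pseudo-post observations carrying the known effect.
    observed = [treated[t] * (1 + injected_effect) for t in range(split, n)]

    # Recovered lift, relative to the predicted counterfactual.
    gap = sum(o - c for o, c in zip(observed, counterfactual))
    return gap / sum(counterfactual)


# Simulated data: one treated unit and three donors sharing a linear trend.
rng = random.Random(0)
periods = 40
treated = [100 + 0.5 * t + rng.gauss(0, 0.5) for t in range(periods)]
donors = [
    [level + 0.5 * t + rng.gauss(0, 0.5) for t in range(periods)]
    for level in (90, 95, 105)
]

recovered = dress_rehearsal(treated, donors, holdout=8, injected_effect=0.15)
null_run = dress_rehearsal(treated, donors, holdout=8, injected_effect=0.0)
print(f"injected 0.15, recovered {recovered:.3f}; injected 0, recovered {null_run:.3f}")
```

Running the same function with `injected_effect=0.0` is the placebo-style sanity check: a well-behaved design should recover something close to zero.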
Codecov Report

❌ Patch coverage is

Additional details and impacted files

| Coverage Diff | main | #819 | +/- |
|---|---|---|---|
| Coverage | 93.77% | 93.73% | -0.05% |
| Files | 77 | 80 | +3 |
| Lines | 11881 | 12333 | +452 |
| Branches | 696 | 732 | +36 |
| Hits | 11142 | 11560 | +418 |
| Misses | 546 | 566 | +20 |
| Partials | 193 | 207 | +14 |
Reorder cells and rewrite headings so the notebook mirrors a practitioner's actual workflow: design assessment before the experiment, causal analysis after. Key changes:
- Move `df.head()` to the data-loading section
- Move convex hull explanation before the design section
- Rename headings to question-driven titles (educational-narrative)
- Add clear "Before / After the experiment" phase headings
- Add transition prose between design and analysis phases
- Add power curve interpretation cell with go/no-go guidance
- Link donor pool selection forward to `donor_pool_quality()`
- Demote Effect Summary to subsection of analysis phase

Made-with: Cursor
Donor pool selection and convex hull condition are pre-experiment checks — they now sit as subsections of "Before the experiment" rather than floating between Load data and the design section. Also adds a reminder in the "After" section that the convex hull check runs automatically when constructing the full SyntheticControl. Fixes missing nbformat properties across all output cells. Made-with: Cursor
Summarise the full before/after workflow under the title so readers can see the notebook's scope at a glance. Each step gets 2-3 sentences explaining what it does and why it matters. Made-with: Cursor
…pymc notebook Expand the Synthetic Control notebook with academic references (Abadie 2010/2015/2021, Athey & Imbens 2017, etc.) and add post-estimation robustness sections: placebo-in-space, placebo-in-time, leave-one-out, and prior sensitivity — each with result visualisations and interpretation guidance. Add 13 new BibTeX entries to references.bib. Made-with: Cursor
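As a rough illustration of the placebo-in-space logic, the sketch below computes the classic permutation-style p-value: the share of units, treated unit included, whose estimated effect is at least as extreme as the treated unit's. This is an assumption-laden stand-in, not the `PlaceboInSpace` implementation; the function name and the effect values are made up for the example.

```python
def placebo_in_space_p_value(treated_effect, placebo_effects):
    """Share of units (treated included) whose absolute effect is at least
    as large as the treated unit's absolute effect."""
    as_extreme = sum(abs(e) >= abs(treated_effect) for e in placebo_effects)
    # +1 in numerator and denominator counts the treated unit itself.
    return (as_extreme + 1) / (len(placebo_effects) + 1)


# Made-up placebo effects for nine donor units, for illustration only.
placebo_effects = [-3.0, 1.5, 0.5, -2.0, 4.0, -1.0, 2.5, 0.0, -0.5]

p_large = placebo_in_space_p_value(-20.0, placebo_effects)  # clearly extreme
p_small = placebo_in_space_p_value(2.0, placebo_effects)    # inside the null
print(p_large, p_small)
```

With J placebo units the smallest attainable value is 1 / (J + 1), so a small donor pool limits how strong this kind of evidence can be.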
Replace the synthetic toy dataset with the canonical Abadie, Diamond & Hainmueller (2010) Proposition 99 dataset: per-capita cigarette sales across 39 US states, 1970-2000. This grounds the notebook in real data from the SC literature, improves connections to cited references, and gives robustness checks a realistic "good case" to demonstrate.
- Add california_prop99.csv (wide format, 7 KB) and register as "prop99"
- Update all narrative to California/tobacco policy context
- Update all code cells: control_units, treated_unit, treatment_time
- Adjust holdout_periods for the 19-year pre-period

Made-with: Cursor
Enlarge the correlation heatmap for readability with 39 states, add an explicit donor pool selection step that removes states with negative pre-treatment correlation (threshold=0.0), and explain the threshold choice. Excludes Alabama, Arkansas, Georgia, Tennessee — leaving 34 well-correlated donors. Made-with: Cursor
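The donor-curation step described above can be sketched in a few lines. This is a minimal standalone version, assuming plain Python lists rather than the notebook's DataFrame; `select_donors` and the toy series are hypothetical, and only the `threshold=0.0` rule comes from the commit message.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_donors(treated, candidates, threshold=0.0):
    """Drop candidates whose pre-treatment correlation with the treated
    unit falls below `threshold` (negative correlation when threshold=0)."""
    return {
        name: series
        for name, series in candidates.items()
        if pearson(treated, series) >= threshold
    }


# Toy pre-treatment series, for illustration only.
treated = [10, 9, 8, 7, 6]
candidates = {
    "A": [20, 18, 17, 15, 13],  # moves with the treated unit
    "B": [5, 6, 7, 8, 9],       # moves against it
    "C": [12, 11, 11, 9, 8],    # moves with the treated unit
}

kept = select_donors(treated, candidates, threshold=0.0)
print(sorted(kept))
```

Only pre-treatment observations should enter the correlation, otherwise the selection leaks post-treatment information into the donor pool.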
Notebook fully executed with California Proposition 99 data: correlation heatmap, donor pool curation, design assessment, model fit, effect summaries, and all four robustness checks (placebo-in-space, placebo-in-time, leave-one-out, prior sensitivity) with visualisations. Made-with: Cursor
Extract shared plotting helpers (_plot_helpers.py) and add plot() staticmethods to PlaceboInSpace, PlaceboInTime, LeaveOneOut, and PriorSensitivity. Each check now auto-populates CheckResult.figures in run(). GenerateReport renders check figures in the HTML report. Replace ~80 lines of custom matplotlib in sc_pymc.ipynb with single-line library calls. Made-with: Cursor
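For readers unfamiliar with inline report figures, the sketch below shows the general technique of embedding PNG bytes in HTML as a base64 data URI. It is not taken from `GenerateReport`; the helper name is hypothetical, and the bytes here are just the PNG file signature standing in for a real image (with matplotlib they would typically come from `fig.savefig(buffer, format="png")` on an in-memory buffer).

```python
import base64
import html


def png_to_img_tag(png_bytes, alt="check figure"):
    """Return an <img> tag that embeds the PNG inline as a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{html.escape(alt)}" src="data:image/png;base64,{encoded}">'


# Stand-in payload: the 8-byte PNG signature, not a full image.
png_bytes = b"\x89PNG\r\n\x1a\n"
tag = png_to_img_tag(png_bytes)
print(tag)
```

Embedding the image this way keeps the HTML report a single self-contained file, at the cost of roughly a third more bytes than the raw PNG.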
- Add raw data time-series visualization after data loading
- Add circle tile map showing per-state correlation with California
- Add interpretation text after dress rehearsal plot
- Document power curve Type I error issue as TODO; remove effect_size=0
- Reduce forest plot per-row height (0.45 -> 0.3) in _plot_helpers.py
- Fix pre-existing nbformat validation issues in cell outputs

Made-with: Cursor
Made-with: Cursor
Agents cannot detect unsaved IDE state, so prompt the user to confirm all files (especially notebooks with expensive outputs) are saved before staging and committing. Made-with: Cursor
…kflow

Made-with: Cursor

Conflicts:
- causalpy/data/datasets.py
- causalpy/experiments/__init__.py
- causalpy/experiments/synthetic_control.py
- docs/source/notebooks/sc_pymc.ipynb
- docs/source/references.bib
PR #834 consolidated all per-document `:::{bibliography}` blocks into the global `docs/source/references.rst` page to eliminate `bibtex.duplicate_citation` warnings. The newly-rewritten sc_pymc.ipynb still carried a local bibliography cell at the end; remove it so the notebook conforms to the new convention. Inline `{cite:p}` / `{cite:t}` references continue to resolve via the global bibliography. Made-with: Cursor
- Switch power_analysis criterion from default `hdi_excludes_zero` to `prob_gt_zero`. The HDI-based criterion is sign-blind and was flagging wrong-sign mis-fit artefacts as detections at small positive injected effects, producing a non-monotonic V-shape in the power curve.
- Bump n_simulations 10 -> 25 and extend effect_sizes to np.linspace(0, 0.25, 6) so the curve includes the null point and has tighter Monte Carlo precision.
- Update the surrounding markdown to describe the new criterion and reword the existing TODO admonition into a Caveat that records remaining sources of pseudo-post mis-fit bias and lists planned follow-ups (null-distribution calibration, longer holdout window, sign-aware HDI variant).

Outputs intentionally unchanged here; the notebook will be re-run manually to refresh the power-curve figure and table.

Made-with: Cursor
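The difference between the two criteria is easy to see on a posterior that sits firmly on the wrong side of zero. The sketch below is a standalone illustration, not the library code: the function names, the 94% HDI mass, and the 0.95 probability cut-off are assumptions, and the posterior draws are simulated.

```python
import random


def hdi(draws, mass=0.94):
    """Narrowest interval containing roughly `mass` of the draws (mass < 1)."""
    ordered = sorted(draws)
    n = len(ordered)
    width = int(mass * n)  # index distance spanned by the interval
    start = min(range(n - width), key=lambda i: ordered[i + width] - ordered[i])
    return ordered[start], ordered[start + width]


def detects_hdi_excludes_zero(draws, mass=0.94):
    """Sign-blind: fires when the whole HDI is above OR below zero."""
    low, high = hdi(draws, mass)
    return low > 0 or high < 0


def detects_prob_gt_zero(draws, cutoff=0.95):
    """Sign-aware: fires only when P(effect > 0) exceeds the cut-off."""
    return sum(d > 0 for d in draws) / len(draws) > cutoff


rng = random.Random(1)
# Simulated posteriors: a wrong-sign mis-fit artefact and a genuine lift.
wrong_sign_draws = [rng.gauss(-0.10, 0.02) for _ in range(2000)]
right_sign_draws = [rng.gauss(0.10, 0.02) for _ in range(2000)]

print(detects_hdi_excludes_zero(wrong_sign_draws), detects_prob_gt_zero(wrong_sign_draws))
print(detects_hdi_excludes_zero(right_sign_draws), detects_prob_gt_zero(right_sign_draws))
```

The wrong-sign posterior counts as a detection under the HDI rule but not under the probability rule, which is the mechanism the commit message blames for the V-shaped power curve.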
Reorder the design-phase sub-sections so the power curve comes first, then a streamlined `validate_design(injected_effect=0)` rehearsal explicitly framed as a placebo-in-time sanity check. Drop the `injected_effect=0.15` rehearsal. Add a closing "Putting it together" section that recaps each check, names the central tension between the clean power curve and the failed placebo-in-time, and separates magnitude estimation (likely biased) from existence-of-effect inference (rescued by placebo-in-space). Trim the power-curve caveat admonition to point forward to the new sub-section. Implements the "Right for the Wrong Reasons" narrative: simpler diagnostics look healthy, but the placebo-in-time check surfaces a structural identification problem with the donor pool. Made-with: Cursor
Question on the noise-injection step in
Hey, amazing work and a deep dig here. I have had this idea many times before and I like it because it sounds intuitive, but I think this re-implements a capability that already exists in CausalPy. Placebo-in-time was built with the goal of running a "power analysis" (Bayesian assurance; "power" is more of a frequentist term). That approach catches and resolves many of the issues and concerns you raise, and gives you an output like this one: Check here ->

On the other hand, this power analysis only tells you whether the information holds. You don't need to re-run the model several times to get this: you can fit the model over the most recent pre-period, get a posterior over different trajectories, estimate a CI, and then properly estimate what effect size would exceed Z, either on average or cumulatively. You could simulate as many trajectories as you want and be creative here, but this only gives you "power" conditional on the model's uncertainty, and the loop adds no information. Additionally, injecting an effect has the flaws you already detected, which increases complexity without solving the previous point.

You are right to mention design-prior approaches. This is well documented, and it is the complement of the null model that already comes from placebo estimation: you can say "based on prior knowledge, my expected cumulative effect (design prior) is N", then draw a full estimation based on it and see where it lands, which helps you estimate the curve shown above. The "curve" comes out by construction: it is the joint integration of detection probability against the design prior, weighted by plausibility. Bayesian assurance (O'Hagan, 2005) gives you the operating characteristics by integration, so the "curve" emerges naturally without a brute-force simulation loop. CausalPy already implements this via `PlaceboInTime`'s `expected_effect_prior` argument (#826). You can loop over the different outcomes of this method to find the best combination of donors or other characteristics. Read here and check the references!

My take in short: it's a great job, but it reimplements existing logic. After #826 I can make a PR with new plots only and a notebook that shows this full pipeline and how these things are solved. Unless I'm missing something here, which is very probable.
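To make the "assurance by integration" point concrete, here is a minimal sketch under strong simplifying assumptions that are not from CausalPy: the effect estimate is normal with a known standard error `se` (standing in for the placebo-derived null), detection is a one-sided test at level `alpha`, and the design prior is normal. All names are hypothetical. Under these assumptions assurance is the prior-weighted average of the power function, and it has a closed form that the Monte Carlo average should match.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(effect, se, alpha=0.05):
    """P(detect | true effect) when estimate ~ N(effect, se) and
    detection means estimate > z_(1-alpha) * se."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z - effect / se)


def assurance_mc(prior_mean, prior_sd, se, alpha=0.05, n_draws=20_000, seed=0):
    """Average the power function over draws from the design prior."""
    rng = random.Random(seed)
    draws = (rng.gauss(prior_mean, prior_sd) for _ in range(n_draws))
    return sum(power(effect, se, alpha) for effect in draws) / n_draws


def assurance_closed_form(prior_mean, prior_sd, se, alpha=0.05):
    """Same integral solved analytically for the normal-normal case."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf((prior_mean - z * se) / (se**2 + prior_sd**2) ** 0.5)


# Illustrative numbers only: expected lift 0.15 with prior sd 0.05, se 0.05.
mc = assurance_mc(prior_mean=0.15, prior_sd=0.05, se=0.05)
exact = assurance_closed_form(prior_mean=0.15, prior_sd=0.05, se=0.05)
at_prior_mean = power(0.15, se=0.05)
print(f"assurance (MC) {mc:.3f}, closed form {exact:.3f}, power at prior mean {at_prior_mean:.3f}")
```

In this example assurance comes out below the power evaluated at the prior mean, because the design prior also puts weight on smaller effects that are harder to detect.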

Summary
Adds prospective experiment-design capabilities and first-class sensitivity check plotting to `SyntheticControl`.

Design-phase methods, so practitioners can assess whether a geo-experiment will work before committing budget:
- `SyntheticControl.from_pre_period()` creates a design-phase instance from pre-period data only
- `validate_design()`: dress rehearsal that injects a known effect and checks recovery
- `power_analysis()`: simulation-based Bayesian power curve
- `donor_pool_quality()`: composite quality score (correlation, convex hull, weight concentration)
- `DressRehearsalCheck` wraps dress rehearsal as a `Check` for pipeline integration
- Result classes (`DressRehearsalResult`, `PowerCurveResult`, `DonorPoolQualityResult`) with `plot()` and `summary()` methods

Sensitivity check plotting. Previously, check visualizations lived as ~80 lines of custom matplotlib in the notebook. Now they are part of the library:
- `causalpy/checks/_plot_helpers.py` with shared `forest_plot()` and `null_distribution_plot()` helpers
- `plot()` staticmethods on `PlaceboInSpace`, `PlaceboInTime`, `LeaveOneOut`, and `PriorSensitivity`
- `run()` auto-populates `CheckResult.figures` with matplotlib figures
- `GenerateReport` now renders check figures in the HTML report (base64-encoded PNGs)
- … `PlaceboInSpace.plot(result, baseline_stats=stats)`)

Notebook overhaul (`sc_pymc.ipynb`):

Test plan
- … `test_sc_design.py`)
- … `test_check_plots.py`)
- `interrogate` failure is pre-existing (84% vs 85%)
- `make html`