Exploratory experiment class refactor, focussing on `InterruptedTimeSeries` by drbenvincent · Pull Request #524 · pymc-labs/CausalPy

drbenvincent · 2025-08-30T15:22:31Z

At the moment this is a bit of an experiment. I'm trying out a number of different ideas for refactoring of the experiment class. Just to test out the idea I'm focussing on the InterruptedTimeSeries class.

Main things I've done are:

Extract the core causal inference algorithm from __init__ to the algorithm method. This is not only more pythonic, but it also gives us a very nice and mostly readable method that captures the core logic of this quasi-experimental method.
Extract data preparation from __init__ to the _build_data method. Increases modularity, testability, and tidies things up.
The data related class attributes were previously scattered all over the place, but now these are neatly collected up into self.data which is an xarray.Dataset. This keeps things tidy but also aids discoverability of the information that people want.
The resulting __init__ is nice and minimal. We still automatically trigger the model fitting, by calling self.algorithm, but there is the potential to not do this if we want to enable a more traditional Bayesian workflow where we build a model and do prior/prior predictive checks before fitting the model. But I'm not doing that in this refactor because it's a major workflow/API change.
For this pre/post quasi experimental design I've also moved away from separating pre and post treatment periods. We just have self.impact for example which has an a period dimension. So if we want the post intervention impact, we can get that by result.impact.sel(period=="post"). Mostly this will be invisible to the user, but for those doing manual interrogation of results then there might be slight changes in the API to document in the notebooks. I'm not wedded to this, and we could always have temporary accessor properties to replicate previous behaviour, which we could then deprecate.
Refactored plotting.
a. I've separated computation/processing of results and the plotting. So we have get_plot_data_bayesian and get_plot_data_ols which both return data frames. Now the plot functions only ingest these data frames
b. We now just have one plot method, and this deals with bayesian vs ols models with conditional logic. The motivation for that was to avoid massive duplication because the plots for each were so similar.
c. What I have not yet done is to make the plot function only ingest the raw dataframe. At the moment it still gets a bunch of self attributes, but it would probably be better for the plot functions to just operate on data objects. I think the next step here would be to make this data an xarray.Dataset rather than a dataframe for greater flexibility (i.e. you can add meta data), but it also comes with some good save/load functionality from xarray. This plot refactoring is inspired by what seems to work quite well on some client projects.

📚 Documentation preview 📚: https://causalpy--524.org.readthedocs.build/en/524/

…thm method

codecov · 2025-08-30T15:24:56Z

Codecov Report

❌ Patch coverage is 99.00990% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.20%. Comparing base (23e0f3e) to head (1cf6e1d).
⚠️ Report is 351 commits behind head on main.

Files with missing lines	Patch %	Lines
causalpy/experiments/interrupted_time_series.py	99.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #524   +/-   ##
=======================================
  Coverage   95.19%   95.20%           
=======================================
  Files          28       28           
  Lines        2457     2462    +5     
=======================================
+ Hits         2339     2344    +5     
  Misses        118      118

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

review-notebook-app · 2025-09-15T10:00:52Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

drbenvincent · 2026-01-10T15:37:08Z

Note that #641 implements a number of the refactoring ideas here

drbenvincent · 2026-04-01T16:21:14Z

FYI: This PR will NOT be merged because this was experimental. But leaving it here in case there are refactor ideas I can use in new PR's.

drbenvincent · 2026-06-10T13:32:25Z

Refactor breakdown + status against `main`

Going back over this experiment now that it's served its purpose. Below I've categorised the ideas explored here into four types of refactor, recorded how much of each has already landed on main (this branch is from Sep 2025, main has moved a long way since), and flagged what's worth carrying forward as small focused PRs.

#	Refactor type	Idea in this PR	Status on `main`	Tracking
A	Method extraction from `__init__`	Pull the core algorithm into `algorithm()`, data prep into `_build_data()`, leave a minimal `__init__`	✅ Done	#507 (closed) via #641 (merged)
B	Consolidate scattered data attrs into `xarray.Dataset`	Bundle `pre_X`/`post_X`/`pre_y`/`post_y` etc. into one `self.data` Dataset	🔄 In progress	#199, #849, #848
C	Unified `period` dimension (drop pre/post split)	`result.impact.sel(period="post")` instead of `pre_impact`/`post_impact`	❌ Not adopted (main went the other way)	none
D	Plotting refactor	(a) separate compute from plot, (b) single `plot` for OLS+Bayes, (c) plot consumes a data object only	🔶 Partial / divergent	#886 (a/b context)

A — Method extraction ✅ Done

#641 (merged Jan 2026, closing #507) did exactly this across all experiment classes, not just InterruptedTimeSeries. Every class now has algorithm(), and the patsy-based classes have _build_design_matrices() + _prepare_data(). Note the naming diverged from this PR: main split data prep into _build_design_matrices() + _prepare_data() rather than a single _build_data() returning a Dataset. Nothing left to do here.

B — Consolidate into `xarray.Dataset` 🔄 In progress

This is the idea that's aged best. It's now tracked properly by #199 (the umbrella) with concrete implementation in #849 and a notebook-migration follow-up in #848. The landed design is more incremental and backward-compatible than the all-in-one self.data Dataset explored here: it introduces pre_design / post_design Datasets (ITS, SC) and a single design Dataset (formula-based classes), keeping the old pre_X/pre_y/… names alive via deprecated @property accessors. main still carries the sprawl in the meantime (ITS alone has ~110 self.pre_*/self.post_* references). No new issue needed — just land #849.

C — Unified `period` dimension ❌ Not adopted

Worth being honest that main moved in the opposite direction: instead of collapsing pre/post into a single period coordinate, it kept the split and added a third period (treatment_end_time → three-period design, _split_post_period(), intervention_* / post_intervention_* attributes). So result.impact.sel(period=...) is now a bigger, more contentious API change than it was here, and it interacts with the three-period work. My take: this is the one idea I would not carry forward as-is. If we want the ergonomic .sel(period=...) access it's better framed as a thin accessor on top of B's Datasets rather than a re-plumbing. Low priority; could be a discussion issue, not an implementation issue yet.

D — Plotting refactor 🔶 Partial / divergent

This is where the most concrete, low-risk follow-up issues live:

(a) Separate compute from plot — partially done, but with a duplication trap. get_plot_data_bayesian() / get_plot_data_ols() exist on main, but _bayesian_plot() / _ols_plot() do not consume them — they recompute predictions/impact/HDIs directly from self.pre_pred, self.post_impact, etc. So we now have two independent code paths computing the same plot quantities, which can silently drift. → Good focused issue: make the plot methods render from get_plot_data_*() output so there's a single source of truth.
(b) Single plot for OLS+Bayes — deliberately not adopted. main keeps _bayesian_plot / _ols_plot separate behind the _render_plot template method (per #886, which wanted explicit per-subclass plot() signatures). I'd leave the two-method split, but there's still real duplication between them (intervention lines, date-axis formatting, fill_between shading). → Good focused issue: extract the shared drawing bits into small helpers, keeping the OLS/Bayes split.
(c) Plot consumes a data object only. Not done here either. This is really the natural payoff of B: once predictions/impact live in a Dataset, the plot methods can take that object (with metadata, save/load) instead of reaching into self. → Best treated as a follow-up to Consolidate experiment design-matrix attributes into xr.Dataset #849, not its own thing yet.

Recommended focused PRs/issues to spin out

(D-a) Single source of truth for plotting: _bayesian_plot/_ols_plot consume get_plot_data_*() instead of recomputing. — small, high value, removes drift risk.
(D-b) De-duplicate shared drawing logic between _bayesian_plot and _ols_plot (intervention lines, date formatting, shading) into helpers, keeping the split intact. — small, cosmetic-but-real.
(B/D-c) After Consolidate experiment design-matrix attributes into xr.Dataset #849 lands, have plot methods operate on the consolidated *_design / results Datasets. — follow-up, fold into the Simplify ExperimentalDesign classes by consolidating relevant properties into a xarray.Dataset #199 line of work.

Everything else is either already done (A), already tracked (B), or intentionally superseded (C). Leaving this PR open as a reference per earlier note; not for merge.

drbenvincent added 10 commits August 29, 2025 18:14

refactor data storage in InterruptedTimeSeries

7eb7968

simplify _build_data

eebd947

extract interrupted time series algorithm from __init__ to new algori…

0e5b12c

…thm method

_data -> data

84a96ec

update uml images

04e14e9

add period coordinate + update get_plot_data_bayesian

37bee48

use set_xindex for cleaner code :)

5c849bb

remove some accessor methods

85b6939

simplify _build_data a bit + remove accessor methods

51e19f8

move import to top of module

762d485

drbenvincent added 2 commits September 15, 2025 10:52

refactor plotting

3a12ee0

updates to notebook required for some internal API changes

1cf6e1d

drbenvincent mentioned this pull request Oct 11, 2025

Enable plotting of model prior predictions #534

Open

drbenvincent added the refactor Refactor, clean up, or improvement with no visible changes to the user label Feb 28, 2026

drbenvincent mentioned this pull request May 25, 2026

Support dataframe-agnostic inputs with Narwhals #930

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exploratory experiment class refactor, focussing on `InterruptedTimeSeries`#524

Exploratory experiment class refactor, focussing on `InterruptedTimeSeries`#524
drbenvincent wants to merge 12 commits into
mainfrom
data-refactor

drbenvincent commented Aug 30, 2025 •

edited

Loading

Uh oh!

codecov Bot commented Aug 30, 2025 •

edited

Loading

Uh oh!

review-notebook-app Bot commented Sep 15, 2025

Uh oh!

drbenvincent commented Jan 10, 2026

Uh oh!

drbenvincent commented Apr 1, 2026

Uh oh!

drbenvincent commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

drbenvincent commented Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Aug 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

review-notebook-app Bot commented Sep 15, 2025

Uh oh!

drbenvincent commented Jan 10, 2026

Uh oh!

drbenvincent commented Apr 1, 2026

Uh oh!

drbenvincent commented Jun 10, 2026

Refactor breakdown + status against main

A — Method extraction ✅ Done

B — Consolidate into xarray.Dataset 🔄 In progress

C — Unified period dimension ❌ Not adopted

D — Plotting refactor 🔶 Partial / divergent

Recommended focused PRs/issues to spin out

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

drbenvincent commented Aug 30, 2025 •

edited

Loading

codecov Bot commented Aug 30, 2025 •

edited

Loading

Refactor breakdown + status against `main`

B — Consolidate into `xarray.Dataset` 🔄 In progress

C — Unified `period` dimension ❌ Not adopted