Skip to content

Add validation to _data_setter for non-standard data node names#847

Open
drbenvincent wants to merge 2 commits into
mainfrom
issue-663-data-setter-customization
Open

Add validation to _data_setter for non-standard data node names#847
drbenvincent wants to merge 2 commits into
mainfrom
issue-663-data-setter-customization

Conversation

@drbenvincent

@drbenvincent drbenvincent commented Apr 15, 2026

Copy link
Copy Markdown
Collaborator

Summary

PyMCModel._data_setter() hard-codes "X" and "y" as data node names. If a subclass uses different names, it silently misbehaves or throws a cryptic PyMC error. This PR adds validation that raises a clear ValueError with actionable guidance when expected data nodes are missing.

Fixes #663

Changes

  • causalpy/pymc_models.py:

    • Added validation at the top of _data_setter() that checks "X" and "y" exist in self.named_vars before proceeding
    • Raises ValueError with guidance to override _data_setter() when nodes are missing
    • Updated class and method docstrings to document the override contract
    • No behavioral changes to existing logic — the pm.set_data call is unchanged
  • causalpy/experiments/panel_regression.py:

    • Fixed pre-existing mypy type: ignore codes (attr-definedunion-attr)
  • causalpy/tests/test_pymc_models.py:

    • Added NonStandardDataModel test subclass using renamed data nodes
    • Added TestDataSetterValidation with 2 tests:
      • test_mismatched_X_raises: verifies clear error when "X" is renamed
      • test_missing_y_raises: verifies clear error when only "y" is renamed

Design rationale

The issue proposed 5 possible directions. After examining all existing subclasses, I found that:

  • All standard models (LinearRegression, WeightedSumFitter, SoftmaxWeightedSumFitter) use "X" and "y" — they work as-is
  • Complex models (BayesianBasisExpansionTimeSeries) already override _data_setter entirely
  • Models with fundamentally different data (StateSpaceTimeSeries, InstrumentalVariableRegression, PropensityScore) override predict or fit entirely

So the override contract already works for all existing subclasses. The missing piece was validation: a subclass that forgets to override _data_setter previously got a cryptic error. Now it gets a clear ValueError pointing to the solution.

I deliberately kept this minimal — no new class attributes or API surface — because the name-remapping-only case doesn't arise in any existing subclass and would add complexity for a speculative benefit.

Testing

  • 2 new validation tests pass
  • All 49 existing tests in test_pymc_models.py pass
  • All 30 PyMC integration tests pass
  • All 21 timeseries model coverage tests pass
  • prek run --all-files clean

Checklist

  • Backwards compatible (no behavioral changes)
  • No new API surface
  • Tests cover both X-missing and y-missing error paths
  • Docstrings updated

Add _predictor_data_name and _target_data_name class attributes to
PyMCModel so subclasses using non-default data node names can
customize without re-implementing _data_setter. Validate at predict
time and raise a clear ValueError if expected nodes are missing.

Also fix pre-existing mypy type: ignore codes in panel_regression.py
(attr-defined -> union-attr).

Made-with: Cursor
@github-actions github-actions Bot added the refactor Refactor, clean up, or improvement with no visible changes to the user label Apr 15, 2026
@read-the-docs-community

read-the-docs-community Bot commented Apr 15, 2026

Copy link
Copy Markdown

@codecov

codecov Bot commented Apr 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.87755% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.44%. Comparing base (1cf3387) to head (08ee2c3).

Files with missing lines Patch % Lines
causalpy/tests/test_pymc_models.py 93.18% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #847      +/-   ##
==========================================
- Coverage   94.44%   94.44%   -0.01%     
==========================================
  Files          80       80              
  Lines       12707    12754      +47     
  Branches      770      773       +3     
==========================================
+ Hits        12001    12045      +44     
- Misses        498      500       +2     
- Partials      208      209       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Remove _predictor_data_name / _target_data_name class attributes
(added complexity for a case no existing subclass needs). Keep the
validation that raises a clear ValueError when expected data nodes
are missing. Revert undeclared y_dtype behavioral change. Improve
test coverage with separate X-missing and y-missing error paths.

Made-with: Cursor
@drbenvincent drbenvincent changed the title Make _data_setter customizable via class attributes Add validation to _data_setter for non-standard data node names Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor Refactor, clean up, or improvement with no visible changes to the user

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PyMCModel._data_setter hard-codes X/y data nodes

1 participant