Fix get_decimal_precisions IndexError on scientific-notation modes (#442) #470
Open
jbbqqf wants to merge 1 commit into
PublicData.get_decimal_precisions used `str(mode).split('.')[1]` to count
decimal digits. When the column's mode rendered in scientific notation,
e.g. `'1e-06'` for very small floats (magnitude below 1e-4) or `'1e+16'` for very
large integer-valued floats, that string has no `'.'`, so `split('.')[1]`
raised an IndexError and `dice_ml.Data(...)` blew up before the user could
generate counterfactuals (issue interpretml#442).
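A minimal sketch of the failure, assuming the old code reduces to `len(str(mode).split('.')[1])`. `old_decimal_count` is a hypothetical name, not the dice_ml source:

```python
def old_decimal_count(mode):
    # Hypothetical reduction of the pre-fix logic: count characters after the '.'.
    return len(str(mode).split('.')[1])


print(old_decimal_count(0.25))  # plain repr '0.25' -> 2

for mode in (1e-06, 1e16):
    text = str(mode)  # '1e-06' and '1e+16' contain no '.'
    try:
        old_decimal_count(mode)
    except IndexError as exc:
        # split('.') returns a one-element list, so index [1] is out of range
        print(f"{text!r}: IndexError: {exc}")
```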
Extracts a `_decimal_precision_of` helper that:
- preserves the historical behaviour for plain decimal reprs (`'0.25'` → 2);
- handles scientific-notation reprs by combining the mantissa's decimal
count with the exponent (e.g. `'1e-06'` → 6, `'2.5e-3'` → 4, `'1e+16'` → 0).
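The mantissa/exponent rule can be sketched as below. This is a hypothetical standalone function (`decimal_precision_of`), not the helper added in this PR; it only assumes the input's `str()` is either a plain decimal or `<mantissa>e<exponent>`:

```python
def decimal_precision_of(value) -> int:
    """Hypothetical sketch: decimal places implied by str(value)."""
    text = str(value).lower()
    mantissa, _, exponent = text.partition('e')
    # Digits after the '.' in the mantissa (0 when there is no '.').
    mantissa_decimals = len(mantissa.partition('.')[2])
    exp = int(exponent) if exponent else 0  # int('-06') == -6, int('+16') == 16
    # A negative exponent moves the point left and adds decimal places;
    # a positive one moves it right and consumes them, floored at 0.
    return max(0, mantissa_decimals - exp)


for sample in ('0.25', '1e-06', '2.5e-3', '1e+16'):
    print(sample, decimal_precision_of(sample))
```

Clamping at 0 is what makes `'1e+16'` come out as 0 rather than -16.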
Re-deriving the count from the scientific form rather than from a
re-formatted `%.20f` string avoids leaking float64 ↔ float32 round-off
junk digits into the precision result.
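To see the round-off junk the `%.20f` route would pick up, the sketch below uses `struct` to round 1e-6 to float32 and widen it back to a Python float, as a stand-in (an assumption) for a float32 column. `to_float32` and `decimals_via_fixed` are hypothetical names; the latter models the rejected approach:

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest float32, then widen back to a Python float64."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def decimals_via_fixed(x: float) -> int:
    """Hypothetical rejected approach: format with %.20f, strip trailing zeros."""
    return len(('%.20f' % x).rstrip('0').split('.')[1])


print(decimals_via_fixed(1e-06))              # float64 1e-06
print(decimals_via_fixed(to_float32(1e-06)))  # float32 1e-06 widened to float64
```

The float64 value yields 6, while the float32-origin value picks up a long tail of extra digits. The scientific-form approach only sidesteps this while `str()` sees the value at its original precision (for example a numpy float32 scalar); once the value is widened to a Python float, `str()` itself shows the extra digits.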
Adds parametrized tests covering small (1e-6), small-with-mantissa
(2.5e-3), large (1e16), and ordinary (0.5) values. The first and third
fail on `origin/main` with the exact IndexError from the bug report.
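A stdlib-only approximation of those cases, using `unittest` subtests in place of pytest parametrization. `old_precision`, `new_precision`, and `DecimalPrecisionSketchTest` are hypothetical names; the real tests exercise `PublicData.get_decimal_precisions` instead:

```python
import unittest


def old_precision(mode):
    """Hypothetical reduction of the pre-fix one-liner."""
    return len(str(mode).split('.')[1])


def new_precision(mode):
    """Hypothetical sketch of the fix: mantissa decimals minus exponent, floored at 0."""
    mantissa, _, exponent = str(mode).lower().partition('e')
    decimals = len(mantissa.partition('.')[2])
    return max(0, decimals - (int(exponent) if exponent else 0))


# (mode, expected precision, does the old one-liner raise IndexError?)
CASES = [
    (1e-06, 6, True),
    ('2.5e-3', 4, False),
    (1e16, 0, True),
    (0.5, 1, False),
]


class DecimalPrecisionSketchTest(unittest.TestCase):
    def test_precision(self):
        for mode, expected, old_raises in CASES:
            with self.subTest(mode=mode):
                self.assertEqual(new_precision(mode), expected)
                if old_raises:
                    with self.assertRaises(IndexError):
                        old_precision(mode)
```

The `'2.5e-3'` case does not crash the old code: `str('2.5e-3').split('.')[1]` is `'5e-3'`, whose length happens to be 4. Saved as a module, the sketch runs under `python -m unittest`.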
Closes interpretml#442.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
`PublicData.get_decimal_precisions` uses `str(mode).split('.')[1]` to count decimal digits on continuous columns. When the column's mode renders in scientific notation, e.g. `'1e-06'` for floats with magnitude below 1e-4 or `'1e+16'` for very large integer-valued floats, the string has no `'.'`, so `split('.')[1]` raises `IndexError` and the `dice_ml.Data(...)` constructor blows up before the user can do anything. As @joeranbosma reports in #442, this hits real datasets with currency/probability/nanosecond magnitudes.
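Which modes hit the crash depends on when Python's `str()` switches a float to scientific notation. A quick check with plain Python floats (numpy scalars are not exercised here):

```python
# Plain Python floats; str() uses the shortest round-tripping repr.
samples = [0.0001, 0.00009, 1e-06, 1e15, 1e16]

for value in samples:
    text = str(value)
    print(f"{text:>20}  contains '.': {'.' in text}")
```

So any mode below 1e-4 in magnitude, or at or above 1e16, reaches the old `split('.')[1]` without a `'.'` to split on.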
Why
Two changes:

- Fix: extract a `_decimal_precision_of(value)` static helper that preserves the historical behaviour for plain decimal reprs (`'0.25'` → 2) and handles scientific-notation reprs by combining the mantissa's decimal count with the exponent (`'1e-06'` → 6, `'2.5e-3'` → 4, `'1e+16'` → 0). Re-deriving from the scientific form rather than from a re-formatted `%.20f` string avoids leaking float64 ↔ float32 round-off junk digits into the result.
- Regression test: a parametrized `TestDecimalPrecisionsScientificNotation` covering small (1e-6), small-with-mantissa (2.5e-3), large (1e16), and ordinary (0.5) modes. The first and third fail on `origin/main` with the exact `IndexError: list index out of range` from the bug report.

Reproduce BEFORE/AFTER yourself (copy-paste)
What I ran locally

- `pytest tests/test_data_interface/test_public_data_interface.py` → 39 passed (4 new + 35 existing)
- Same file on `origin/main` → the new 1e-6 and 1e16 cases fail with `IndexError`

Edge cases

- `'0.25'` → 2
- `'1e-06'` → 6
- `'2.5e-3'` → 4
- `'1e+16'` → 0
- `'17'` (int-valued float)
- `[0.5, 0.25]`

AI disclosure
This change was prepared with the assistance of Claude (Anthropic).
The author reviewed every line and is responsible for the final result.