Fix get_decimal_precisions IndexError on scientific-notation modes (#442) by jbbqqf · Pull Request #470 · interpretml/DiCE

jbbqqf · 2026-05-09T22:49:41Z

Summary

PublicData.get_decimal_precisions uses `str(mode).split('.')[1]` to count
decimal digits on continuous columns. When the column's mode renders in
scientific notation, e.g. `'1e-06'` for floats with magnitude below 1e-4 or
`'1e+16'` for very large integer-valued floats, the string has no `'.'`,
so `split('.')[1]` raises `IndexError` and the `dice_ml.Data(...)` constructor
fails before the user can do anything. As @joeranbosma reports in #442,
this hits real datasets with currency, probability, and nanosecond magnitudes.
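A minimal sketch of the failure, assuming plain Python floats stand in for the column mode. `historical_precision` and `probe` are hypothetical names, and wrapping the quoted `split('.')[1]` in `len(...)` is an assumption about how the old code counted digits.

```python
def historical_precision(mode) -> int:
    # Paraphrase of the pre-fix logic (assumed): count the characters after the '.'.
    return len(str(mode).split(".")[1])


def probe(values):
    """Map each value's str() to its historical precision, or 'IndexError'."""
    results = {}
    for value in values:
        try:
            results[str(value)] = historical_precision(value)
        except IndexError:
            # str(value) contained no '.', so split('.') returned a single element.
            results[str(value)] = "IndexError"
    return results


for text, outcome in probe([0.25, 0.0001, 1e-05, 1e-06, 1e15, 1e16]).items():
    print(f"{text:>20} -> {outcome}")
```

Python's float `str`/`repr` switches to exponent form once the magnitude drops below 1e-4 or reaches 1e16, so `0.0001` and `1e15` still contain a `'.'` while `1e-05`, `1e-06`, and `1e16` do not.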

Why

Two changes:

- Fix: extract a `_decimal_precision_of(value)` static helper that
  preserves the historical behaviour for plain decimal reprs (`'0.25'` →
  2) and handles scientific-notation reprs by combining the mantissa's
  decimal count with the exponent (`'1e-06'` → 6, `'2.5e-3'` → 4,
  `'1e+16'` → 0). Re-deriving the count from the scientific form rather than
  from a re-formatted `%.20f` string avoids leaking float64 ↔ float32
  round-off junk digits into the result.
- Regression test: a parametrized `TestDecimalPrecisionsScientificNotation`
  covering small (1e-6), small-with-mantissa (2.5e-3), large (1e16), and
  ordinary (0.5) modes. The first and third fail on origin/main with
  the exact `IndexError: list index out of range` from the bug report; the
  other two already pass because their reprs (`'0.0025'`, `'0.5'`) contain a `'.'`.
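The helper's rule can be sketched as follows. This is a minimal reconstruction of the behaviour described above, not the code from the PR: it assumes the helper works on `str(value)` and splits on an `e` exponent marker, and `decimal_precision_of` is a hypothetical stand-in for `_decimal_precision_of`.

```python
def decimal_precision_of(value) -> int:
    """Decimal places implied by str(value), including scientific-notation forms."""
    text = str(value).lower()
    # '2.5e-3' -> ('2.5', 'e', '-3'); '0.25' -> ('0.25', '', '').
    mantissa, sep, exponent = text.partition("e")
    # Digits after the '.' in the mantissa (the historical rule for plain reprs).
    mantissa_decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    if not sep:
        return mantissa_decimals
    # A negative exponent moves the point left and adds decimals; a positive
    # exponent moves it right and consumes them, never going below 0.
    return max(0, mantissa_decimals - int(exponent))


for value in (0.25, 1e-06, "2.5e-3", 1e16, 17):
    print(repr(value), decimal_precision_of(value))  # 2, 6, 4, 0, 0
```

Working from the shortest repr keeps the mantissa free of the extra digits a fixed-width format such as `%.20f` would print.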

Reproduce BEFORE/AFTER yourself (copy-paste)

set -e
cd /tmp && rm -rf DiCE-442 && git clone https://github.com/interpretml/DiCE.git DiCE-442
cd DiCE-442
pip install -q -e . pytest

git fetch https://github.com/jbbqqf/DiCE.git fix/442-decimal-precisions-scientific
git checkout FETCH_HEAD -- tests/test_data_interface/test_public_data_interface.py

# --- BEFORE: fix not applied (production code from origin/main) ---
git checkout origin/main -- dice_ml/data_interfaces/public_data_interface.py
python -m pytest tests/test_data_interface/test_public_data_interface.py::TestDecimalPrecisionsScientificNotation -q || echo "BEFORE: 2 failed (IndexError), as expected"
# Expected: 2 failed, 2 passed (IndexError on values0-6 and values2-0)

# --- AFTER: fix applied ---
git checkout FETCH_HEAD -- dice_ml/data_interfaces/public_data_interface.py
python -m pytest tests/test_data_interface/test_public_data_interface.py::TestDecimalPrecisionsScientificNotation -q
# Expected: 4 passed

What I ran locally

- `pytest tests/test_data_interface/test_public_data_interface.py` → 39 passed (4 new + 35 existing), so the full test_public_data_interface suite is still green
- Confirmed that 2 of the 4 new tests fail on origin/main with `IndexError`

Edge cases

| Input mode repr | Precision returned | Rationale |
|---|---|---|
| `'0.25'` | 2 | unchanged from historical behaviour |
| `'1e-06'` | 6 | mantissa has no decimals; exponent -6 ⇒ 6 |
| `'2.5e-3'` | 4 | mantissa has 1 decimal; exponent -3 ⇒ 1 + 3 = 4 |
| `'1e+16'` | 0 | positive exponent collapses fractional digits to 0 |
| `'17'` (int-valued float) | 0 | new explicit branch (was implicit before) |
| Mixed-precision modes (e.g. `[0.5, 0.25]`) | max across modes | unchanged |
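For comparison, here is a sketch of a different technique, not the one this PR implements. Python's standard `decimal` module parses both plain and scientific reprs, and the exponent of `Decimal(str(value))` gives the decimal count directly. `column_precision` then takes the maximum across modes, mirroring the last row of the table. Both function names are hypothetical.

```python
from decimal import Decimal


def decimal_places(value) -> int:
    """Decimal places of str(value), read from decimal.Decimal's exponent."""
    exponent = Decimal(str(value)).as_tuple().exponent
    # NaN and Infinity report a string exponent ('n', 'N', 'F'); treat them as 0.
    if not isinstance(exponent, int):
        return 0
    # '0.25' -> exponent -2 -> 2; '1e+16' -> exponent 16 -> 0.
    return max(0, -exponent)


def column_precision(modes) -> int:
    """Precision for a column: the maximum over all of its modes."""
    return max((decimal_places(m) for m in modes), default=0)


print(column_precision([0.5, 0.25]))  # 2
```

Every row in the table gets the same answer from this approach. Like the historical `split('.')` count, `Decimal` keeps trailing zeros, so a repr such as `'17.0'` would still count as 1 decimal.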

AI disclosure

This change was prepared with the assistance of Claude (Anthropic).
The author reviewed every line and is responsible for the final result.

PublicData.get_decimal_precisions used `str(mode).split('.')[1]` to count decimal digits. When the column's mode rendered in scientific notation, e.g. `'1e-06'` for very small floats (magnitude below 1e-4) or `'1e+16'` for very large integer-valued floats, that string has no `'.'`, so `split('.')[1]` raised `IndexError` and `dice_ml.Data(...)` failed before the user could generate counterfactuals (issue interpretml#442).

Extracts a `_decimal_precision_of` helper that:
- preserves the historical behaviour for plain decimal reprs (`'0.25'` → 2);
- handles scientific-notation reprs by combining the mantissa's decimal count with the exponent (e.g. `'1e-06'` → 6, `'2.5e-3'` → 4, `'1e+16'` → 0).

Re-deriving the count from the scientific form rather than from a re-formatted `%.20f` string avoids leaking float64 ↔ float32 round-off junk digits into the precision result.

Adds parametrized tests covering small (1e-6), small-with-mantissa (2.5e-3), large (1e16), and ordinary (0.5) values. The first and third fail on `origin/main` with the exact `IndexError` from the bug report.

Closes interpretml#442.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jbbqqf requested review from amit-sharma and gaugup as code owners May 9, 2026 22:49

jbbqqf wants to merge 1 commit into interpretml:main from jbbqqf:fix/442-decimal-precisions-scientific
