Fix get_decimal_precisions IndexError on scientific-notation modes (#442) #470

Open
jbbqqf wants to merge 1 commit into interpretml:main from jbbqqf:fix/442-decimal-precisions-scientific

@jbbqqf jbbqqf commented May 9, 2026

Summary

PublicData.get_decimal_precisions uses str(mode).split('.')[1] to count
decimal digits on continuous columns. When the column's mode renders in
scientific notation (e.g. '1e-06' for floats with magnitude below 1e-4, or
'1e+16' for very large integer-valued floats), the string has no '.',
so split('.')[1] raises IndexError and the dice_ml.Data(...) constructor
fails before the user can do anything. As @joeranbosma reports in #442,
this hits real datasets with currency, probability, or nanosecond magnitudes.
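
The failure can be reproduced without DiCE at all. The snippet below is a minimal sketch using only the standard library; `legacy_precision` is a hypothetical stand-in for the expression quoted above, not the library's actual method.

```python
# Minimal reproduction of the failure mode, standard library only.
# `legacy_precision` is a hypothetical stand-in for the pre-fix expression.

plain_mode = 0.25   # str() renders this as '0.25'
small_mode = 1e-06  # str() renders this as '1e-06' (no '.')
large_mode = 1e16   # str() renders this as '1e+16' (no '.')


def legacy_precision(value):
    """Count decimal digits the way the pre-fix code did."""
    return len(str(value).split('.')[1])


print(legacy_precision(plain_mode))  # plain decimal repr works

for mode in (small_mode, large_mode):
    try:
        legacy_precision(mode)
    except IndexError as exc:
        # split('.') returns a single-element list, so [1] is out of range
        print(f"{str(mode)!r}: IndexError: {exc}")
```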

Why

Two changes:

  1. Fix: extract _decimal_precision_of(value) static helper that
    preserves the historical behaviour for plain decimal reprs ('0.25' →
    2) and handles scientific-notation reprs by combining the mantissa's
    decimal count with the exponent ('1e-06' → 6, '2.5e-3' → 4,
    '1e+16' → 0). Re-deriving from the scientific form rather than from a
    re-formatted %.20f string avoids leaking float64 ↔ float32 round-off
    junk digits into the result.

  2. Regression test: parametrized TestDecimalPrecisionsScientificNotation
    covering small (1e-6), small-with-mantissa (2.5e-3), large (1e16), and
    ordinary (0.5) modes. The first and third fail on origin/main with
    the exact IndexError: list index out of range from the bug report.
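
The mantissa-plus-exponent rule described in item 1 might look roughly like the sketch below. This is an illustration written from the description only, not the code in this PR: the function name `decimal_precision_of` and its exact branching are assumptions.

```python
def decimal_precision_of(value):
    """Hypothetical sketch: decimal digits needed to render `value`.

    Works on the string form, so it accepts floats or repr strings.
    """
    text = str(value).lower()

    if 'e' in text:
        # Scientific notation: '<mantissa>e<exponent>', e.g. '2.5e-3'.
        mantissa, exponent = text.split('e')
        mantissa_decimals = len(mantissa.split('.')[1]) if '.' in mantissa else 0
        # A negative exponent adds decimal places, a positive one removes them.
        return max(0, mantissa_decimals - int(exponent))

    if '.' in text:
        # Plain decimal repr: same count as the historical expression.
        return len(text.split('.')[1])

    # No '.' and no exponent, e.g. '17'.
    return 0


for sample in ('0.25', 1e-06, '2.5e-3', 1e16, '17'):
    print(repr(str(sample)), '->', decimal_precision_of(sample))
```

Working on the string form keeps the count tied to the digits Python actually prints, which is the property the description relies on when it avoids the %.20f route.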

Reproduce BEFORE/AFTER yourself (copy-paste)

set -e
cd /tmp && rm -rf DiCE-442 && git clone https://github.com/interpretml/DiCE.git DiCE-442
cd DiCE-442
pip install -q -e . pytest

git fetch https://github.com/jbbqqf/DiCE.git fix/442-decimal-precisions-scientific
git checkout FETCH_HEAD -- tests/test_data_interface/test_public_data_interface.py

# --- BEFORE: fix not applied (production code from origin/main) ---
git checkout origin/main -- dice_ml/data_interfaces/public_data_interface.py
python -m pytest tests/test_data_interface/test_public_data_interface.py::TestDecimalPrecisionsScientificNotation -q || echo "BEFORE: 2 failed (IndexError) - expected"
# Expected: 2 failed, 2 passed (IndexError on values0-6 and values2-0)

# --- AFTER: fix applied ---
git checkout FETCH_HEAD -- dice_ml/data_interfaces/public_data_interface.py
python -m pytest tests/test_data_interface/test_public_data_interface.py::TestDecimalPrecisionsScientificNotation -q
# Expected: 4 passed

What I ran locally

  • pytest tests/test_data_interface/test_public_data_interface.py → 39 passed (4 new + 35 existing)
  • Full test_public_data_interface suite still green
  • Confirmed 2/4 of the new tests fail on origin/main with IndexError

Edge cases

| input mode repr | precision returned | rationale |
| --- | --- | --- |
| '0.25' | 2 | unchanged from historical behaviour |
| '1e-06' | 6 | mantissa has no decimals; exponent -6 ⇒ 6 |
| '2.5e-3' | 4 | mantissa has 1 decimal; exponent -3 ⇒ 1 + 3 = 4 |
| '1e+16' | 0 | positive exponent collapses fractional digits to 0 |
| '17' (int-valued float) | 0 | new explicit branch (was implicit before) |
| Mixed-precision modes (e.g. [0.5, 0.25]) | max across modes | unchanged |
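
The rows above can be cross-checked independently with the standard library's decimal module, which parses both plain and scientific reprs. This is a different technique from the one in this PR, shown only as a sanity check. `precision_via_decimal` and `column_precision` are hypothetical names, finite inputs are assumed, and `statistics.multimode` stands in for whatever mode computation the library performs.

```python
from decimal import Decimal
from statistics import multimode


def precision_via_decimal(value):
    """Hypothetical cross-check: read the decimal exponent of the repr."""
    # Decimal('2.5e-3') has digits (2, 5) and exponent -4, i.e. 4 decimals.
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def column_precision(values):
    """Take the maximum precision across all modes of a column."""
    return max(precision_via_decimal(mode) for mode in multimode(values))


# Expected values are copied from the edge-case table.
table_rows = {'0.25': 2, '1e-06': 6, '2.5e-3': 4, '1e+16': 0, '17': 0}
for text, expected in table_rows.items():
    assert precision_via_decimal(text) == expected

# Two modes (0.5 and 0.25) tie; the larger precision wins.
print(column_precision([0.5, 0.25, 0.5, 0.25]))
```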

AI disclosure

This change was prepared with the assistance of Claude (Anthropic).
The author reviewed every line and is responsible for the final result.

PublicData.get_decimal_precisions used `str(mode).split('.')[1]` to count
decimal digits. When the column's mode rendered in scientific notation,
e.g. `'1e-06'` for very small floats (magnitude below 1e-4) or `'1e+16'` for
very large integer-valued floats, that string has no `'.'`, so
`split('.')[1]` raised IndexError and `dice_ml.Data(...)` failed before the
user could generate counterfactuals (issue interpretml#442).

Extracts a `_decimal_precision_of` helper that:
- preserves the historical behaviour for plain decimal reprs (`'0.25'` → 2);
- handles scientific-notation reprs by combining the mantissa's decimal
  count with the exponent (e.g. `'1e-06'` → 6, `'2.5e-3'` → 4, `'1e+16'` → 0).

Re-deriving the count from the scientific form rather than from a
re-formatted `%.20f` string avoids leaking float64 ↔ float32 round-off
junk digits into the precision result.

Adds parametrized tests covering small (1e-6), small-with-mantissa
(2.5e-3), large (1e16), and ordinary (0.5) values. The first and third
fail on `origin/main` with the exact IndexError from the bug report.

Closes interpretml#442.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jbbqqf jbbqqf requested review from amit-sharma and gaugup as code owners May 9, 2026 22:49