Discretization algorithm based on the paper by Usama M. Fayyad and Keki B. Irani:

Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pages 1022-1027, Chambéry, France, August 1993.

Install from PyPI:

```bash
pip install FImdlp
```

The project no longer relies on a git submodule; the C++ sources live under `src/cpp/`. To build from source, a regular clone is enough:

```bash
git clone https://github.com/doctorado-ml/FImdlp.git
cd FImdlp
make deps      # install the [dev] extras (build, twine, pip-audit, black, flake8, coverage)
make install   # editable install; compiles the Cython/C++ extension in place
```

Run `make help` to list every available target.

```python
from sklearn.datasets import load_iris
from fimdlp.mdlp import FImdlp

X, y = load_iris(return_X_y=True)
clf = FImdlp().fit(X, y)

# Discretize
X_disc = clf.transform(X)

# Inspect cut points: [vmin, c1, ..., cn, vmax] per feature
for f, cuts in enumerate(clf.get_cut_points()):
    print(f"feature {f}: {cuts}")
```

Constructor parameters:

| Parameter | Default | Description |
|---|---|---|
| `n_jobs` | `-1` | Threads for per-feature fit/transform. `-1` uses all cores. |
| `min_length` | `3` | Minimum samples in an interval to consider further splits. |
| `max_depth` | `1e6` | Maximum recursion depth of the splitting procedure. |
| `max_cuts` | `0` | Cap on intermediate cut points per feature (`0` = unlimited; a value below 1 is interpreted as a fraction of the samples). |
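
The three ways of reading `max_cuts` described in the table can be sketched as follows. This is only an illustration: `resolve_max_cuts` is a hypothetical helper, not part of the FImdlp API, and truncating the fractional case toward zero is an assumption rather than the library's documented behaviour.

```python
from typing import Optional


def resolve_max_cuts(max_cuts: float, n_samples: int) -> Optional[int]:
    """Hypothetical helper: turn a max_cuts setting into an absolute cap.

    Returns None when the number of intermediate cut points is unlimited.
    """
    if max_cuts < 0:
        raise ValueError("max_cuts must be non-negative")
    if max_cuts == 0:
        return None  # 0 means no cap
    if max_cuts < 1:
        # Fraction of the samples. Truncation is an assumption;
        # the library may round differently.
        return int(max_cuts * n_samples)
    return int(max_cuts)  # absolute number of cut points


if __name__ == "__main__":
    # iris has 150 samples
    for setting in (0, 0.2, 2):
        print(setting, "->", resolve_max_cuts(setting, n_samples=150))
```

Under this reading, `max_cuts=0.2` on a 150-sample dataset would allow up to 30 intermediate cut points per feature, while `max_cuts=2` would allow exactly two.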

Makefile targets:

| Target | What it does |
|---|---|
| `make help` | List every target. |
| `make deps` | Install the `[dev]` extras (build, twine, pip-audit, black, flake8, coverage). |
| `make install` | Editable install (`pip install -e .`); rebuilds the C++/Cython extension. |
| `make test` | Run unit tests with coverage. Rebuilds the extension if the `.so` is missing. |
| `make coverage` | Run tests, then print the coverage report. |
| `make lint` | Format with black and lint with flake8. |
| `make build` | Produce wheel + sdist in `dist/`. |
| `make publish` | Run `make build`, then `twine check` and `twine upload`. |
| `make audit` | Run pip-audit on the installed packages. |
| `make sample_py` | Run the Python sample on the iris dataset. |
| `make sample_cpp` | Build and run the C++ sample on the iris dataset. |
| `make version` | Show Python, FImdlp and bundled mdlp versions. |
| `make clean` | Remove build artifacts, caches and the compiled extension. |

Python sample:

```bash
make sample_py
# equivalent to:
# cd samples && python sample.py iris
```

Other options:

```bash
python samples/sample.py iris          # default settings
python samples/sample.py iris -c 2     # cap intermediate cut points to 2
python samples/sample.py iris -m 3     # cap recursion depth to 3
python samples/sample.py iris -n 25    # set min_length to 25
python samples/sample.py -h            # full option list
```

C++ sample:

```bash
make sample_cpp
# equivalent to:
# cd samples && cmake -B build -S . && cmake --build build && cd build && ./sample -f iris
```

Other options:

```bash
cd samples/build
./sample -f iris -c 2      # cap intermediate cut points to 2
./sample -f glass -m 3     # change dataset and depth
./sample -h                # full option list
```
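
The usage example notes that `get_cut_points()` returns `[vmin, c1, ..., cn, vmax]` for each feature. A minimal sketch of how such a list could map values to interval indices is shown below. It uses only the standard library; `discretize` is a hypothetical function, the cut points are made up, and sending a value equal to a cut point to the interval on its right is an assumption, so `FImdlp.transform` may treat boundaries differently.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each value to the index of its interval.

    cut_points follows the layout [vmin, c1, ..., cn, vmax]. Only the
    interior cuts c1..cn separate intervals, giving n + 1 bins.
    """
    interior = list(cut_points[1:-1])
    # Assumption: a value equal to a cut falls into the interval on its right.
    return [bisect_right(interior, v) for v in values]


if __name__ == "__main__":
    cuts = [4.3, 5.45, 6.15, 7.9]  # made-up cut points, not real FImdlp output
    print(discretize([4.9, 5.45, 6.0, 7.2], cuts))
```

With two interior cuts the values fall into three bins, indexed 0 to 2.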