Three tests fail after the last PR #71. Two regarding l_inf.py:
test_check_l_inf_gap_calls_linprog_with_expected_shapes_and_returns_1
test_check_l_inf_gap_infeasible_returns_0
and one concerning lp_tools.py:
test_last_bin_not_sampled_doesnt_affect_result
The most likely explanation is that while developing PR #71, the tests were run having installed the package humancompatible-detectv0.1.5 instead of a developer install, therefore masking the fails.
However, this does not mean necessarilly that the code is not working well in its current state as can be seen from debugging the errors:
The first two errors share the same root cause. Before PR #71, the histograms binning was simpler than it is now, where it depends on the feature type (l_inf.py lines 78-93). The tests create a simplified version of the classes Binarizer, DataHandler and Feature, with a leading underscore, that do not take into account the feature type (for that, equivalent mock-classes would need to be defined for Categorical, Binary...). As the feature type is lacking, the bins remain an empty list, and a ValueError is raised. However, this does not imply that the test content is not fulfilled.
The last failed test changes at runtime (monkeypatches) the function random.randrange. The new code, however, does not use that function in the case in which the samples taken consist of all instances (lp_tools.py lines 51-53). This is precisely the case considered in the test, so the test fails. This could be fixed, but the test itself is not really informative: it considers a hypothetical case in which only one bin is compared against the tolerance big_delta, something that won't happen in practice. In my mind, the test can be discarded all together saving the time that a fix would require.
Three tests fail after the last PR #71. Two regarding l_inf.py:
test_check_l_inf_gap_calls_linprog_with_expected_shapes_and_returns_1
test_check_l_inf_gap_infeasible_returns_0
and one concerning lp_tools.py:
test_last_bin_not_sampled_doesnt_affect_result
The most likely explanation is that while developing PR #71, the tests were run having installed the package humancompatible-detectv0.1.5 instead of a developer install, therefore masking the fails.
However, this does not mean necessarilly that the code is not working well in its current state as can be seen from debugging the errors:
The first two errors share the same root cause. Before PR #71, the histograms binning was simpler than it is now, where it depends on the feature type (l_inf.py lines 78-93). The tests create a simplified version of the classes Binarizer, DataHandler and Feature, with a leading underscore, that do not take into account the feature type (for that, equivalent mock-classes would need to be defined for Categorical, Binary...). As the feature type is lacking, the bins remain an empty list, and a ValueError is raised. However, this does not imply that the test content is not fulfilled.
The last failed test changes at runtime (monkeypatches) the function random.randrange. The new code, however, does not use that function in the case in which the samples taken consist of all instances (lp_tools.py lines 51-53). This is precisely the case considered in the test, so the test fails. This could be fixed, but the test itself is not really informative: it considers a hypothetical case in which only one bin is compared against the tolerance big_delta, something that won't happen in practice. In my mind, the test can be discarded all together saving the time that a fix would require.