Support ordinal types #10

@anjsimmo

The code for determining the similarity of two condition thresholds is shown below:

PERMISSIBLE_DELTA = 0.1
…
def condition_similarity(condition1: Condition, condition2: Condition):
    # Different attributes
    if condition1.attribute != condition2.attribute:
        return 0

    # Different operators
    # TODO: Extend???
    if condition1.operator != condition2.operator:
        return 0

    # Handle <= as a special case as per paper
    if condition1.operator == Operator.LE and condition2.operator == Operator.LE:
        t = abs(PERMISSIBLE_DELTA * condition1.threshold)
        x = abs(condition1.threshold - condition2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1

(The original code also contained a bug in the calculation of the tolerance, t, which was fixed in PR #6.)

This threshold logic is not appropriate for ordinal numbers. For example, the UCI Poker Hand dataset represents the rank of cards as numbers from 1 to 13. As PERMISSIBLE_DELTA = 0.1, a Queen (12) has a tolerance, t, of 12 * 0.1 = 1.2, which means it would be considered similar to a Jack (11) or King (13), but an Ace (1) would have a tolerance, t, of 1 * 0.1 = 0.1, so it wouldn't be considered similar to any other card.
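The card-rank arithmetic can be checked with a small sketch. The helper `le_similarity` is a hypothetical name introduced here; it only mirrors the `<=` branch of `condition_similarity` on bare thresholds and is not part of the module.

```python
PERMISSIBLE_DELTA = 0.1


def le_similarity(threshold1: float, threshold2: float) -> float:
    """Hypothetical helper: the `<=` branch of condition_similarity on bare thresholds."""
    # The tolerance scales with the first threshold, so it grows with card rank.
    t = abs(PERMISSIBLE_DELTA * threshold1)
    x = abs(threshold1 - threshold2)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


# Queen (12): tolerance is about 1.2, so neighbouring ranks score above zero.
queen_vs_jack = le_similarity(12, 11)
queen_vs_king = le_similarity(12, 13)

# Ace (1): tolerance is about 0.1, so no other integer rank can score above zero.
ace_vs_two = le_similarity(1, 2)

print(queen_vs_jack, queen_vs_king, ace_vs_two)
```

The result depends on the magnitude of the first threshold, which has no meaning for ranks that are only ordered labels.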

The similar_tree module needs to be modified to allow a list of attributes to be treated as ordinal numbers, with the tolerance threshold logic adjusted accordingly. For these attributes, the condition similarity should be 1 if the thresholds represent the same partitioning (e.g. <= 2.0 is the same as <= 2.9, as they both split {1, 2} vs {3, 4, ...}), and 0 otherwise.
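One possible way to express the "same partitioning" rule is sketched below, assuming the ordinal attribute only takes integer values. The function name `ordinal_le_similarity` is hypothetical, and how the list of ordinal attributes would be passed into similar_tree is left open.

```python
import math


def ordinal_le_similarity(threshold1: float, threshold2: float) -> int:
    """Hypothetical helper: compare two `<=` thresholds on an integer-valued ordinal attribute.

    Assumption: values are integers, so `value <= threshold` selects exactly the
    values up to floor(threshold). Two thresholds with the same floor therefore
    produce the same partition.
    """
    return 1 if math.floor(threshold1) == math.floor(threshold2) else 0


same_split = ordinal_le_similarity(2.0, 2.9)       # both split {1, 2} vs {3, 4, ...}
different_split = ordinal_le_similarity(2.9, 3.0)  # {1, 2} vs {1, 2, 3}

print(same_split, different_split)
```

This sketch does not know the attribute's range, so two thresholds that both lie above the maximum value (or both below the minimum) would still be reported as different if their floors differ; clamping to a known range would be a further design choice.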

Secondly, the code only deals with the case of two <= operators, not two > operators. In the case of two > operators it will return 1 (perfect similarity) even if the thresholds differ.
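A minimal sketch of one way to close this gap is to apply the same tolerance comparison to any pair of matching operators. `Operator` and `Condition` below are simplified stand-ins defined only so the example runs on its own; the real classes may differ. Reusing the `<=` formula for `>` is an assumption, since the paper's special case is stated only for `<=`.

```python
from dataclasses import dataclass
from enum import Enum

PERMISSIBLE_DELTA = 0.1


class Operator(Enum):
    """Stand-in for the module's Operator type (assumed members)."""
    LE = "<="
    GT = ">"


@dataclass
class Condition:
    """Stand-in for the module's Condition type (assumed fields)."""
    attribute: str
    operator: Operator
    threshold: float


def condition_similarity(condition1: Condition, condition2: Condition) -> float:
    # Different attributes or different operators are never similar.
    if condition1.attribute != condition2.attribute:
        return 0
    if condition1.operator != condition2.operator:
        return 0

    # Assumption: the tolerance rule is applied to every matching operator pair,
    # so two `>` conditions with distant thresholds no longer score 1.
    t = abs(PERMISSIBLE_DELTA * condition1.threshold)
    x = abs(condition1.threshold - condition2.threshold)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


far_apart = condition_similarity(
    Condition("a", Operator.GT, 5.0), Condition("a", Operator.GT, 50.0)
)
close = condition_similarity(
    Condition("a", Operator.GT, 10.0), Condition("a", Operator.GT, 10.5)
)
print(far_apart, close)
```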
