Feature Selection: manual grid-search uses stale 'filtered_df' for X instead of the freshly built df #22

Description

@jonfroehlich

Found in the #16 correctness/prose/comments audit (verified).

In Projects/GestureRecognizer/Feature Selection and Hyperparameter Tuning.ipynb, the manual grid-search cell builds a fresh feature frame for the "Jon" gesture set, then sets X from a stale variable instead of that frame:

(feature_vectors, feature_names) = extract_features_from_gesture_set(selected_gesture_set)
df = pd.DataFrame(feature_vectors, columns = feature_names)
df_trial_indices = df.pop('trial_num')
df_gesturer      = df.pop('gesturer')
df_gesture       = df.pop('gesture')   # ground-truth labels

X = filtered_df      # <-- BUG: stale frame from an earlier cell, not derived from this df
y = df_gesture
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=3)

filtered_df holds whatever an earlier cell last assigned (a different pipeline, possibly with added dummy columns), so X and y can come from different feature sets, and the result depends on the order in which the cells were run. The assignment almost certainly should be X = df (or a freshly filtered version of df).
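
A minimal sketch of the proposed fix, plus a cheap guard that would have caught the mismatch. The small DataFrame below is a hypothetical stand-in for the output of extract_features_from_gesture_set (its feature columns and values are made up), and check_aligned is an illustrative helper, not something that exists in the notebook:

```python
import pandas as pd

# Hypothetical stand-in for the frame built from
# extract_features_from_gesture_set(selected_gesture_set);
# feature names and values are illustrative only.
df = pd.DataFrame({
    "trial_num": [0, 1, 0, 1],
    "gesturer": ["Jon", "Jon", "Jon", "Jon"],
    "gesture": ["Shake", "Shake", "Wave", "Wave"],
    "feature_a": [0.1, 0.2, 0.9, 0.8],
    "feature_b": [1.0, 1.1, 2.0, 2.1],
})
df_trial_indices = df.pop("trial_num")
df_gesturer = df.pop("gesturer")
df_gesture = df.pop("gesture")  # ground-truth labels


def check_aligned(X, y):
    """Raise ValueError if X and y do not share the same row index.

    Illustrative helper (an assumption, not part of the notebook).
    """
    if not X.index.equals(y.index):
        raise ValueError(
            f"X has {len(X)} rows but y has {len(y)}; "
            "X may come from a stale frame"
        )
    return X, y


# Proposed fix: derive X from the frame built in this cell.
X, y = check_aligned(df, df_gesture)

# A stale frame left over from another pipeline (simulated here with a
# different row count) is rejected instead of being silently used.
stale_filtered_df = pd.DataFrame({"feature_a": [0.0] * 6})
try:
    check_aligned(stale_filtered_df, df_gesture)
    stale_rejected = False
except ValueError:
    stale_rejected = True
```

The existing train_test_split call can follow unchanged once X is set this way. Note that the index check only catches frames whose row index differs from y's; a stale frame with the same number of rows and a default index would still pass, so deriving X from df in the same cell remains the actual fix.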
