Found in the #16 correctness/prose/comments audit (verified by reading the cells + tracing pandas semantics).
In Projects/GestureRecognizer/GestureRecognizer-FeatureBased.ipynb, the per-user "leave-one-trial-out" cross-validation (the cell that selects a test gesturer, and the loop-over-all-gesturers cell right after it) leaks the held-out test rows into the training set, so the reported cross-user accuracies are inflated.
```python
just_test_gesturer = df.loc[df['gesturer'] == "JonGestures"]  # keeps df's original index labels (e.g. 550..599)
...
for train_index, test_index in skf.split(just_test_gesturer, just_test_gesturer_y_true):
    df_training = df.drop(test_index)  # <-- BUG
    ...
    X_test = just_test_gesturer.iloc[test_index]  # positional (correct)
```
test_index from StratifiedKFold.split is positional: it indexes into just_test_gesturer as 0..N-1. But df.drop(test_index) drops by index label, and because df has a default RangeIndex, it removes the rows labelled 0, 1, 5, ..., i.e. rows belonging to whichever gesturer(s) occupy those positions at the start of df, not the held-out test rows. Jon's actual test rows (labels ~550-599) therefore stay in df_training, while an unrelated gesturer's rows are dropped. Result: the model trains on the very rows it is then scored against.
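A toy reproduction of the label/position mismatch, as a minimal sketch assuming pandas. The "Alice" gesturer, the "gesture" column, and the fixed test_index are made up for illustration; in the notebook, test_index comes from skf.split.

```python
import pandas as pd

# Hypothetical toy data: two gesturers, default RangeIndex 0..9.
df = pd.DataFrame({
    "gesturer": ["Alice"] * 6 + ["JonGestures"] * 4,
    "gesture": ["wave", "tap"] * 5,
})

# Same selection as the notebook: Jon's rows keep their df labels 6..9.
just_test_gesturer = df.loc[df["gesturer"] == "JonGestures"]

# A fold's test_index is positional into just_test_gesturer (0..N-1).
test_index = [0, 2]  # hypothetical fold: Jon's 1st and 3rd rows

X_test = just_test_gesturer.iloc[test_index]  # labels 6 and 8

buggy_training = df.drop(test_index)                            # drops labels 0 and 2 (Alice's rows)
fixed_training = df.drop(just_test_gesturer.index[test_index])  # drops labels 6 and 8 (Jon's test rows)

print(sorted(buggy_training.index.intersection(X_test.index).tolist()))  # [6, 8] -> test rows leaked into training
print(sorted(fixed_training.index.intersection(X_test.index).tolist()))  # []
```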
Fix
Drop by the test rows' real labels, e.g.:
```python
test_labels = just_test_gesturer.index[test_index]
df_training = df.drop(test_labels)
```
Also update the comment "everything but the test indices for this fold", which is false under the current code.
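The corrected loop body, as a minimal runnable sketch on a toy frame. It assumes pandas and scikit-learn; the "Alice" rows, the "gesture" label column, and n_splits=4 are hypothetical stand-ins for the notebook's data. The same position-to-label translation would apply inside the loop-over-all-gesturers cell.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the notebook's df (default RangeIndex 0..15).
df = pd.DataFrame({
    "gesturer": ["Alice"] * 8 + ["JonGestures"] * 8,
    "gesture": ["wave", "tap"] * 8,
})

just_test_gesturer = df.loc[df["gesturer"] == "JonGestures"]  # labels 8..15
just_test_gesturer_y_true = just_test_gesturer["gesture"]

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for train_index, test_index in skf.split(just_test_gesturer, just_test_gesturer_y_true):
    # Translate positional fold indices into df's index labels before dropping.
    test_labels = just_test_gesturer.index[test_index]
    df_training = df.drop(test_labels)  # everything but this fold's test rows
    X_test = just_test_gesturer.iloc[test_index]

    # Sanity checks: no train/test overlap, and only this fold's rows were removed.
    assert df_training.index.intersection(X_test.index).empty
    assert len(df_training) == len(df) - len(X_test)
```

Keeping the two asserts in the real notebook would make any future regression of this kind fail loudly instead of silently inflating accuracy.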
While here: these CV cells use StratifiedKFold(..., shuffle=True, random_state=None), so the folds, and therefore the reported accuracies, change on every run. Consider passing a fixed random_state so the narrative is reproducible.
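A minimal sketch of why seeding helps, assuming scikit-learn and NumPy; the labels, n_splits=4, and the helper fold_tests are hypothetical. With an integer random_state, repeated calls to split produce identical folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array(["wave", "tap"] * 8)  # hypothetical labels for one gesturer
X = np.zeros((len(y), 1))          # feature values do not affect fold assignment here


def fold_tests(seed):
    """Return each fold's positional test indices for a given seed."""
    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in skf.split(X, y)]


# Same seed, same folds on every run.
assert fold_tests(42) == fold_tests(42)
```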