FeatureBased: train/test leakage in per-user leave-one-out cross-validation

**Found in the #16 correctness/prose/comments audit (verified by reading the cells + tracing pandas semantics).**

In `Projects/GestureRecognizer/GestureRecognizer-FeatureBased.ipynb`, the per-user "leave-one-trial-out" cross-validation (the cell that selects a test gesturer, and the loop-over-all-gesturers cell right after it) leaks the held-out test rows into the training set, so the reported cross-user accuracies are inflated.

```python
just_test_gesturer = df.loc[df['gesturer'] == "JonGestures"]   # keeps df's original index labels (e.g. 550..599)
...
for train_index, test_index in skf.split(just_test_gesturer, just_test_gesturer_y_true):
    df_training = df.drop(test_index)              # <-- BUG: drops by label, but test_index is positional
    ...
    X_test = just_test_gesturer.iloc[test_index]   # positional (correct)
```

`test_index` from `StratifiedKFold.split` is **positional** into `just_test_gesturer` (0..N-1). But `df.drop(test_index)` drops by **index label**, and `df` has a default RangeIndex, so it removes rows whose *labels* are `0,1,5,...` — i.e. rows belonging to **whichever gesturer sits at those positions**, not the held-out test rows. Jon's actual test rows (labels ~550-599) are therefore **kept in `df_training`**, while some other gesturer's rows are dropped. Result: the model trains on the very rows it is scored against.
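A minimal repro of the mismatch, as a sketch on made-up data (the gesturer names `"A"` and `"B"`, the `trial` column, and the hand-written `test_index` are assumptions for illustration, not taken from the notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 3 gesturers x 4 trials, default RangeIndex 0..11.
df = pd.DataFrame({
    "gesturer": ["A"] * 4 + ["B"] * 4 + ["JonGestures"] * 4,
    "trial": list(range(4)) * 3,
})

# Filtering keeps df's original labels, so Jon's rows are labeled 8..11.
just_test_gesturer = df.loc[df["gesturer"] == "JonGestures"]

# Positional indices into just_test_gesturer, like those a splitter returns.
test_index = np.array([0, 1])

X_test = just_test_gesturer.iloc[test_index]  # positional: rows labeled 8 and 9
df_training = df.drop(test_index)             # label-based: removes rows labeled 0 and 1

# Labels present in both the training frame and the test fold.
leaked = df_training.index.intersection(X_test.index)
print(list(leaked))
```

Here the dropped rows belong to gesturer `"A"`, and both of Jon's test rows are still in `df_training`.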

### Fix
Drop by the test rows' real labels, e.g.:
```python
test_labels = just_test_gesturer.index[test_index]
df_training = df.drop(test_labels)
```
(and update the comment "everything but the test indices for this fold", which is currently false).
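To sanity-check the fix, a per-fold overlap count along these lines could be added to the loop. This is a sketch on synthetic data, not the notebook's code: the `gesture` and `feature` columns, the gesture names, `n_splits=5`, and `random_state=0` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: 3 gesturers x 2 gestures x 5 trials, default RangeIndex 0..29.
rng = np.random.default_rng(0)
rows = [
    {"gesturer": g, "gesture": gesture, "feature": rng.normal()}
    for g in ["A", "B", "JonGestures"]
    for gesture in ["Shake", "Wave"]
    for _ in range(5)
]
df = pd.DataFrame(rows)

just_test_gesturer = df.loc[df["gesturer"] == "JonGestures"]
just_test_gesturer_y_true = just_test_gesturer["gesture"]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
overlaps = []
for train_index, test_index in skf.split(just_test_gesturer, just_test_gesturer_y_true):
    # Translate positions within just_test_gesturer into df's index labels.
    test_labels = just_test_gesturer.index[test_index]
    df_training = df.drop(test_labels)            # everything but this fold's test rows
    X_test = just_test_gesturer.iloc[test_index]
    # Count rows that appear in both the training frame and the test fold.
    overlaps.append(len(df_training.index.intersection(X_test.index)))

print(overlaps)  # every entry should be 0 once the fix is in place
```

With the original `df.drop(test_index)`, the same check reports a non-zero overlap for any gesturer whose rows do not start at label 0.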

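While here: these CV cells use `StratifiedKFold(..., shuffle=True, random_state=None)`, so the folds, and therefore the reported accuracies, change on every run. Consider passing a fixed `random_state` so the numbers quoted in the notebook's narrative are reproducible.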
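A small illustration of the effect of seeding, again on placeholder data (the helper `fold_test_indices`, the seed `42`, and the array shapes are assumptions for this sketch):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data: 10 samples, 2 balanced classes.
X = np.zeros((10, 1))
y = np.array([0, 1] * 5)

def fold_test_indices(seed):
    """Return the test indices of each fold for a given random_state."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in skf.split(X, y)]

# Two splitters built with the same integer seed produce identical folds.
seeded_a = fold_test_indices(42)
seeded_b = fold_test_indices(42)
print(seeded_a == seeded_b)
```

With `random_state=None` the folds are drawn from NumPy's global random state instead, so they generally differ between runs.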

