feat(v1.1): oracle-validate string resolver - single-line 0.57 -> 0.995 #17
Merged
… 0.995

The fretting-hand string resolver is ALREADY built and wired (fuse -> find_fingering_at -> emission_cost vision term); v1's audio-only run just never fed it fingerings. Probe it with a gold-derived oracle FrameFingering (perfect hand signal) - pure fusion over GuitarSet gold, no model/video/rendering:

    tier         audio   +oracle  delta
    single-line  0.568   0.995    +0.427  (> 0.94 v1.1 target)
    strummed     0.747   0.978    +0.231  (> 0.85)
    aggregate    0.657   0.986    +0.329

So v1.1 P1 (resolver) is effectively done; the milestone reduces to P0 (eval data). Corrects the design doc s4 gap analysis (it described the fret-only neck-anchor path, not the FrameFingering path).

- scripts/eval/v1_1_oracle_string_probe.py: the probe (reproducible).
- tests/unit/test_video_string_resolution.py: oracle resolves the ambiguous string; absent fingerings == audio-only decode (no-regression).
- docs/EVAL_REPORTS/v1_1_oracle_string_probe_2026-06-03.md: report.
- design doc s4 + DECISIONS: corrected/recorded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…le 0.42 -> 1.00)

Real-video eval, step 1 of 3: parse the Kaggle UT-Austin per-frame finger labels into per-note gold TabEvents and reproduce the oracle string-resolution lift on these REAL clips (gold -> oracle FrameFingering -> fuse), like the GuitarSet probe.

Gold derivation:
- label[frame][finger] = [active, fret, their_string];
- a NEW (fret, string) placement vs the previous frame = a note onset;
- only the highest fret on a string sounds (collapse simultaneous same-string finger rests);
- our_idx = 6 - their_string (audio-verified);
- onsets via timestamps.csv.

Result over the 25 tablature clips (527 notes): audio-only Tab F1 0.42 -> +oracle 1.00 (every clip 1.0). Data pipeline + gold derivation locked; the remaining unknown is the MediaPipe CV chain (chunk 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
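The gold-derivation rules in the commit message above can be sketched roughly as follows. This is an illustrative reading, not the repository's parser: the function names, the tuple-based event shape, and the choice to collapse same-string fingers before comparing frames are all assumptions; only the label layout and the `6 - their_string` mapping come from the commit message.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One frame = per-finger [active, fret, their_string] triples (layout from the commit message).
Frame = Sequence[Sequence[int]]
Placement = Tuple[int, int]  # (our_string_idx, fret); hypothetical event shape


def sounding_placements(frame: Frame) -> Set[Placement]:
    """Collapse a frame to one placement per string, keeping the highest fret."""
    highest: Dict[int, int] = {}
    for active, fret, their_string in frame:
        if not active:
            continue
        our_idx = 6 - their_string  # index mapping stated in the commit message
        highest[our_idx] = max(fret, highest.get(our_idx, fret))
    return set(highest.items())


def derive_onsets(
    frames: Sequence[Frame], timestamps: Sequence[float]
) -> List[Tuple[float, int, int]]:
    """Emit (time, our_string_idx, fret) whenever a placement is new vs the previous frame."""
    events: List[Tuple[float, int, int]] = []
    previous: Set[Placement] = set()
    for frame, t in zip(frames, timestamps):
        current = sounding_placements(frame)
        for string_idx, fret in sorted(current - previous):
            events.append((t, string_idx, fret))
        previous = current
    return events


# Toy clip: two fingers rest on the same string, then a third finger lands elsewhere.
frames = [
    [[1, 3, 1], [1, 5, 1], [0, 0, 0]],  # frets 3 and 5 on their_string 1: only fret 5 sounds
    [[1, 3, 1], [1, 5, 1], [0, 0, 0]],  # unchanged: no onset
    [[1, 3, 1], [1, 5, 1], [1, 2, 4]],  # new placement on their_string 4
]
timestamps = [0.0, 0.033, 0.066]
onsets = derive_onsets(frames, timestamps)
```

In the real pipeline the timestamps would come from timestamps.csv; here they are hard-coded to keep the sketch self-contained.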
…k-1 status

- docs/EVAL_REPORTS/v1_1_dataset_search_2026-06-03.md: deep-research dataset landscape (no portfolio-clean public video+string dataset exists), the Kaggle UT-Austin decision, the s1.5 eval-vs-shipping license reasoning, and the chunk-1 result (real-video oracle 0.42 -> 1.00).
- v1.1 design doc s6/s7/s10: record the resolution, mark P1 + chunk-1 done, lay out chunks 2-3, and correct the s1.5 reading.
- DECISIONS: v1.1 eval-dataset + chunk-1 entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Validates the v1.1 lever before building anything: does the existing fusion resolve the string when given a fretting-hand signal?
Result (oracle probe, GuitarSet player-05 validation, 60 clips)
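Perfect pitch+onset from gold + an oracle FrameFingering peaked on the true (string, fret); pure fusion, no model/video/rendering:

    tier         audio   +oracle  delta
    single-line  0.568   0.995    +0.427
    strummed     0.747   0.978    +0.231
    aggregate    0.657   0.986    +0.329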
Single-line 0.995 is past the 0.94 v1.1 target; strummed 0.978 past 0.85.
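To illustrate why a hand signal peaked on the true (string, fret) removes the ambiguity, here is a toy sketch. It is not the project's `fuse` / `find_fingering_at` / `emission_cost` path: `candidate_positions`, `resolve`, the lowest-fret fallback, the 19-fret limit, and the convention that string index 0 is the low E are all assumptions made for the example. The only hard fact used is standard tuning (MIDI 40, 45, 50, 55, 59, 64).

```python
from typing import Dict, List, Optional, Tuple

OPEN_STRING_MIDI = [40, 45, 50, 55, 59, 64]  # standard tuning E2 A2 D3 G3 B3 E4, low to high
MAX_FRET = 19  # assumed fretboard length for this sketch

Position = Tuple[int, int]  # (string index, fret); index 0 = low E is an assumption


def candidate_positions(midi_pitch: int) -> List[Position]:
    """All (string, fret) pairs that produce the given pitch within MAX_FRET."""
    return [
        (string_idx, midi_pitch - open_pitch)
        for string_idx, open_pitch in enumerate(OPEN_STRING_MIDI)
        if 0 <= midi_pitch - open_pitch <= MAX_FRET
    ]


def resolve(
    midi_pitch: int, hand_hint: Optional[Dict[Position, float]] = None
) -> Position:
    """Pick one position for a pitch, optionally guided by a per-position hand score."""
    candidates = candidate_positions(midi_pitch)
    if not candidates:
        raise ValueError(f"pitch {midi_pitch} is not playable in this tuning")
    if not hand_hint:
        # Audio-only stand-in: a naive lowest-fret prior (hypothetical, not the real decoder).
        return min(candidates, key=lambda pos: pos[1])
    # With a hint, the candidate carrying the most hand-signal mass wins.
    return max(candidates, key=lambda pos: hand_hint.get(pos, 0.0))


# E4 (MIDI 64) is playable on five strings; audio alone cannot tell them apart.
audio_only = resolve(64)
with_oracle = resolve(64, hand_hint={(3, 9): 1.0})  # oracle peaked on the true position
```

With no hint the toy falls back to its prior and may pick the wrong string; with a hint peaked on the true position, the choice follows the hand signal.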
What this means
Contents
ruff + the new test green locally.
Generated with Claude Code