
feat(v1.1): oracle-validate string resolver - single-line 0.57 -> 0.995 #17

Merged
pgil256 merged 3 commits into main from v1.1/oracle-string-resolution
Jun 10, 2026


pgil256 (Owner) commented Jun 9, 2026

Validates the v1.1 lever before building anything: does the existing fusion resolve the string when given a fretting-hand signal?

Result (oracle probe, GuitarSet player-05 validation, 60 clips)

Inputs: perfect pitch and onset taken from gold, plus an oracle FrameFingering peaked on the true (string, fret). This is pure fusion, with no model, video, or rendering involved:

  tier          audio   +oracle   delta
  single-line   0.568   0.995     +0.427
  strummed      0.747   0.978     +0.231
  aggregate     0.657   0.986     +0.329

Single-line 0.995 exceeds the 0.94 v1.1 target; strummed 0.978 exceeds 0.85.

What this means

  • The string resolver is ALREADY built and wired: fuse -> find_fingering_at(onset) -> emission_cost vision term (lambda_vision * -log marginal_string_fret[s,f]), candidate-restricted by Viterbi. v1 never fed it fingerings because GuitarSet is audio-only.
  • The gap analysis in section 4 of the design doc was wrong: it described the fret-only neck-anchor path, not the FrameFingering path. Corrected here.
  • v1.1 P1 (resolver) is effectively DONE. The milestone reduces to P0: an eval corpus with fretting-hand video + string labels (first synthetic-from-GuitarSet to prove it on clean video, then a license-clean public video+string dataset as the gate).
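
To make the vision term concrete, here is a minimal sketch of how lambda_vision * -log marginal_string_fret[s,f] can break a string ambiguity. It is not the repository's implementation: the function signatures, the standard-tuning table, the index convention (0 = low E), and the per-note argmin used in place of the Viterbi decode are all assumptions for illustration. Only the shape of the cost term comes from the description above.

```python
import math

# Assumption: standard tuning, index 0 = low E. The repo's convention may differ.
OPEN_STRING_MIDI = [40, 45, 50, 55, 59, 64]
N_FRETS = 25  # frets 0..24, an assumed neck length


def candidates_for_pitch(midi):
    """All (string, fret) positions that produce this MIDI pitch."""
    return [
        (s, midi - open_midi)
        for s, open_midi in enumerate(OPEN_STRING_MIDI)
        if 0 <= midi - open_midi < N_FRETS
    ]


def oracle_marginal(true_string, true_fret, peak=0.99):
    """Hypothetical oracle: a (string, fret) grid peaked on the true position."""
    n_cells = len(OPEN_STRING_MIDI) * N_FRETS
    rest = (1.0 - peak) / (n_cells - 1)
    grid = [[rest] * N_FRETS for _ in OPEN_STRING_MIDI]
    grid[true_string][true_fret] = peak
    return grid


def emission_cost(audio_cost, string, fret, marginal_string_fret=None,
                  lambda_vision=1.0):
    """Audio cost plus the vision term; with no fingering it is audio-only."""
    if marginal_string_fret is None:
        return audio_cost
    p = max(marginal_string_fret[string][fret], 1e-9)  # guard against log(0)
    return audio_cost + lambda_vision * -math.log(p)


def resolve(midi, audio_costs, marginal=None):
    """Pick the cheapest candidate (a stand-in for the real Viterbi decode)."""
    return min(
        candidates_for_pitch(midi),
        key=lambda sf: emission_cost(audio_costs[sf], sf[0], sf[1], marginal),
    )


# E4 (MIDI 64) is playable on all six strings; audio alone prefers the open string.
audio_costs = {(5, 0): 0.2, (4, 5): 0.3, (3, 9): 0.5,
               (2, 14): 0.9, (1, 19): 1.5, (0, 24): 2.0}
audio_only = resolve(64, audio_costs)
with_oracle = resolve(64, audio_costs, oracle_marginal(3, 9))
```

In this toy case the audio-only choice is the open string, while the oracle-fed choice moves to the true position. Passing no marginal leaves the cost untouched, which mirrors the no-regression property the unit test checks.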

Contents

  • scripts/eval/v1_1_oracle_string_probe.py - reproducible probe.
  • tests/unit/test_video_string_resolution.py - oracle resolves the ambiguous string; absent fingerings == audio-only decode (no-regression). 2 passed.
  • docs/EVAL_REPORTS/v1_1_oracle_string_probe_2026-06-03.md - report.
  • design doc section-4 + DECISIONS - corrected/recorded.

ruff + the new test green locally.

Generated with Claude Code

… 0.995

The fretting-hand string resolver is ALREADY built and wired (fuse ->
find_fingering_at -> emission_cost vision term); v1's audio-only run just never
fed it fingerings. Probe it with a gold-derived oracle FrameFingering (perfect
hand signal) - pure fusion over GuitarSet gold, no model/video/rendering:

  tier          audio   +oracle    delta
  single-line   0.568   0.995    +0.427   (> 0.94 v1.1 target)
  strummed      0.747   0.978    +0.231   (> 0.85)
  aggregate     0.657   0.986    +0.329

So v1.1 P1 (resolver) is effectively done; the milestone reduces to P0 (eval
data). Corrects the design doc s4 gap analysis (it described the fret-only
neck-anchor path, not the FrameFingering path).

- scripts/eval/v1_1_oracle_string_probe.py: the probe (reproducible).
- tests/unit/test_video_string_resolution.py: oracle resolves the ambiguous
  string; absent fingerings == audio-only decode (no-regression).
- docs/EVAL_REPORTS/v1_1_oracle_string_probe_2026-06-03.md: report.
- design doc s4 + DECISIONS: corrected/recorded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel Bot commented Jun 9, 2026

The latest updates on your projects.

  Project      Deployment   Updated (UTC)
  tab_vision   Ready        Jun 10, 2026 8:37pm

…le 0.42 -> 1.00)

Real-video eval, step 1 of 3: parse the Kaggle UT-Austin per-frame finger labels
into per-note gold TabEvents and reproduce the oracle string-resolution lift on
these REAL clips (gold -> oracle FrameFingering -> fuse), like the GuitarSet probe.

Gold derivation: label[frame][finger]=[active,fret,their_string]; a NEW
(fret,string) placement vs the previous frame = a note onset; only the highest
fret on a string sounds (collapse simultaneous same-string finger rests);
our_idx=6-their_string (audio-verified); onsets via timestamps.csv.
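
A minimal sketch of the gold derivation described above, under stated assumptions: labels are held in memory as labels[frame][finger] = [active, fret, their_string], per-frame times are already read from timestamps.csv into a list, and the same-string collapse is applied before the frame-to-frame diff. The function names and the (time, string, fret) tuple shape are hypothetical, not the repository's TabEvent type.

```python
def sounding_placements(frame_labels):
    """Collapse one frame's finger labels to a set of (our_string_idx, fret).

    Only the highest fret on each string is kept, since lower fingers
    resting on the same string do not sound.
    """
    highest = {}
    for active, fret, their_string in frame_labels:
        if not active:
            continue
        our_idx = 6 - their_string  # index mapping stated in the commit message
        highest[our_idx] = max(fret, highest.get(our_idx, fret))
    return set(highest.items())


def derive_gold_events(labels, timestamps):
    """Emit (time, string_idx, fret) for each placement new to this frame."""
    events = []
    previous = set()
    for frame_labels, t in zip(labels, timestamps):
        current = sounding_placements(frame_labels)
        # Anything present now but absent in the previous frame is an onset.
        for string_idx, fret in sorted(current - previous):
            events.append((t, string_idx, fret))
        previous = current
    return events


# Three toy frames, each with up to three finger labels.
labels = [
    [[1, 3, 6], [0, 0, 0], [0, 0, 0]],  # one finger: fret 3 on their string 6
    [[1, 3, 6], [1, 5, 6], [0, 0, 0]],  # second finger on same string, fret 5
    [[1, 3, 6], [1, 5, 6], [1, 2, 1]],  # new finger: fret 2 on their string 1
]
timestamps = [0.0, 0.5, 1.0]
events = derive_gold_events(labels, timestamps)
```

Here the second frame yields a new onset at fret 5 because the collapsed placement on that string changed, even though the fret-3 finger never moved.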

Result over the 25 tablature clips (527 notes): audio-only Tab F1 0.42 -> +oracle
1.00 (every clip 1.0). Data pipeline + gold derivation locked; the remaining
unknown is the MediaPipe CV chain (chunk 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…k-1 status

- docs/EVAL_REPORTS/v1_1_dataset_search_2026-06-03.md: deep-research dataset
  landscape (no portfolio-clean public video+string dataset exists), the Kaggle
  UT-Austin decision, the s1.5 eval-vs-shipping license reasoning, and the
  chunk-1 result (real-video oracle 0.42 -> 1.00).
- v1.1 design doc s6/s7/s10: record the resolution, mark P1 + chunk-1 done, lay
  out chunks 2-3, and correct the s1.5 reading.
- DECISIONS: v1.1 eval-dataset + chunk-1 entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pgil256 merged commit fcf5dbf into main on Jun 10, 2026
4 checks passed