
v1 ACCEPTED (audio-only acoustic): acceptance run + chord -> v1.1 #14

Merged: pgil256 merged 2 commits into main from eval/v1-acceptance on Jun 8, 2026.


pgil256 (Owner) commented Jun 8, 2026

Stamps v1 ACCEPTED on the audio-only acoustic scope, from the formal all-metrics acceptance run.

Acceptance — SPEC §1.4.1 (GuitarSet held-out player-05, 60 clips)

| Gate | single-line | strummed | aggregate | result |
|---|---|---|---|---|
| Tab F1 (lower-95) | 0.457 | 0.606 | 0.600 | ✅ pass |
| Onset F1 (mean) | 0.938 | 0.923 | | ✅ pass |
| Pitch F1 (mean) | 0.930 | 0.901 | | ✅ pass |
| Latency | | | ~45 s / 60 s clip | ✅ pass |

Report: docs/EVAL_REPORTS/v1_acceptance_2026-06-03.md. Eval harness 292252d, highres backend + leak-free guitarset-v1 prior. Acceptance test = lower_95_CI ≥ target.
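The acceptance test compares the lower end of a confidence interval, not the point estimate, against each target. A minimal sketch of that check, assuming a percentile bootstrap over per-clip scores (clips resampled with replacement) and taking the 2.5th percentile of the resampled means as `lower_95_CI`. The function names are hypothetical, and the harness's actual resampling details are not shown in this PR.

```python
import random
from statistics import mean


def bootstrap_lower_95(scores: list[float], n_resamples: int = 2000, seed: int = 0) -> float:
    """Lower end of a two-sided 95% percentile-bootstrap CI for the mean per-clip score.

    Assumption: resample clips with replacement and take the 2.5th percentile;
    the real harness may use a different scheme.
    """
    if not scores:
        raise ValueError("need at least one per-clip score")
    rng = random.Random(seed)  # fixed seed so the bound is reproducible run to run
    n = len(scores)
    resampled_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    # Position of the 2.5th percentile among the sorted resampled means.
    return resampled_means[int(0.025 * n_resamples)]


def passes_gate(scores: list[float], target: float) -> bool:
    """Acceptance rule as stated in the PR: lower_95_CI >= target."""
    return bootstrap_lower_95(scores) >= target
```

Gating on the lower bound means a tier only passes when the target holds with some margin for clip-to-clip variance. This is why single-line passes at 0.457 against a 0.45 target.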

Chord-instance accuracy → v1.1 (video)

Measured 0.52 single-line / 0.48 strummed. Whole-chord recovery needs the exact string + fret for every note in a chord, so it carries the same audio string-resolution limit that already lowered single-line Tab F1 0.94 → 0.45 (single-line chord 0.52 ≈ single-line Tab F1 0.52). Re-scoped to a v1.1 video-assisted target (user-approved); v1 records the audio-only baseline.
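To make the "every note in a chord" requirement concrete, here is a hypothetical sketch of chord-instance scoring. It assumes each chord is a set of (string, fret) pairs and that reference and predicted chords are already aligned one-to-one by onset. Neither the representation nor the alignment step is taken from the harness.

```python
from typing import NamedTuple


class TabNote(NamedTuple):
    string: int  # hypothetical convention: 1 = high E ... 6 = low E
    fret: int


Chord = frozenset[TabNote]


def chord_instance_accuracy(reference: list[Chord], predicted: list[Chord]) -> float:
    """Fraction of reference chords whose every note has the right string AND fret.

    Assumes reference[i] and predicted[i] were already matched (e.g. by onset time).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted chords must be aligned one-to-one")
    if not reference:
        return 0.0
    # A chord counts only on an exact set match: one misplaced note fails the whole chord.
    hits = sum(ref == pred for ref, pred in zip(reference, predicted))
    return hits / len(reference)
```

For example, D3 can be played at string 5 fret 5 or on the open string 4. A prediction that gets every pitch right but picks the wrong one of those positions still misses the chord. This is the audio string-resolution limit described above.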

What's in the diff

  • 292252d — harness: adds chord-instance accuracy (per-clip + per-tier bootstrap CI + report section), corrects DEFAULT_TIER_TARGETS to the §1.4.1 gates (the old 0.85/0.90 predated the 2026-06-02 acoustic-scope amendment), and reuses one audio backend across clips — it was rebuilt per clip, reloading the highres checkpoint every clip (~10× slower, and the accumulation OOM'd a 60-clip run around clip 17).
  • d403713 — SPEC §1.4.1 amendment (record the run + re-scope chord), the acceptance report + verdict header, the DECISIONS entry, and the harness chord note.
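The backend fix in 292252d is a hoisting change: construct the expensive object once, outside the per-clip loop. A toy sketch in which `AudioBackend` and its load counter are hypothetical stand-ins for the highres backend and its checkpoint load:

```python
class AudioBackend:
    """Hypothetical stand-in for the highres transcription backend."""

    load_count = 0  # class-wide counter of checkpoint loads, for illustration only

    def __init__(self, checkpoint: str) -> None:
        # In the real harness this is the expensive step (loading model weights).
        AudioBackend.load_count += 1
        self.checkpoint = checkpoint

    def transcribe(self, clip: str) -> str:
        return f"{self.checkpoint}:{clip}"


def evaluate_rebuild_per_clip(clips: list[str], checkpoint: str) -> list[str]:
    # Old pattern: a fresh backend (and checkpoint load) for every clip.
    return [AudioBackend(checkpoint).transcribe(clip) for clip in clips]


def evaluate_shared_backend(clips: list[str], checkpoint: str) -> list[str]:
    # New pattern: build once, reuse for every clip.
    backend = AudioBackend(checkpoint)
    return [backend.transcribe(clip) for clip in clips]
```

Both functions produce the same outputs. The shared version loads once instead of 60 times for a 60-clip run, which also avoids accumulating per-clip model copies in memory.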

Verification

ruff check + ruff format --check clean, mypy tabvision clean (56 files), eval unit + formatter tests pass. The run also needed KMP_DUPLICATE_LIB_OK=TRUE on Windows to dodge a duplicate-OpenMP (libiomp5md.dll) segfault — environment-only, not a code change.

🤖 Generated with Claude Code

pgil256 and others added 2 commits June 3, 2026 09:52
…1 targets

- Add chord-instance accuracy per clip + per-tier bootstrap CI + a report
  section, so the composite eval reports all SPEC §1.4 metrics together.
- Correct DEFAULT_TIER_TARGETS to the §1.4.1 honest audio-only acoustic gates
  (single-line 0.45, strummed 0.60). The prior 0.85/0.90 predated the
  2026-06-02 acoustic-scope amendment and mislabeled passing tiers as failing.
- Build the audio backend once and reuse it across clips. It was rebuilt per
  clip, reloading the highres checkpoint every clip: ~10x slower, and the
  accumulation exhausted memory partway through a 60-clip run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Formal acceptance run over GuitarSet player-05 validation (60 clips, harness
292252d, --position-prior guitarset-v1) clears every SPEC §1.4.1 gate:
- Tab F1 lower-95: single-line 0.457 (>=0.45), strummed 0.606 (>=0.60),
  aggregate 0.600 (>=0.55)
- Onset 0.94/0.92 (>=0.92), Pitch 0.93/0.90 (>=0.90)
- Latency ~45 s for a 60 s clip (0.74x realtime, <=5 min)
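The gates listed above mix "at least" thresholds (F1 scores) with an "at most" threshold (latency). The hypothetical sketch below encodes each gate with its comparison direction and checks it against the measured values from the PR's acceptance table. `GATES`, `MEASURED`, and `failing_gates` are illustrative names, not the structure of the harness's `DEFAULT_TIER_TARGETS`.

```python
import operator
from typing import Callable

# Each gate pairs a comparison (measured vs target) with its target value.
Gate = tuple[Callable[[float, float], bool], float]

GATES: dict[str, Gate] = {
    "tab_f1_lower95/single-line": (operator.ge, 0.45),
    "tab_f1_lower95/strummed": (operator.ge, 0.60),
    "tab_f1_lower95/aggregate": (operator.ge, 0.55),
    "onset_f1/single-line": (operator.ge, 0.92),
    "onset_f1/strummed": (operator.ge, 0.92),
    "pitch_f1/single-line": (operator.ge, 0.90),
    "pitch_f1/strummed": (operator.ge, 0.90),
    "latency_s_per_60s_clip": (operator.le, 300.0),  # <= 5 min
}

# Values from the acceptance table in the PR description.
MEASURED: dict[str, float] = {
    "tab_f1_lower95/single-line": 0.457,
    "tab_f1_lower95/strummed": 0.606,
    "tab_f1_lower95/aggregate": 0.600,
    "onset_f1/single-line": 0.938,
    "onset_f1/strummed": 0.923,
    "pitch_f1/single-line": 0.930,
    "pitch_f1/strummed": 0.901,
    "latency_s_per_60s_clip": 45.0,
}


def failing_gates(measured: dict[str, float]) -> list[str]:
    """Names of gates that are missing from `measured` or not met."""
    failures = []
    for name, (compare, target) in GATES.items():
        if name not in measured or not compare(measured[name], target):
            failures.append(name)
    return failures
```

A metric that is missing from `measured` counts as a failure rather than being silently skipped.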

Chord-instance accuracy (0.52/0.48) is re-scoped to a v1.1 video target: it
shares single-line Tab F1's audio string/fret information limit (the same limit
that lowered single-line 0.94 -> 0.45). User-approved.

- SPEC §1.4.1: record the acceptance run + re-scope chord to v1.1.
- docs/EVAL_REPORTS/v1_acceptance_2026-06-03.md: report + verdict header.
- docs/DECISIONS.md: v1-accepted decision entry.
- composite.py: chord report note now states the v1.1 framing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
vercel (bot) commented Jun 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| tab_vision | Ready | Preview, Comment | Jun 8, 2026 5:33pm |

pgil256 merged commit 0601ba9 into main on Jun 8, 2026
4 checks passed