v1 ACCEPTED (audio-only acoustic): acceptance run + chord -> v1.1 #14
Merged
…1 targets
- Add chord-instance accuracy per clip + per-tier bootstrap CI + a report section, so the composite eval reports all SPEC §1.4 metrics together.
- Correct DEFAULT_TIER_TARGETS to the §1.4.1 honest audio-only acoustic gates (single-line 0.45, strummed 0.60). The prior 0.85/0.90 predated the 2026-06-02 acoustic-scope amendment and mislabeled passing tiers as failing.
- Build the audio backend once and reuse it across clips. It was rebuilt per clip, reloading the highres checkpoint every clip: ~10x slower, and the accumulation exhausted memory partway through a 60-clip run.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
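The backend-reuse fix in this commit can be illustrated with a minimal sketch. `FakeBackend`, `build_highres`, and `evaluate_clips` are hypothetical stand-ins, not the harness's real names; the point is only that the expensive constructor runs once, outside the per-clip loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FakeBackend:
    """Hypothetical stand-in for an expensive audio backend (not the real class)."""

    name: str

    def transcribe(self, clip_path: str) -> str:
        # A real backend would run inference; this just tags the clip path.
        return f"{self.name}:{clip_path}"


def evaluate_clips(
    clip_paths: Iterable[str], build_backend: Callable[[], FakeBackend]
) -> List[str]:
    # Build once, before the loop, so the checkpoint is loaded a single time.
    backend = build_backend()
    return [backend.transcribe(path) for path in clip_paths]


build_calls = 0


def build_highres() -> FakeBackend:
    """Counts constructions so the reuse is observable."""
    global build_calls
    build_calls += 1
    return FakeBackend("highres")


results = evaluate_clips([f"clip_{i:02d}.wav" for i in range(60)], build_highres)
```

With the constructor inside the loop, `build_calls` would reach 60 for the same run.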
Formal acceptance run over GuitarSet player-05 validation (60 clips, harness 292252d, --position-prior guitarset-v1) clears every SPEC §1.4.1 gate:
- Tab F1 lower-95: single-line 0.457 (>=0.45), strummed 0.606 (>=0.60), aggregate 0.600 (>=0.55)
- Onset 0.94/0.92 (>=0.92), Pitch 0.93/0.90 (>=0.90)
- Latency ~45 s for a 60 s clip (0.74x realtime, <=5 min)
Chord-instance accuracy (0.52/0.48) is re-scoped to a v1.1 video target: it shares single-line Tab F1's audio string/fret information limit (the same limit that lowered single-line 0.94 -> 0.45). User-approved.
- SPEC §1.4.1: record the acceptance run + re-scope chord to v1.1.
- docs/EVAL_REPORTS/v1_acceptance_2026-06-03.md: report + verdict header.
- docs/DECISIONS.md: v1-accepted decision entry.
- composite.py: chord report note now states the v1.1 framing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
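The gate used above (lower 95% bound at or above the tier target) can be sketched as follows. This is an illustration under assumptions, not the harness code: it assumes a percentile bootstrap over per-clip scores and takes "lower-95" to mean the 2.5th percentile of the resampled means. The function names are hypothetical; the two tier targets are the ones stated in the commit message.

```python
import random
from statistics import mean
from typing import Sequence

# Tier targets as stated in the commit message (SPEC §1.4.1 gates).
TIER_TARGETS = {"single-line": 0.45, "strummed": 0.60}


def bootstrap_lower_95(
    scores: Sequence[float], n_resamples: int = 2000, seed: int = 0
) -> float:
    """Lower end of a 95% percentile-bootstrap interval for the mean (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample clips with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    return means[int(0.025 * n_resamples)]


def passes_gate(scores: Sequence[float], target: float) -> bool:
    """A tier passes when the lower bound, not the mean, clears the target."""
    return bootstrap_lower_95(scores) >= target


# Synthetic per-clip scores, for illustration only (not results from the run).
example_scores = [0.55, 0.62, 0.48, 0.71, 0.66, 0.59, 0.52, 0.64]
lower = bootstrap_lower_95(example_scores)
```

Gating on the lower bound rather than the mean means a tier with a high average but wide clip-to-clip spread can still fail.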
Stamps v1 ACCEPTED on the audio-only acoustic scope, from the formal all-metrics acceptance run.
Acceptance — SPEC §1.4.1 (GuitarSet held-out player-05, 60 clips)
Report: `docs/EVAL_REPORTS/v1_acceptance_2026-06-03.md`. Eval harness `292252d`, `highres` backend + leak-free `guitarset-v1` prior. Acceptance test = `lower_95_CI ≥ target`.
Chord-instance accuracy → v1.1 (video)
Measured 0.52 single-line / 0.48 strummed. Whole-chord recovery needs the exact string + fret for every note in a chord, so it carries the same audio string-resolution limit that already lowered single-line Tab F1 0.94 → 0.45 (single-line chord 0.52 ≈ single-line Tab F1 0.52). Re-scoped to a v1.1 video-assisted target (user-approved); v1 records the audio-only baseline.
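A minimal sketch of why whole-chord scoring is this strict, assuming a chord instance is a set of (string, fret) pairs and that reference and predicted chords are already aligned one-to-one. The names and the alignment assumption are illustrative, not taken from the harness.

```python
from typing import FrozenSet, List, Tuple

Note = Tuple[int, int]  # (string number, fret); string 1 = high E, 6 = low E
Chord = FrozenSet[Note]


def chord_instance_accuracy(reference: List[Chord], predicted: List[Chord]) -> float:
    """Fraction of chords whose every (string, fret) pair matches exactly."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must be aligned one-to-one")
    if not reference:
        return 0.0
    hits = sum(ref == pred for ref, pred in zip(reference, predicted))
    return hits / len(reference)


# Two dyads, G3 + B3 each time. In the second prediction G3 is placed on the
# D string (fret 5) instead of the open G string: same pitch, wrong position.
reference = [frozenset({(3, 0), (2, 0)}), frozenset({(3, 0), (2, 0)})]
predicted = [frozenset({(3, 0), (2, 0)}), frozenset({(4, 5), (2, 0)})]

accuracy = chord_instance_accuracy(reference, predicted)
```

Here one misplaced note out of four costs half the score, although every pitch is correct. That all-or-nothing behavior is the string-resolution limit described above.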
What's in the diff
- `292252d` (harness): adds chord-instance accuracy (per-clip + per-tier bootstrap CI + report section), corrects `DEFAULT_TIER_TARGETS` to the §1.4.1 gates (the old 0.85/0.90 predated the 2026-06-02 acoustic-scope amendment), and reuses one audio backend across clips. The backend was rebuilt per clip, reloading the highres checkpoint every clip (~10× slower, and the accumulation OOM'd a 60-clip run around clip 17).
- `d403713`: SPEC §1.4.1 amendment (record the run + re-scope chord), the acceptance report + verdict header, the DECISIONS entry, and the harness chord note.
Verification
`ruff check` + `ruff format --check` clean, `mypy tabvision` clean (56 files), eval unit + formatter tests pass. The run also needed `KMP_DUPLICATE_LIB_OK=TRUE` on Windows to dodge a duplicate-OpenMP (`libiomp5md.dll`) segfault; this is environment-only, not a code change.
🤖 Generated with Claude Code
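For anyone reproducing the run on Windows, the environment workaround can be applied from a launcher script instead of the shell. This is a hedged sketch: `allow_duplicate_openmp` is a hypothetical helper, not part of this PR, and it must run before importing any library that loads an OpenMP runtime. The OpenMP runtime's own error message describes this variable as an unsupported workaround that may cause crashes or incorrect results, so it is best limited to local eval runs.

```python
import os
import sys
from typing import MutableMapping


def allow_duplicate_openmp(
    env: MutableMapping[str, str] = os.environ, platform: str = sys.platform
) -> bool:
    """Set KMP_DUPLICATE_LIB_OK on Windows only; return True if the platform matched."""
    if platform != "win32":
        return False
    # setdefault keeps any value the user has already exported.
    env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    return True


# Demonstrated on plain dicts so the example has no side effects.
windows_env: dict = {}
linux_env: dict = {}
applied_on_windows = allow_duplicate_openmp(windows_env, platform="win32")
applied_on_linux = allow_duplicate_openmp(linux_env, platform="linux")
```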