diff --git a/LICENSES.md b/LICENSES.md index 259beb8..887e1f4 100644 --- a/LICENSES.md +++ b/LICENSES.md @@ -57,11 +57,14 @@ Phase 0 (this document) produces the initial map; Phase 9 verifies. | Dataset | Phase | License | Status | Notes | |---|---|---|---|---| -| GuitarSet | 1.5 / 7 | CC-BY-4.0 | ✅ | https://guitarset.weebly.com — JAMS annotations, hexaphonic. Already used in v0 finetune work. Re-distribution requires attribution; not committed to repo. | -| IDMT-SMT-Guitar | 1.5 / 7 | research-use, registration | ⚠️ | Training-only; not redistributed in our repo. Verify scope of "research use" for portfolio context. | -| EGDB | 1.5 / 7 | TBD | ⚠️ | https://github.com/ss12f32v/GuitarTranscription — multi-amp distorted electric. Verify before relying on it for distorted-electric tier eval. | -| DadaGP | 7 | TBD | ⚠️ | https://github.com/dada-bots/dadaGP — GuitarPro tabs as synthetic-data substrate. | -| User clips (existing 11/20 self-recorded) | 1.5 (bonus) | self-owned | ✅ | iPhone OOD bonus tier per design doc §6. Owned by Patrick. | +| GuitarSet | 1.5 / 7 / **Phase 0 (this PR)** | CC-BY-4.0 | ✅ | https://guitarset.weebly.com — JAMS annotations, hexaphonic. Already used in v0 finetune work. Re-distribution requires attribution; not committed to repo. **Used as the only data source for the 2026-05-13 composite baseline** (player 05 held-out validation; 60 tracks; 8 715 gold notes). | +| Guitar-TECHS | Phase 0 (planned) / 1.5 / 7 | CC-BY-4.0 (paper §4 + Zenodo) | ⚠️ | arXiv:2501.03720 — 5h12m multi-mic + DI; per-string MIDI annotations. Acquisition planned per Phase 0 impl plan §3.2; on-disk scanner stub in `tabvision/tabvision/eval/manifest_builder.py:scan_guitar_techs`. Required attribution must appear in the public README. | +| IDMT-SMT-Guitar | 1.5 / 7 | research-use, registration | ⚠️ | Training-only; not redistributed in our repo. 
Verified 2026-05-13 research pass; superseded by Guitar-TECHS for v1 acceptance — kept for potential future training augmentation. | +| EGDB | 1.5 / 7 | **none on repo — author email pending** | ⚠️ | https://ss12f32v.github.io/Guitar-Transcription/ — 240 tracks, ~12h with multi-amp electric variants, GuitarPro tabs + aligned MIDI. **Portfolio-use written permission required** before any acquisition (LICENSE file is null per 2026-05-13 verification). Email `f08946011@ntu.edu.tw`; template in `docs/plans/2026-05-12-tab-f1-to-spec-design.md` §8.2. | +| ~~GOAT~~ | DROPPED | request-only, research-only | ❌ | arXiv:2509.22655. Verified 2026-05-13: distribution gated per-use ("for research purposes only, upon request") due to copyrighted cover-song content. Not portfolio-compatible per SPEC §1.5; removed from the eval composite. | +| ~~SynthTab~~ | DROPPED from default pipeline | dataset CC-BY-NC-4.0 (code CC-BY-4.0) | ❌ | github.com/yongyizang/SynthTab. Dataset NC clause taints derived weights (SynthTab paper treats trained models as derivative work). Not portfolio-compatible per SPEC §1.5; removed from the planned pretrain pipeline 2026-05-13. The repo code (Apache/CC-BY) remains MIT-style usable for our own renderers if needed. | +| DadaGP | research/dev only — **not in default pipeline** | access-by-email; underlying GP tabs derive from copyrighted songs | ⚠️ | https://github.com/dada-bots/dadaGP. Per 2026-05-13 design plan §4.2, acceptable as internal training augmentation only. Synthetic-source clips are blocked from non-train manifest splits by `tabvision.eval.manifest.validate_manifest` (the `SYNTHETIC_IN_EVAL_SPLIT` guard). | +| ~~User clips (the 20 self-recorded set)~~ | BANNED | self-owned | ⛔ | Banned from all roles per 2026-05-13 design plan D10 — not as accuracy gate, dev set, or label source. Replaced by the public-corpus composite. 
| | Roboflow `b101/guitar-3` | 3 (training) | **CC BY 4.0** | ✅ | **Verified 2026-05-05.** Source: https://universe.roboflow.com/b101/guitar-3. Forked into Patrick's workspace as `patricks-workspace-vozcg/guitar-3-4efcd` v2; YOLOv8-OBB export downloaded (926 images, 710/144/72 split, classes: fret / neck / nut). License declared in the dataset's README.dataset.txt: "License: CC BY 4.0". Attribution: "guitar 3" by b101 on Roboflow Universe (https://universe.roboflow.com/b101/guitar-3), CC BY 4.0; export downloaded May 5, 2026 via the Roboflow SDK. **Required attribution must appear in the public README and any blog post.** | ## Library dependencies (default pipeline) diff --git a/docs/DECISIONS.md b/docs/DECISIONS.md index 80df952..5c971d6 100644 --- a/docs/DECISIONS.md +++ b/docs/DECISIONS.md @@ -16,6 +16,62 @@ Format: --- +## 2026-05-13 — Tab F1 v1 acceptance: per-tier targets + public-corpus composite + +**Phase:** Accuracy work (cross-cuts Phases 1, 2, 3, 5, 7, 8 of the SPEC) +**Decision tree:** Design plan adoption + SPEC §1.4 amendment proposal +**Branch taken:** Replace the aggregate 0.88 Tab F1 acceptance gate with +a per-tier table; drop SynthTab (CC-BY-NC) and GOAT (request-only) from +the default pipeline; rely on GuitarSet + Guitar-TECHS + EGDB +(license-pending) for the public-corpus composite eval. 
+ +**Evidence:** +- Strategy / decision record: `docs/plans/2026-05-12-tab-f1-to-spec-design.md` +- Phase 0 implementation plan: `docs/plans/2026-05-13-tab-f1-phase-0-implementation.md` +- SPEC amendment block: `SPEC.md` §1.4.1 (per-tier table + composite test set) +- First baseline artifact (2 of 4 tiers covered): `docs/EVAL_REPORTS/composite_baseline_2026-05-13.md` +- Companion error decomposition: `docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md` +- Implementation branch with the eval harness: `impl/tab-f1-phase-0` + +**Reasoning:** The 2026-05-08 GuitarSet validation showed aggregate Tab +F1 = 0.6104 with comp tracks at 0.670 and solo tracks at 0.508. The +aggregate target hid the dominant failure axis (string/fret assignment +on single-line passages), and the SPEC §1.4 numbers (0.94 / 0.86 / 0.90 +/ 0.82) baked in implicit per-tier expectations that the project hadn't +explicitly negotiated. The 2026-05-13 user conversation locked in +relaxed v1 targets (0.85 / 0.90 / 0.87 / 0.80), kept the original SPEC +numbers as the v1.1 / portfolio stretch reference, and committed to +audio-only fusion priors + cheap pitch post-processing as the leverage +path (no SynthTab pretrain → no NC license taint on shipped weights). + +**Per-tier acceptance gate (v1):** + +| Tier | v1 target | 2026-05-13 baseline (mean / lower 95% CI) | +|---|---:|---:| +| Clean acoustic single-line | 0.85 | 0.5076 / 0.4448 (fail) | +| Clean acoustic strummed | 0.90 | 0.6708 / 0.6015 (fail) | +| Clean electric | 0.87 | missing — pending Guitar-TECHS | +| Distorted electric | 0.80 | missing — pending EGDB | + +Both covered tiers miss the gate by ~30–41 pp on the lower-95 bound (~23–34 pp on the mean). Per the error decomposition, +`wrong_position_same_pitch` accounts for 77% of single-line loss and +50% of strummed loss — Phases 1-7 of the design plan target this +bucket. + +**Decisions inventoried in the design plan (D1–D11):** + +- D1 Per-tier replaces aggregate. D2 Targets table. D3 Composite eval. + D4 No SynthTab. 
D5 Video qualitative-only. D6 Free-tier compute first + (Local > Colab > Kaggle > Lightning > Modal). D7 1-2 month cadence. + D8 No stretch (bends/slides) in v1. D9 D2 numbers on top-1 only. + D10 Personal clips fully banned. D11 This is a SPEC §1.4 amendment, + not a SPEC-achievement plan. + +**Open Phase 0 user actions:** Lightning Studios / Kaggle / Colab / W&B +account verification; EGDB author email; Guitar-TECHS Zenodo download. + +--- + ## 2026-05-05 — Project name kept as `tabvision` (not `tabify`) **Phase:** 0 diff --git a/docs/EVAL_REPORTS/composite_baseline_2026-05-13.md b/docs/EVAL_REPORTS/composite_baseline_2026-05-13.md new file mode 100644 index 0000000..f700b90 --- /dev/null +++ b/docs/EVAL_REPORTS/composite_baseline_2026-05-13.md @@ -0,0 +1,41 @@ +# Composite per-tier baseline + +## Coverage + +**2 of 4 tiers measured.** Clean acoustic single-line + strummed covered +via the GuitarSet validation split (held-out player 05, 60 tracks, +8 715 gold notes). **Clean electric and distorted electric tiers +pending Guitar-TECHS / EGDB acquisition** per the strategy doc §3.1 and +Phase 0 implementation plan §3.2 — see the "missing" rows below. + +This is the first artifact of `impl/tab-f1-phase-0`. Companion +6-bucket error decomposition: [`tab_f1_error_decomposition_2026-05-13.md`](tab_f1_error_decomposition_2026-05-13.md). 
+ +## Per-tier results + +| Tier | Clips | Gold notes | Tab F1 mean | Tab F1 lower-95 | Target | Status | Onset F1 | Pitch F1 | +|---|---:|---:|---:|---:|---:|---|---:|---:| +| clean_acoustic_single_line | 30 | 2179 | 0.5076 | 0.4448 | 0.85 | fail | 0.9375 | 0.9304 | +| clean_acoustic_strummed | 30 | 6536 | 0.6708 | 0.6015 | 0.90 | fail | 0.9229 | 0.9005 | +| clean_electric | 0 | 0 | — | — | 0.87 | missing | — | — | +| distorted_electric | 0 | 0 | — | — | 0.80 | missing | — | — | + +## Per-source breakdown + +| Tier | Source | Clips | Tab F1 mean | Onset F1 mean | Pitch F1 mean | +|---|---|---:|---:|---:|---:| +| clean_acoustic_single_line | GuitarSet | 30 | 0.5076 | 0.9375 | 0.9304 | +| clean_acoustic_strummed | GuitarSet | 30 | 0.6708 | 0.9229 | 0.9005 | + +## Methodology + +- Manifest: `data/eval/composite.toml` +- Audio backend: `highres` +- Position prior: `guitarset-v1` +- Eval-harness SHA: `9a7e957` (the commit that landed both this baseline + artifact and the chord-cluster matcher fix in + `tabvision.eval.error_decomposition.decompose_errors`) +- Onset tolerance: 50 ms +- Bootstrap: N=10,000, seed=42, 95% percentile interval +- Acceptance gate: `lower_95_CI >= target` per design plan §5 + diff --git a/docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md b/docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md new file mode 100644 index 0000000..5ba1d8e --- /dev/null +++ b/docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md @@ -0,0 +1,45 @@ +# Tab F1 error decomposition + +## Diagnostic summary + +**Dominant failure bucket on every covered tier is +`wrong_position_same_pitch`** — the audio detected the right pitch +within onset tolerance but the system placed it on the wrong +(string, fret). 
+ +| Tier | Loss share — wrong_position_same_pitch | +|---|---:| +| clean_acoustic_single_line | **77.5%** (910 / 1174 loss events) | +| clean_acoustic_strummed | **49.7%** (1548 / 3112 loss events) | +| Aggregate | **57.3%** (2458 / 4286 loss events) | + +This matches the strategy doc §2 diagnostic exactly. The audio side +is at SPEC (Pitch F1 ≥ 0.90 on both covered tiers); the gap to D2 +per-tier targets is almost entirely string/fret assignment, and it +gets worse on single-line passages where chord-cluster constraints +can't help the fusion. + +Companion baseline report: [`composite_baseline_2026-05-13.md`](composite_baseline_2026-05-13.md). + +Six-bucket port of the apr-28 7-bucket harness; the seventh apr-28 +bucket (`muted_undetectable`) is deferred until the §8 `TabEvent` +contract carries a muted/X flag. + +## Aggregate (all tiers) + +| Bucket | Count | Share of loss | +|---|---:|---:| +| correct | 4986 | — | +| wrong_position_same_pitch | 2458 | 57.3% | +| pitch_off | 505 | 11.8% | +| timing_only | 94 | 2.2% | +| missed_onset | 672 | 15.7% | +| extra_detection | 557 | 13.0% | + +## Per-tier breakdown + +| Tier | correct | wrong_position_same_pitch | pitch_off | timing_only | missed_onset | extra_detection | +|---|---|---|---|---|---|---| +| clean_acoustic_single_line | 1125 | 910 | 19 | 17 | 108 | 120 | +| clean_acoustic_strummed | 3861 | 1548 | 486 | 77 | 564 | 437 | + diff --git a/docs/plans/2026-05-12-tab-f1-to-spec-design.md b/docs/plans/2026-05-12-tab-f1-to-spec-design.md index ff1569b..78991a3 100644 --- a/docs/plans/2026-05-12-tab-f1-to-spec-design.md +++ b/docs/plans/2026-05-12-tab-f1-to-spec-design.md @@ -213,7 +213,7 @@ phase's evidence justifies starting it. the composite eval. Acquire Guitar-TECHS; send EGDB email; verify free compute accounts. **No production code changes.** Acceptance: per-tier baseline numbers exist for ≥ 3 of 4 tiers with bootstrap CIs; - per-tier 7-bucket error breakdown exists. 
[Companion: + per-tier six-bucket error breakdown exists. [Companion: `2026-05-13-tab-f1-phase-0-implementation.md`.] - **Phase 1 — Pitch ceiling lift (cheap moves).** Voicing/silence gate + peak-picking + Basic Pitch pitch-only ensemble. Acceptance: Pitch diff --git a/docs/plans/2026-05-13-tab-f1-phase-0-implementation.md b/docs/plans/2026-05-13-tab-f1-phase-0-implementation.md index 0a9cd5f..6d6b8cc 100644 --- a/docs/plans/2026-05-13-tab-f1-phase-0-implementation.md +++ b/docs/plans/2026-05-13-tab-f1-phase-0-implementation.md @@ -17,7 +17,9 @@ Acceptance, copied from the strategy doc §6: - Per-tier baseline numbers for ≥ 3 of 4 D2 tiers with **bootstrap 95% CIs**, on the composite eval set. -- Per-tier 7-bucket error decomposition on the same set. +- Per-tier six-bucket error decomposition on the same set + (port of the apr-28 7-bucket harness; ``muted_undetectable`` deferred + until the §8 ``TabEvent`` contract carries a muted/X flag). - Free-tier compute accounts (Local / Colab / Kaggle / Lightning / W&B) verified. - EGDB author email sent; reply tracked in `docs/DECISIONS.md`. 
@@ -43,10 +45,10 @@ Acceptance, copied from the strategy doc §6: | `tabvision/tests/unit/test_parser_guitarset_jams.py` | JAMS parser round-trip test | | `tabvision/tests/unit/test_parser_guitar_techs_midi.py` | MIDI parser round-trip test | | `tabvision/tests/unit/test_bootstrap_ci.py` | CI helper correctness on known distributions | -| `tabvision/tests/unit/test_error_decomposition.py` | 7-bucket assignment correctness on synthetic predicted/gold pairs | +| `tabvision/tests/unit/test_error_decomposition.py` | Per-bucket assignment correctness on synthetic predicted/gold pairs (six buckets populated) | | `tabvision/tests/integration/test_composite_eval_smoke.py` | End-to-end smoke: 5-clip manifest → tier numbers exist + CIs computed | | `docs/EVAL_REPORTS/composite_baseline_2026-05-13.md` | First baseline report (output of Phase 0E) | -| `docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md` | First 7-bucket decomposition (output of Phase 0D) | +| `docs/EVAL_REPORTS/tab_f1_error_decomposition_2026-05-13.md` | First six-bucket decomposition (output of Phase 0D) | ### 1.2 Modified files @@ -215,8 +217,8 @@ Must contain: Must contain: -- Aggregate 7-bucket table (counts + share-of-loss). -- Per-tier 7-bucket table. +- Aggregate six-bucket table (counts + share-of-loss). +- Per-tier six-bucket table. - A "biggest lever per tier" callout: which bucket dominates each tier's loss. Phase 1+ priorities derive from this. diff --git a/tabvision/data/eval/composite.toml b/tabvision/data/eval/composite.toml new file mode 100644 index 0000000..399c6a6 --- /dev/null +++ b/tabvision/data/eval/composite.toml @@ -0,0 +1,542 @@ +# Composite-eval manifest generated by tabvision/scripts/eval/build_composite_manifest.py. +# Re-generate with the same args to refresh; this file is intended to be auto-managed. 
+ +[[clips]] +id = "guitarset/05_BN1-129-Eb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN1-129-Eb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN1-129-Eb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN1-129-Eb_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN1-129-Eb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN1-129-Eb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN1-147-Gb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN1-147-Gb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN1-147-Gb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN1-147-Gb_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN1-147-Gb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN1-147-Gb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN2-131-B_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN2-131-B_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN2-131-B_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN2-131-B_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN2-131-B_solo_mic.wav" +annotation_path = 
"$TABVISION_DATA_ROOT/guitarset/annotation/05_BN2-131-B_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN2-166-Ab_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN2-166-Ab_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN2-166-Ab_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN2-166-Ab_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN2-166-Ab_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN2-166-Ab_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN3-119-G_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN3-119-G_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN3-119-G_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN3-119-G_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN3-119-G_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN3-119-G_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN3-154-E_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN3-154-E_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN3-154-E_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_BN3-154-E_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = 
"$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_BN3-154-E_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_BN3-154-E_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk1-114-Ab_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk1-114-Ab_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk1-114-Ab_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk1-114-Ab_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk1-114-Ab_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk1-114-Ab_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk1-97-C_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk1-97-C_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk1-97-C_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk1-97-C_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk1-97-C_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk1-97-C_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk2-108-Eb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk2-108-Eb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk2-108-Eb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk2-108-Eb_solo" +tier = 
"clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk2-108-Eb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk2-108-Eb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk2-119-G_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk2-119-G_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk2-119-G_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk2-119-G_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk2-119-G_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk2-119-G_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk3-112-C#_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk3-112-C#_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk3-112-C#_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk3-112-C#_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk3-112-C#_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk3-112-C#_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk3-98-A_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk3-98-A_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk3-98-A_comp.jams" 
+annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Funk3-98-A_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Funk3-98-A_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Funk3-98-A_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz1-130-D_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz1-130-D_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz1-130-D_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz1-130-D_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz1-130-D_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz1-130-D_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz1-200-B_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz1-200-B_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz1-200-B_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz1-200-B_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz1-200-B_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz1-200-B_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz2-110-Bb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz2-110-Bb_comp_mic.wav" 
+annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz2-110-Bb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz2-110-Bb_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz2-110-Bb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz2-110-Bb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz2-187-F#_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz2-187-F#_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz2-187-F#_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz2-187-F#_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz2-187-F#_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz2-187-F#_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz3-137-Eb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz3-137-Eb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz3-137-Eb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz3-137-Eb_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz3-137-Eb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz3-137-Eb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz3-150-C_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = 
"validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz3-150-C_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz3-150-C_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Jazz3-150-C_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Jazz3-150-C_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Jazz3-150-C_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock1-130-A_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock1-130-A_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock1-130-A_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock1-130-A_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock1-130-A_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock1-130-A_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock1-90-C#_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock1-90-C#_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock1-90-C#_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock1-90-C#_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock1-90-C#_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock1-90-C#_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = 
"guitarset/05_Rock2-142-D_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock2-142-D_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock2-142-D_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock2-142-D_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock2-142-D_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock2-142-D_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock2-85-F_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock2-85-F_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock2-85-F_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock2-85-F_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock2-85-F_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock2-85-F_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock3-117-Bb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock3-117-Bb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock3-117-Bb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock3-117-Bb_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock3-117-Bb_solo_mic.wav" +annotation_path = 
"$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock3-117-Bb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock3-148-C_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock3-148-C_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock3-148-C_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_Rock3-148-C_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_Rock3-148-C_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_Rock3-148-C_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS1-100-C#_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS1-100-C#_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS1-100-C#_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS1-100-C#_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS1-100-C#_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS1-100-C#_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS1-68-E_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS1-68-E_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS1-68-E_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS1-68-E_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = 
"$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS1-68-E_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS1-68-E_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS2-107-Ab_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS2-107-Ab_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS2-107-Ab_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS2-107-Ab_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS2-107-Ab_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS2-107-Ab_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS2-88-F_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS2-88-F_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS2-88-F_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS2-88-F_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS2-88-F_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS2-88-F_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS3-84-Bb_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS3-84-Bb_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS3-84-Bb_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS3-84-Bb_solo" +tier = "clean_acoustic_single_line" +source = 
"GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS3-84-Bb_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS3-84-Bb_solo.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS3-98-C_comp" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS3-98-C_comp_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS3-98-C_comp.jams" +annotation_format = "guitarset_jams" + +[[clips]] +id = "guitarset/05_SS3-98-C_solo" +tier = "clean_acoustic_single_line" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_SS3-98-C_solo_mic.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_SS3-98-C_solo.jams" +annotation_format = "guitarset_jams" diff --git a/tabvision/data/eval/manifest.toml b/tabvision/data/eval/manifest.toml index fc5b65c..3654685 100644 --- a/tabvision/data/eval/manifest.toml +++ b/tabvision/data/eval/manifest.toml @@ -17,3 +17,12 @@ # split = "validation" # media_path = "$TABVISION_DATA_ROOT/guitarset/audio_mono-mic/05_example_mic.wav" # annotation_path = "$TABVISION_DATA_ROOT/guitarset/annotation/05_example.jams" +# annotation_format = "guitarset_jams" +# +# `annotation_format` selects the parser registered in +# tabvision.eval.parsers (Phase 0). Known formats: guitarset_jams, +# guitar_techs_midi. Forthcoming: egdb_gp (license-pending). +# +# Synthetic-source clips (source = "synthtab/...", "dadagp/...", +# "synthetic/...") are restricted to split = "train". The validator +# rejects them in validation/test splits — see design plan §5 / R8. 
diff --git a/tabvision/scripts/eval/build_composite_manifest.py b/tabvision/scripts/eval/build_composite_manifest.py
new file mode 100644
index 0000000..9b47f44
--- /dev/null
+++ b/tabvision/scripts/eval/build_composite_manifest.py
@@ -0,0 +1,10 @@
+"""CLI wrapper for the composite-eval manifest builder.
+
+See ``docs/plans/2026-05-13-tab-f1-phase-0-implementation.md`` §3.3 for
+the canonical invocation.
+"""
+
+from tabvision.eval.manifest_builder import main
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tabvision/scripts/eval/composite_eval.py b/tabvision/scripts/eval/composite_eval.py
new file mode 100644
index 0000000..90d2fd9
--- /dev/null
+++ b/tabvision/scripts/eval/composite_eval.py
@@ -0,0 +1,10 @@
+"""CLI wrapper for the v1 composite per-tier eval.
+
+See ``docs/plans/2026-05-13-tab-f1-phase-0-implementation.md`` §3.4 for
+the canonical invocation.
+"""
+
+from tabvision.eval.composite import main
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tabvision/tabvision/eval/bootstrap.py b/tabvision/tabvision/eval/bootstrap.py
new file mode 100644
index 0000000..e3379e9
--- /dev/null
+++ b/tabvision/tabvision/eval/bootstrap.py
@@ -0,0 +1,112 @@
+"""Bootstrap confidence intervals for per-tier acceptance gates.
+
+The 2026-05-12 design plan (§5) requires every per-tier Tab F1 number
+to be reported with a 95% bootstrap CI, and the acceptance gate is
+``lower_95_CI >= target`` — not just ``mean >= target``. This module
+provides that primitive.
+
+Resamples observations (typically per-clip Tab F1 values) with
+replacement, applies a user-supplied statistic to each resample, and
+returns the original-sample statistic plus the equal-tailed percentile
+interval over the bootstrap distribution.
+"""
+
+from __future__ import annotations
+
+from collections.abc import Callable, Sequence
+from dataclasses import dataclass
+
+import numpy as np
+
+
+@dataclass(frozen=True)
+class BootstrapResult:
+    """Bootstrap statistic + equal-tailed percentile confidence interval.
+
+    ``lower`` and ``upper`` are the ``(1-confidence)/2`` and
+    ``(1+confidence)/2`` quantiles of the bootstrap distribution.
+    For a single observation, ``statistic == lower == upper`` and
+    ``n_bootstrap`` is ``0`` (no resampling performed).
+    """
+
+    statistic: float
+    lower: float
+    upper: float
+    n_observations: int
+    n_bootstrap: int
+    confidence: float
+
+
+def bootstrap_ci(
+    values: Sequence[float] | np.ndarray,
+    *,
+    statistic: Callable[[np.ndarray], float] | None = None,
+    n_bootstrap: int = 10_000,
+    confidence: float = 0.95,
+    seed: int = 42,
+) -> BootstrapResult:
+    """Bootstrap a confidence interval over ``values``.
+
+    ``statistic`` defaults to ``numpy.mean``. Pass a different callable
+    (e.g. ``numpy.median``) for other functionals. The callable receives
+    a 1-D ``numpy.ndarray`` of float64 values.
+
+    ``seed`` is the integer seed for ``numpy.random.default_rng``;
+    calling with the same seed + values produces identical output.
+    """
+    if len(values) == 0:
+        raise ValueError("bootstrap_ci requires at least one observation")
+    if not 0.0 < confidence < 1.0:
+        raise ValueError(
+            f"confidence must be in (0, 1); got {confidence}"
+        )
+    if n_bootstrap < 1:
+        raise ValueError(f"n_bootstrap must be >= 1; got {n_bootstrap}")
+
+    stat_fn: Callable[[np.ndarray], float] = (
+        statistic if statistic is not None else np.mean
+    )
+    arr = np.asarray(values, dtype=np.float64).ravel()
+    n_obs = arr.shape[0]
+    point = float(stat_fn(arr))
+
+    if n_obs == 1:
+        return BootstrapResult(
+            statistic=point,
+            lower=point,
+            upper=point,
+            n_observations=1,
+            n_bootstrap=0,
+            confidence=confidence,
+        )
+
+    rng = np.random.default_rng(seed)
+    indices = rng.integers(0, n_obs, size=(n_bootstrap, n_obs))
+    resamples = arr[indices]  # shape (n_bootstrap, n_obs)
+
+    if statistic is None or statistic is np.mean:
+        # Fast path: vectorized mean over rows.
+        dist = resamples.mean(axis=1)
+    else:
+        # General path: apply user statistic per resample.
+        dist = np.fromiter(
+            (float(stat_fn(resamples[i])) for i in range(n_bootstrap)),
+            dtype=np.float64,
+            count=n_bootstrap,
+        )
+
+    alpha = (1.0 - confidence) / 2.0
+    lower = float(np.quantile(dist, alpha))
+    upper = float(np.quantile(dist, 1.0 - alpha))
+
+    return BootstrapResult(
+        statistic=point,
+        lower=lower,
+        upper=upper,
+        n_observations=n_obs,
+        n_bootstrap=n_bootstrap,
+        confidence=confidence,
+    )
+
+
+__all__ = ["BootstrapResult", "bootstrap_ci"]
diff --git a/tabvision/tabvision/eval/composite.py b/tabvision/tabvision/eval/composite.py
new file mode 100644
index 0000000..578f195
--- /dev/null
+++ b/tabvision/tabvision/eval/composite.py
@@ -0,0 +1,548 @@
+"""Composite multi-source eval — Phase 0 per-tier baseline harness.
+ +Reads a manifest (validated by :mod:`tabvision.eval.manifest`), +dispatches each clip's annotation through the registered parser, +runs a user-supplied predictor over the media, and aggregates per-tier +onset / pitch / tab F1 with bootstrap CIs plus the error-decomposition +buckets. + +The predictor is **injected** so the harness is testable without the +heavy audio backend. Production usage wires up +:func:`tabvision.pipeline.run_pipeline` from the CLI; tests pass a +fake predictor for fast iteration. +""" + +from __future__ import annotations + +import os +import tomllib +from collections.abc import Callable, Mapping +from dataclasses import dataclass +from pathlib import Path + +from tabvision.eval.bootstrap import BootstrapResult, bootstrap_ci +from tabvision.eval.error_decomposition import ( + ErrorDecomposition, + aggregate_decompositions, + decompose_errors, +) +from tabvision.eval.manifest import ManifestValidation, validate_manifest +from tabvision.eval.metrics import ( + EventF1Result, + TabF1Result, + event_f1, + tab_f1, +) +from tabvision.eval.parsers import get_parser +from tabvision.types import GuitarConfig, SessionConfig, TabEvent + +Predictor = Callable[[Path, SessionConfig], list[TabEvent]] +"""``(media_path, session) -> list[TabEvent]``. 
The composite-eval harness +calls this once per non-train clip.""" + + +@dataclass(frozen=True) +class ClipEvalResult: + """Per-clip metrics + error decomposition.""" + + clip_id: str + tier: str + source: str + n_gold: int + n_predicted: int + onset: EventF1Result + pitch: EventF1Result + tab: TabF1Result + errors: ErrorDecomposition + + +@dataclass(frozen=True) +class TierReport: + """Aggregate metrics for one tier — bootstrap CI on each F1.""" + + tier: str + n_clips: int + n_gold_total: int + onset_f1: BootstrapResult + pitch_f1: BootstrapResult + tab_f1: BootstrapResult + errors: ErrorDecomposition # summed across clips in this tier + + +@dataclass(frozen=True) +class CompositeReport: + """Top-level composite-eval result.""" + + manifest_path: str + manifest_validation: ManifestValidation + per_clip: list[ClipEvalResult] + tiers: Mapping[str, TierReport] + bootstrap_n: int + bootstrap_seed: int + onset_tolerance_s: float + + def tab_f1_acceptance(self, targets: Mapping[str, float]) -> dict[str, str]: + """Compute the pass/gap/fail status per tier vs ``targets``. + + Status semantics per design plan §5: + - ``"pass"``: ``lower_95_CI >= target`` (the official acceptance bar) + - ``"gap"``: ``mean >= target > lower_95_CI`` + - ``"fail"``: ``mean < target`` + - ``"missing"``: tier has no clips in this report + """ + statuses: dict[str, str] = {} + for tier, target in targets.items(): + report = self.tiers.get(tier) + if report is None: + statuses[tier] = "missing" + continue + mean = report.tab_f1.statistic + lower = report.tab_f1.lower + if lower >= target: + statuses[tier] = "pass" + elif mean >= target: + statuses[tier] = "gap" + else: + statuses[tier] = "fail" + return statuses + + +DEFAULT_EVAL_SPLITS: tuple[str, ...] = ("validation", "test") +"""Splits included in composite eval by default. 
``train`` is excluded.""" + + +def run_composite_eval( + manifest_path: str | Path, + *, + predictor: Predictor, + media_root: str | Path | None = None, + annotation_root: str | Path | None = None, + splits: tuple[str, ...] = DEFAULT_EVAL_SPLITS, + cfg: GuitarConfig | None = None, + onset_tolerance_s: float = 0.05, + bootstrap_n: int = 10_000, + bootstrap_seed: int = 42, +) -> CompositeReport: + """Per-clip eval, then per-tier aggregation with bootstrap CIs. + + Raises ``ValueError`` if the manifest fails validation (fail-severity + issues from :func:`validate_manifest`). Train-split clips are + skipped by default; pass ``splits=("train",)`` to evaluate on them + (useful for diagnosing training-set fit). + """ + manifest_path = Path(manifest_path) + validation = validate_manifest(manifest_path) + if not validation.passed: + fail_messages = [ + i.message for i in validation.items if i.severity == "fail" + ] + raise ValueError( + f"Manifest {manifest_path} has fail-severity issues: {fail_messages}" + ) + + if cfg is None: + cfg = GuitarConfig() + + payload = tomllib.loads(manifest_path.read_text(encoding="utf-8")) + clips = payload.get("clips") or [] + + per_clip: list[ClipEvalResult] = [] + for clip in clips: + if clip["split"] not in splits: + continue + + media_path = _resolve_path(clip["media_path"], media_root) + annotation_path = _resolve_path(clip["annotation_path"], annotation_root) + + parser = get_parser(clip["annotation_format"]) + gold = parser(annotation_path, cfg) + + session = _session_from_clip(clip) + predicted = predictor(media_path, session) + + per_clip.append( + ClipEvalResult( + clip_id=clip["id"], + tier=clip["tier"], + source=clip["source"], + n_gold=len(gold), + n_predicted=len(predicted), + onset=event_f1( + predicted, gold, match_pitch=False, onset_tolerance_s=onset_tolerance_s + ), + pitch=event_f1( + predicted, gold, match_pitch=True, onset_tolerance_s=onset_tolerance_s + ), + tab=tab_f1(predicted, gold, 
onset_tolerance_s=onset_tolerance_s), + errors=decompose_errors( + predicted, gold, onset_tolerance_s=onset_tolerance_s + ), + ) + ) + + tiers = _aggregate_per_tier( + per_clip, + bootstrap_n=bootstrap_n, + bootstrap_seed=bootstrap_seed, + ) + + return CompositeReport( + manifest_path=str(manifest_path), + manifest_validation=validation, + per_clip=per_clip, + tiers=tiers, + bootstrap_n=bootstrap_n, + bootstrap_seed=bootstrap_seed, + onset_tolerance_s=onset_tolerance_s, + ) + + +def _aggregate_per_tier( + per_clip: list[ClipEvalResult], + *, + bootstrap_n: int, + bootstrap_seed: int, +) -> dict[str, TierReport]: + by_tier: dict[str, list[ClipEvalResult]] = {} + for result in per_clip: + by_tier.setdefault(result.tier, []).append(result) + + reports: dict[str, TierReport] = {} + for tier, results in by_tier.items(): + onset_f1s = [r.onset.f1 for r in results] + pitch_f1s = [r.pitch.f1 for r in results] + tab_f1s = [r.tab.f1 for r in results] + reports[tier] = TierReport( + tier=tier, + n_clips=len(results), + n_gold_total=sum(r.n_gold for r in results), + onset_f1=bootstrap_ci( + onset_f1s, n_bootstrap=bootstrap_n, seed=bootstrap_seed + ), + pitch_f1=bootstrap_ci( + pitch_f1s, n_bootstrap=bootstrap_n, seed=bootstrap_seed + ), + tab_f1=bootstrap_ci( + tab_f1s, n_bootstrap=bootstrap_n, seed=bootstrap_seed + ), + errors=aggregate_decompositions(r.errors for r in results), + ) + return reports + + +def _resolve_path(path_str: str, root: str | Path | None) -> Path: + """Expand ``$TABVISION_DATA_ROOT`` and apply optional override. + + ``root`` (function arg) takes precedence over the env var. 
+ """ + expanded = path_str + if "$TABVISION_DATA_ROOT" in path_str: + resolved_root: str | None + if root is not None: + resolved_root = str(root) + else: + resolved_root = os.environ.get("TABVISION_DATA_ROOT") + if not resolved_root: + raise ValueError( + f"Path {path_str!r} contains $TABVISION_DATA_ROOT but neither " + f"the env var nor the function arg is set" + ) + expanded = path_str.replace("$TABVISION_DATA_ROOT", resolved_root) + return Path(expanded).expanduser() + + +def _session_from_clip(clip: dict[str, object]) -> SessionConfig: + """Map manifest clip metadata to a :class:`SessionConfig`. + + Phase 0 defaults all clips to acoustic / clean / mixed. Per-clip + instrument / tone / style fields can be added to the manifest + schema in a later phase. + """ + del clip # unused in Phase 0 + return SessionConfig() + + +DEFAULT_TIER_TARGETS: Mapping[str, float] = { + "clean_acoustic_single_line": 0.85, + "clean_acoustic_strummed": 0.90, + "clean_electric": 0.87, + "distorted_electric": 0.80, +} +"""Per-tier Tab F1 acceptance targets from SPEC §1.4.1. + +These are the v1 acceptance bar locked in by the 2026-05-13 design plan +§0 D2. The original SPEC §1.4 numbers (0.94 / 0.86 / 0.90 / 0.82) are +the v1.1 / portfolio stretch reference, not used here. +""" + + +def format_baseline_markdown( + report: CompositeReport, + *, + targets: Mapping[str, float] = DEFAULT_TIER_TARGETS, + backend_label: str = "", + position_prior_label: str = "", + eval_harness_sha: str = "", + title: str = "Composite per-tier baseline", +) -> str: + """Render a Phase 0 per-tier baseline report as Markdown. + + Output format follows + ``docs/plans/2026-05-13-tab-f1-phase-0-implementation.md`` §4.1. 
+ """ + statuses = report.tab_f1_acceptance(targets) + lines: list[str] = [f"# {title}", ""] + + lines.append("## Per-tier results") + lines.append("") + header_cells = [ + "Tier", + "Clips", + "Gold notes", + "Tab F1 mean", + "Tab F1 lower-95", + "Target", + "Status", + "Onset F1", + "Pitch F1", + ] + lines.append("| " + " | ".join(header_cells) + " |") + lines.append("|---|---:|---:|---:|---:|---:|---|---:|---:|") + for tier, target in targets.items(): + tier_report = report.tiers.get(tier) + if tier_report is None: + lines.append( + f"| {tier} | 0 | 0 | — | — | {target:.2f} | missing | — | — |" + ) + continue + tab_mean = tier_report.tab_f1.statistic + tab_lo = tier_report.tab_f1.lower + onset_mean = tier_report.onset_f1.statistic + pitch_mean = tier_report.pitch_f1.statistic + lines.append( + f"| {tier} | {tier_report.n_clips} | {tier_report.n_gold_total} | " + f"{tab_mean:.4f} | {tab_lo:.4f} | {target:.2f} | {statuses[tier]} | " + f"{onset_mean:.4f} | {pitch_mean:.4f} |" + ) + lines.append("") + + lines.append("## Per-source breakdown") + lines.append("") + lines.append("| Tier | Source | Clips | Tab F1 mean | Onset F1 mean | Pitch F1 mean |") + lines.append("|---|---|---:|---:|---:|---:|") + grouped: dict[tuple[str, str], list[ClipEvalResult]] = {} + for clip in report.per_clip: + grouped.setdefault((clip.tier, clip.source), []).append(clip) + for (tier, source), clips in sorted(grouped.items()): + tab_mean = sum(c.tab.f1 for c in clips) / len(clips) + onset_mean = sum(c.onset.f1 for c in clips) / len(clips) + pitch_mean = sum(c.pitch.f1 for c in clips) / len(clips) + lines.append( + f"| {tier} | {source} | {len(clips)} | " + f"{tab_mean:.4f} | {onset_mean:.4f} | {pitch_mean:.4f} |" + ) + lines.append("") + + lines.append("## Methodology") + lines.append("") + lines.append(f"- Manifest: `{report.manifest_path}`") + lines.append(f"- Audio backend: `{backend_label}`") + lines.append(f"- Position prior: `{position_prior_label}`") + lines.append(f"- Eval-harness 
SHA: `{eval_harness_sha}`") + lines.append(f"- Onset tolerance: {report.onset_tolerance_s * 1000:.0f} ms") + lines.append( + f"- Bootstrap: N={report.bootstrap_n:,}, seed={report.bootstrap_seed}, " + f"95% percentile interval" + ) + lines.append( + "- Acceptance gate: `lower_95_CI >= target` per design plan §5" + ) + lines.append("") + + return "\n".join(lines) + "\n" + + +def format_decomposition_markdown( + report: CompositeReport, + *, + title: str = "Tab F1 error decomposition", +) -> str: + """Render the per-tier six-bucket error decomposition. + + Six buckets are populated; the apr-28 ``muted_undetectable`` seventh + bucket is deferred until the v1 contract carries a muted/X flag. + """ + bucket_columns = ( + "correct", + "wrong_position_same_pitch", + "pitch_off", + "timing_only", + "missed_onset", + "extra_detection", + ) + lines: list[str] = [f"# {title}", ""] + + lines.append("## Aggregate (all tiers)") + lines.append("") + from tabvision.eval.error_decomposition import aggregate_decompositions + + overall = aggregate_decompositions(c.errors for c in report.per_clip) + lines.append("| Bucket | Count | Share of loss |") + lines.append("|---|---:|---:|") + shares = overall.share_of_loss() + for col in bucket_columns: + count = getattr(overall, col) + if col == "correct": + lines.append(f"| {col} | {count} | — |") + else: + lines.append(f"| {col} | {count} | {shares[col] * 100:.1f}% |") + lines.append("") + + lines.append("## Per-tier breakdown") + lines.append("") + header_cells = ["Tier"] + list(bucket_columns) + lines.append("| " + " | ".join(header_cells) + " |") + lines.append("|" + "|".join(["---"] * len(header_cells)) + "|") + for tier_name in sorted(report.tiers): + tier_report = report.tiers[tier_name] + row = [tier_name] + for col in bucket_columns: + row.append(str(getattr(tier_report.errors, col))) + lines.append("| " + " | ".join(row) + " |") + lines.append("") + + return "\n".join(lines) + "\n" + + +def make_run_pipeline_predictor( + *, + 
audio_backend_name: str, + position_prior: str | None, + melodic_prior_enabled: bool = False, + video_enabled: bool = False, +) -> Predictor: + """Wrap :func:`tabvision.pipeline.run_pipeline` for composite-eval use. + + Imports ``run_pipeline`` lazily so the composite-eval CLI's --help + works without the audio-highres extras installed. + """ + from tabvision.pipeline import run_pipeline # noqa: PLC0415 + + def predictor(media_path: Path, session: SessionConfig) -> list[TabEvent]: + return run_pipeline( + str(media_path), + audio_backend_name=audio_backend_name, + position_prior=position_prior, + melodic_prior_enabled=melodic_prior_enabled, + video_enabled=video_enabled, + session=session, + ) + + return predictor + + +def main(argv: list[str] | None = None) -> int: + """CLI entry point: ``tabvision-composite-eval``.""" + import argparse + + parser = argparse.ArgumentParser( + prog="tabvision-composite-eval", + description=( + "Run the v1 per-tier composite eval and write a Markdown report." 
+ ), + ) + parser.add_argument("--manifest", type=Path, required=True) + parser.add_argument("--backend", default="highres", help="audio backend name") + parser.add_argument( + "--position-prior", + default="guitarset-v1", + help='position prior name; pass "none" to disable', + ) + parser.add_argument("--melodic-prior", action="store_true") + parser.add_argument( + "--enable-video", + action="store_true", + help="enable video stack (default: off — Phase 0 ships audio-only)", + ) + parser.add_argument("--output", type=Path, required=True) + parser.add_argument( + "--decomposition-output", + type=Path, + help=( + "optional: write the six-bucket error decomposition " + "(port of the apr-28 7-bucket harness; muted_undetectable deferred) " + "to this file too" + ), + ) + parser.add_argument("--bootstrap-n", type=int, default=10_000) + parser.add_argument("--bootstrap-seed", type=int, default=42) + parser.add_argument("--onset-tolerance-s", type=float, default=0.05) + parser.add_argument( + "--splits", + default="validation,test", + help="comma-separated splits to include", + ) + parser.add_argument("--media-root", type=Path, default=None) + parser.add_argument("--annotation-root", type=Path, default=None) + parser.add_argument("--eval-harness-sha", default="") + + args = parser.parse_args(argv) + + position_prior: str | None = args.position_prior + if position_prior and position_prior.lower() == "none": + position_prior = None + + predictor = make_run_pipeline_predictor( + audio_backend_name=args.backend, + position_prior=position_prior, + melodic_prior_enabled=args.melodic_prior, + video_enabled=args.enable_video, + ) + + splits = tuple(s.strip() for s in args.splits.split(",") if s.strip()) + + report = run_composite_eval( + args.manifest, + predictor=predictor, + media_root=args.media_root, + annotation_root=args.annotation_root, + splits=splits, + onset_tolerance_s=args.onset_tolerance_s, + bootstrap_n=args.bootstrap_n, + bootstrap_seed=args.bootstrap_seed, + ) + + 
baseline_md = format_baseline_markdown( + report, + backend_label=args.backend, + position_prior_label=position_prior or "none", + eval_harness_sha=args.eval_harness_sha, + ) + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(baseline_md, encoding="utf-8") + + if args.decomposition_output: + decomp_md = format_decomposition_markdown(report) + args.decomposition_output.parent.mkdir(parents=True, exist_ok=True) + args.decomposition_output.write_text(decomp_md, encoding="utf-8") + + return 0 + + +__all__ = [ + "ClipEvalResult", + "CompositeReport", + "DEFAULT_EVAL_SPLITS", + "DEFAULT_TIER_TARGETS", + "Predictor", + "TierReport", + "format_baseline_markdown", + "format_decomposition_markdown", + "main", + "make_run_pipeline_predictor", + "run_composite_eval", +] + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tabvision/tabvision/eval/error_decomposition.py b/tabvision/tabvision/eval/error_decomposition.py new file mode 100644 index 0000000..59c45d1 --- /dev/null +++ b/tabvision/tabvision/eval/error_decomposition.py @@ -0,0 +1,269 @@ +"""Tab F1 error decomposition — six-bucket port of the apr-28 7-bucket harness. + +Ports the methodology from +``tabvision-server/tools/outputs/errors-2026-04-28_185743.md`` to operate +on §8 ``TabEvent`` lists (the v1 contract) instead of the v0 internal +``Note`` representation. + +Six failure buckets (the apr-28 ``muted_undetectable`` bucket needs a +muted/X flag the v1 contract does not yet carry; deferred to a later +phase): + +- ``correct``: predicted event matches a gold event on string + fret + + onset within ``onset_tolerance_s``. +- ``wrong_position_same_pitch``: predicted event matches on + ``pitch_midi`` + onset within tolerance, but a different + ``(string_idx, fret)``. This is the bucket that dominated the + 2026-05-08 GuitarSet validation (~35% of loss on personal clips per + the apr-28 report). 
+- ``pitch_off``: predicted event aligns in onset but pitch_midi + differs from the matched gold. Audio-side loss. +- ``timing_only``: predicted event matches on position or pitch but + the onset is outside ``onset_tolerance_s`` and within + ``timing_extended_tolerance_s``. +- ``missed_onset``: gold event has no predicted event near it within + the extended tolerance. +- ``extra_detection``: predicted event that did not match any gold + event by either rule above. + +Per the strategy doc §2 the dominant failure axis is +``wrong_position_same_pitch`` on solos. This module lets us measure +that explicitly per tier. +""" + +from __future__ import annotations + +from collections.abc import Iterable, Sequence +from dataclasses import dataclass, fields + +from tabvision.types import TabEvent + +DEFAULT_ONSET_TOLERANCE_S = 0.05 +DEFAULT_TIMING_EXTENDED_TOLERANCE_S = 0.15 + + +@dataclass(frozen=True) +class ErrorDecomposition: + """Six-bucket failure breakdown for one (predicted, gold) pair. + + Construct via :func:`decompose_errors`; sum across tracks via + :func:`aggregate_decompositions`. Bucket counts are non-negative + integers. + """ + + correct: int = 0 + wrong_position_same_pitch: int = 0 + pitch_off: int = 0 + timing_only: int = 0 + missed_onset: int = 0 + extra_detection: int = 0 + + @property + def total_gold(self) -> int: + """Number of gold events accounted for. Excludes ``extra_detection``.""" + return ( + self.correct + + self.wrong_position_same_pitch + + self.pitch_off + + self.timing_only + + self.missed_onset + ) + + @property + def total_predicted(self) -> int: + """Number of predicted events accounted for. 
Excludes ``missed_onset``.""" + return ( + self.correct + + self.wrong_position_same_pitch + + self.pitch_off + + self.timing_only + + self.extra_detection + ) + + @property + def total_loss(self) -> int: + """Events contributing to Tab F1 loss (everything except ``correct``).""" + return ( + self.wrong_position_same_pitch + + self.pitch_off + + self.timing_only + + self.missed_onset + + self.extra_detection + ) + + def share_of_loss(self) -> dict[str, float]: + """Per-bucket share of recoverable Tab F1 loss. + + ``correct`` events are not counted as loss; the remaining five + buckets sum to 1.0 (or all zeros if ``total_loss`` is 0). + """ + total = self.total_loss + if total == 0: + return { + "wrong_position_same_pitch": 0.0, + "pitch_off": 0.0, + "timing_only": 0.0, + "missed_onset": 0.0, + "extra_detection": 0.0, + } + return { + "wrong_position_same_pitch": self.wrong_position_same_pitch / total, + "pitch_off": self.pitch_off / total, + "timing_only": self.timing_only / total, + "missed_onset": self.missed_onset / total, + "extra_detection": self.extra_detection / total, + } + + def to_dict(self) -> dict[str, int]: + return {f.name: getattr(self, f.name) for f in fields(self)} + + +def decompose_errors( + predicted: Sequence[TabEvent], + gold: Sequence[TabEvent], + *, + onset_tolerance_s: float = DEFAULT_ONSET_TOLERANCE_S, + timing_extended_tolerance_s: float = DEFAULT_TIMING_EXTENDED_TOLERANCE_S, +) -> ErrorDecomposition: + """Bucket the events into the six-bucket Phase 0 schema. + + The matcher is **priority-based** within each tolerance window so + chord clusters (multiple gold events at the same onset) don't get + mis-paired by raw onset proximity: + + 1. **Strict-tolerance pass.** For each gold event, search unclaimed + predicted events within ``onset_tolerance_s``. 
Pick the best in + priority order: + - same ``(string_idx, fret)`` → ``correct`` + - same ``pitch_midi`` → ``wrong_position_same_pitch`` + - neither → ``pitch_off`` + Within each priority bucket, ties are broken by closest onset. + 2. **Extended-tolerance pass.** For each gold event still unmatched, + search within ``timing_extended_tolerance_s`` for a predicted + event that agrees on position or pitch → ``timing_only``. + Else → ``missed_onset``. + + Unclaimed predicted events after both passes → ``extra_detection``. + + Priority matters: in a chord cluster with three gold events at the + same onset and three predicted events with matching pitches but + different on-the-wire ordering, onset-only greediness would shuffle + pairings and inflate ``pitch_off``. Priority-based matching tracks + ``event_f1(match_pitch=True)`` exactly when ``Pitch F1 = 1.0``. + """ + if onset_tolerance_s <= 0: + raise ValueError(f"onset_tolerance_s must be positive; got {onset_tolerance_s}") + if timing_extended_tolerance_s < onset_tolerance_s: + raise ValueError( + f"timing_extended_tolerance_s ({timing_extended_tolerance_s}) must be " + f">= onset_tolerance_s ({onset_tolerance_s})" + ) + + pred_used = [False] * len(predicted) + + correct = 0 + wrong_position = 0 + pitch_off = 0 + timing_only = 0 + missed = 0 + + gold_sorted = sorted(gold, key=lambda g: g.onset_s) + + for g in gold_sorted: + # Pass 1: strict-tolerance, priority-ordered match. 
+ best_pos_idx = -1 + best_pitch_idx = -1 + best_any_idx = -1 + best_pos_dt = onset_tolerance_s + 1e-9 + best_pitch_dt = onset_tolerance_s + 1e-9 + best_any_dt = onset_tolerance_s + 1e-9 + + for pi, p in enumerate(predicted): + if pred_used[pi]: + continue + dt = abs(p.onset_s - g.onset_s) + if dt > onset_tolerance_s: + continue + same_pos = p.string_idx == g.string_idx and p.fret == g.fret + same_pitch = p.pitch_midi == g.pitch_midi + if same_pos: + if dt < best_pos_dt: + best_pos_idx = pi + best_pos_dt = dt + elif same_pitch: + if dt < best_pitch_dt: + best_pitch_idx = pi + best_pitch_dt = dt + elif dt < best_any_dt: + best_any_idx = pi + best_any_dt = dt + + if best_pos_idx >= 0: + pred_used[best_pos_idx] = True + correct += 1 + continue + if best_pitch_idx >= 0: + pred_used[best_pitch_idx] = True + wrong_position += 1 + continue + if best_any_idx >= 0: + pred_used[best_any_idx] = True + pitch_off += 1 + continue + + # Pass 2: extended-tolerance match on position OR pitch. + timing_idx = -1 + timing_dt = timing_extended_tolerance_s + 1e-9 + for pi, p in enumerate(predicted): + if pred_used[pi]: + continue + dt = abs(p.onset_s - g.onset_s) + if dt > timing_extended_tolerance_s: + continue + same_pos = p.string_idx == g.string_idx and p.fret == g.fret + same_pitch = p.pitch_midi == g.pitch_midi + if (same_pos or same_pitch) and dt < timing_dt: + timing_idx = pi + timing_dt = dt + + if timing_idx >= 0: + pred_used[timing_idx] = True + timing_only += 1 + continue + + missed += 1 + + extra = sum(1 for used in pred_used if not used) + + return ErrorDecomposition( + correct=correct, + wrong_position_same_pitch=wrong_position, + pitch_off=pitch_off, + timing_only=timing_only, + missed_onset=missed, + extra_detection=extra, + ) + + +def aggregate_decompositions( + decompositions: Iterable[ErrorDecomposition], +) -> ErrorDecomposition: + """Sum a sequence of per-track decompositions into an aggregate.""" + items = list(decompositions) + return ErrorDecomposition( + 
correct=sum(d.correct for d in items), + wrong_position_same_pitch=sum(d.wrong_position_same_pitch for d in items), + pitch_off=sum(d.pitch_off for d in items), + timing_only=sum(d.timing_only for d in items), + missed_onset=sum(d.missed_onset for d in items), + extra_detection=sum(d.extra_detection for d in items), + ) + + +__all__ = [ + "DEFAULT_ONSET_TOLERANCE_S", + "DEFAULT_TIMING_EXTENDED_TOLERANCE_S", + "ErrorDecomposition", + "aggregate_decompositions", + "decompose_errors", +] diff --git a/tabvision/tabvision/eval/manifest.py b/tabvision/tabvision/eval/manifest.py index 1d43d0d..9b37caa 100644 --- a/tabvision/tabvision/eval/manifest.py +++ b/tabvision/tabvision/eval/manifest.py @@ -24,10 +24,24 @@ "split", "media_path", "annotation_path", + "annotation_format", ) ALLOWED_SPLITS: tuple[str, ...] = ("train", "validation", "test") MIN_PHASE15_CLIPS = 15 +SYNTHETIC_SOURCE_PREFIXES: tuple[str, ...] = ( + "synthtab/", + "dadagp/", + "synthetic/", +) +"""Source-name prefixes flagged as synthetic. + +Per the 2026-05-12 design plan §5 (R8 in §7), synthetic-source clips +must not appear in non-train splits. ``validate_manifest`` emits a +``SYNTHETIC_IN_EVAL_SPLIT`` fail issue when a clip whose ``source`` +starts with any of these prefixes is listed with ``split`` of +``"validation"`` or ``"test"``.""" + Severity = Literal["info", "warn", "fail"] @@ -198,6 +212,25 @@ def validate_manifest(path: str | Path) -> ManifestValidation: ) ) + # Cross-contamination guard: synthetic-source clips must not appear + # in non-train splits. See design plan §5 / risk R8. 
+ source = _string_field(clip, "source") or "" + if split in {"validation", "test"} and any( + source.lower().startswith(prefix) for prefix in SYNTHETIC_SOURCE_PREFIXES + ): + items.append( + ManifestIssue( + severity="fail", + code="SYNTHETIC_IN_EVAL_SPLIT", + message=( + f"Clip {clip_id!r} has synthetic source {source!r} but " + f"split={split!r}; synthetic-source clips are restricted to " + f"split='train' (design plan §5 / R8)." + ), + clip_id=clip_id, + ) + ) + if len(clips) < MIN_PHASE15_CLIPS: items.append( ManifestIssue( @@ -251,5 +284,6 @@ def _missing_tier_issues(missing_tiers: tuple[str, ...] | list[str]) -> list[Man "OPTIONAL_TIERS", "REQUIRED_CLIP_FIELDS", "REQUIRED_TIERS", + "SYNTHETIC_SOURCE_PREFIXES", "validate_manifest", ] diff --git a/tabvision/tabvision/eval/manifest_builder.py b/tabvision/tabvision/eval/manifest_builder.py new file mode 100644 index 0000000..a919a55 --- /dev/null +++ b/tabvision/tabvision/eval/manifest_builder.py @@ -0,0 +1,427 @@ +"""Composite-eval manifest builder. + +Scans known dataset roots on disk and emits a TOML manifest suitable +for ``tabvision-composite-eval``. Designed to be deterministic so +re-runs on the same data produce byte-identical output: clips are +emitted in sorted-id order, and per-tier caps + total limits are +applied after that sort. + +Currently supports: + +- **GuitarSet** (CC-BY-4.0) — clean acoustic single-line + strummed + tiers. Default split = player 05 → validation, others → train. +- **Guitar-TECHS** (CC-BY-4.0) — stubbed; Phase 0 returns ``[]`` until + the dataset is acquired locally and the on-disk layout is verified. + +EGDB is intentionally not yet wired up (license-pending per the +2026-05-13 design plan). 
+""" + +from __future__ import annotations + +import argparse +from collections.abc import Iterable +from dataclasses import dataclass +from pathlib import Path + +from tabvision.eval.manifest import ( + SYNTHETIC_SOURCE_PREFIXES, + ManifestValidation, + validate_manifest, +) + +GUITARSET_VALIDATION_PLAYER = "05" + + +@dataclass(frozen=True) +class ClipEntry: + """Minimal clip-row representation, one per manifest ``[[clips]]``.""" + + id: str + tier: str + source: str + split: str + media_path: str + annotation_path: str + annotation_format: str + + +def _guitarset_tier(track_id: str) -> str | None: + """Map a GuitarSet track id suffix to a SPEC §1.4 tier name. + + Returns ``None`` for unrecognised suffixes (track is skipped). + """ + if track_id.endswith("_comp"): + return "clean_acoustic_strummed" + if track_id.endswith("_solo"): + return "clean_acoustic_single_line" + return None + + +def _guitarset_split(track_id: str, validation_player: str) -> str: + """``validation`` for the held-out player, ``train`` otherwise.""" + if track_id.split("_", 1)[0] == validation_player: + return "validation" + return "train" + + +def scan_guitarset( + root: Path, + *, + validation_player: str = GUITARSET_VALIDATION_PLAYER, +) -> list[ClipEntry]: + """Scan a GuitarSet directory tree and return discovered clips. + + Expected layout:: + + <root>/annotation/<track_id>.jams + <root>/audio_mono-mic/<track_id>_mic.wav + + Tracks missing either file are skipped. Tracks whose suffix is + neither ``_comp`` nor ``_solo`` are skipped.
+ """ + annotation_dir = root / "annotation" + audio_dir = root / "audio_mono-mic" + if not annotation_dir.is_dir() or not audio_dir.is_dir(): + return [] + + entries: list[ClipEntry] = [] + for jams_path in sorted(annotation_dir.glob("*.jams")): + track_id = jams_path.stem + media_path = audio_dir / f"{track_id}_mic.wav" + if not media_path.is_file(): + continue + tier = _guitarset_tier(track_id) + if tier is None: + continue + entries.append( + ClipEntry( + id=f"guitarset/{track_id}", + tier=tier, + source="GuitarSet", + split=_guitarset_split(track_id, validation_player), + media_path=str(media_path.resolve()), + annotation_path=str(jams_path.resolve()), + annotation_format="guitarset_jams", + ) + ) + return entries + + +def scan_guitar_techs(root: Path) -> list[ClipEntry]: + """Scan a Guitar-TECHS directory tree. + + Returns ``[]`` until the dataset is acquired locally and the + on-disk layout (per arXiv:2501.03720) is verified. The strategy + doc §3.1 marks Guitar-TECHS as an acquisition item; once the + bytes are on disk we can populate this scanner in a follow-up + commit. + """ + del root + return [] + + +def apply_limits( + entries: Iterable[ClipEntry], + *, + max_clips_per_tier: int | None = None, + total_limit: int | None = None, +) -> list[ClipEntry]: + """Apply per-tier and total limits deterministically. + + Entries are first sorted by ``id`` (so the same data produces the + same output regardless of input scan order), then per-tier capped, + then total-limited. 
+ """ + sorted_entries = sorted(entries, key=lambda entry: entry.id) + + if max_clips_per_tier is not None and max_clips_per_tier >= 0: + by_tier: dict[str, int] = {} + capped: list[ClipEntry] = [] + for entry in sorted_entries: + count = by_tier.get(entry.tier, 0) + if count >= max_clips_per_tier: + continue + capped.append(entry) + by_tier[entry.tier] = count + 1 + sorted_entries = capped + + if total_limit is not None and 0 <= total_limit < len(sorted_entries): + sorted_entries = sorted_entries[:total_limit] + + return sorted_entries + + +def _toml_escape(value: str) -> str: + """Escape a TOML basic-string value (backslashes + double quotes).""" + return value.replace("\\", "\\\\").replace('"', '\\"') + + +def _relativize_to_data_root(path_str: str, data_root: Path | None) -> str: + """Rewrite ``path_str`` as ``$TABVISION_DATA_ROOT/`` when it lives + under ``data_root``. Returns the original string when ``data_root`` is + ``None`` or the path isn't under it. + + The composite-eval CLI expands ``$TABVISION_DATA_ROOT`` at eval time + via the env var or its ``--media-root`` / ``--annotation-root`` args + (see :func:`tabvision.eval.composite._resolve_path`), so this keeps + checked-in manifests portable across developer machines. + """ + if data_root is None: + return path_str + abs_root = str(data_root.expanduser().resolve()) + if path_str == abs_root: + return "$TABVISION_DATA_ROOT" + if path_str.startswith(abs_root + "/"): + rest = path_str[len(abs_root) + 1 :] + return f"$TABVISION_DATA_ROOT/{rest}" + return path_str + + +def render_toml( + entries: Iterable[ClipEntry], + *, + header_comment: str = "", + data_root: Path | None = None, +) -> str: + """Render entries as a TOML composite manifest. + + Output is sorted by clip id for byte-stable re-generation. 
When + ``data_root`` is provided, ``media_path`` and ``annotation_path`` + values that fall under that root are rewritten as + ``$TABVISION_DATA_ROOT/`` — the composite-eval CLI expands + that token at eval time. Use this for checked-in manifests. + """ + sorted_entries = sorted(entries, key=lambda entry: entry.id) + lines: list[str] = [] + if header_comment: + for raw_line in header_comment.splitlines(): + lines.append(f"# {raw_line}" if raw_line else "#") + lines.append("") + fields = ( + "id", + "tier", + "source", + "split", + "media_path", + "annotation_path", + "annotation_format", + ) + for entry in sorted_entries: + lines.append("[[clips]]") + for field in fields: + raw = getattr(entry, field) + if field in ("media_path", "annotation_path"): + raw = _relativize_to_data_root(raw, data_root) + value = _toml_escape(raw) + lines.append(f'{field} = "{value}"') + lines.append("") + return "\n".join(lines).rstrip() + "\n" + + +def summarise_coverage(entries: Iterable[ClipEntry]) -> str: + """Human-readable coverage summary.""" + entries_list = list(entries) + by_tier: dict[str, dict[str, int]] = {} + by_split: dict[str, int] = {} + for entry in entries_list: + by_tier.setdefault(entry.tier, {}).setdefault(entry.source, 0) + by_tier[entry.tier][entry.source] += 1 + by_split[entry.split] = by_split.get(entry.split, 0) + 1 + + lines: list[str] = [] + lines.append(f"Total clips: {len(entries_list)}") + lines.append("Per-tier × source:") + for tier in sorted(by_tier): + per_source = ", ".join( + f"{source}={count}" for source, count in sorted(by_tier[tier].items()) + ) + total = sum(by_tier[tier].values()) + lines.append(f" {tier}: {total} clips ({per_source})") + if by_split: + split_summary = ", ".join( + f"{split}={count}" for split, count in sorted(by_split.items()) + ) + lines.append(f"Splits: {split_summary}") + return "\n".join(lines) + + +def _refuse_synthetic_in_eval_splits(entries: Iterable[ClipEntry]) -> None: + """Pre-write guard: bail loudly on bad 
synthetic-source manifests.""" + for entry in entries: + if entry.split == "train": + continue + source = entry.source.lower() + if any(source.startswith(prefix) for prefix in SYNTHETIC_SOURCE_PREFIXES): + raise ValueError( + f"Clip {entry.id!r} has synthetic source {entry.source!r} but " + f"split={entry.split!r}; the manifest validator (and design " + f"plan §5 R8) forbid synthetic-source clips in eval splits. " + f"Either move to split='train' or remove." + ) + + +def build_manifest( + *, + guitarset_root: Path | None = None, + guitar_techs_root: Path | None = None, + splits: tuple[str, ...] | None = None, + max_clips_per_tier: int | None = None, + total_limit: int | None = None, + validation_player: str = GUITARSET_VALIDATION_PLAYER, +) -> list[ClipEntry]: + """Scan all configured roots and apply filters + limits. + + Sources whose root is ``None`` or doesn't exist are silently skipped. + Optional ``splits`` restricts to the named splits (e.g. + ``("validation",)`` for a smoke pre-flight). Limits are applied + after the split filter, sorted by clip id for determinism. + """ + entries: list[ClipEntry] = [] + if guitarset_root is not None: + entries.extend( + scan_guitarset(guitarset_root, validation_player=validation_player) + ) + if guitar_techs_root is not None: + entries.extend(scan_guitar_techs(guitar_techs_root)) + + _refuse_synthetic_in_eval_splits(entries) + + if splits is not None: + allowed = set(splits) + entries = [entry for entry in entries if entry.split in allowed] + + return apply_limits( + entries, + max_clips_per_tier=max_clips_per_tier, + total_limit=total_limit, + ) + + +def main(argv: list[str] | None = None) -> int: + """CLI entry point: ``tabvision-build-composite-manifest``.""" + parser = argparse.ArgumentParser( + prog="build_composite_manifest", + description=( + "Scan dataset roots on disk and emit a composite-eval TOML manifest." 
+ ), + ) + parser.add_argument( + "--guitarset", + type=Path, + default=None, + help="GuitarSet root directory (with annotation/ and audio_mono-mic/)", + ) + parser.add_argument( + "--guitar-techs", + type=Path, + default=None, + help="Guitar-TECHS root directory (scanner is currently a stub)", + ) + parser.add_argument("--output", type=Path, required=True) + parser.add_argument( + "--max-clips-per-tier", + type=int, + default=None, + help="cap clips per tier; useful for smoke runs", + ) + parser.add_argument( + "--limit", + type=int, + default=None, + help="cap total clips after per-tier cap; useful for smoke runs", + ) + parser.add_argument( + "--guitarset-validation-player", + default=GUITARSET_VALIDATION_PLAYER, + help="GuitarSet player id whose tracks go into the validation split", + ) + parser.add_argument( + "--splits", + default=None, + help=( + "comma-separated splits to include (e.g. 'validation' for a " + "smoke pre-flight). Default: include all splits." + ), + ) + parser.add_argument( + "--data-root", + type=Path, + default=None, + help=( + "rewrite media/annotation paths that fall under this root as " + "$TABVISION_DATA_ROOT/ for portable checked-in manifests" + ), + ) + + args = parser.parse_args(argv) + + if args.guitarset is None and args.guitar_techs is None: + parser.error("specify at least one of --guitarset or --guitar-techs") + + splits_filter: tuple[str, ...] | None = None + if args.splits: + splits_filter = tuple(s.strip() for s in args.splits.split(",") if s.strip()) + + try: + entries = build_manifest( + guitarset_root=args.guitarset, + guitar_techs_root=args.guitar_techs, + splits=splits_filter, + max_clips_per_tier=args.max_clips_per_tier, + total_limit=args.limit, + validation_player=args.guitarset_validation_player, + ) + except ValueError as exc: + print(f"error: {exc}", flush=True) + return 2 + + if not entries: + print( + "No clips discovered. 
Check --guitarset / --guitar-techs paths.", + flush=True, + ) + return 1 + + header = ( + "Composite-eval manifest generated by " + "tabvision/scripts/eval/build_composite_manifest.py." + "\nRe-generate with the same args to refresh; this file is " + "intended to be auto-managed." + ) + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text( + render_toml(entries, header_comment=header, data_root=args.data_root), + encoding="utf-8", + ) + + print(f"Wrote {len(entries)} clips to {args.output}", flush=True) + print(summarise_coverage(entries), flush=True) + + validation: ManifestValidation = validate_manifest(args.output) + fail_items = [item for item in validation.items if item.severity == "fail"] + if fail_items: + print(f"\nValidation FAILED with {len(fail_items)} issue(s):", flush=True) + for item in fail_items: + print(f" [{item.code}] {item.message}", flush=True) + return 2 + + print("\nManifest validation passed.", flush=True) + return 0 + + +__all__ = [ + "ClipEntry", + "GUITARSET_VALIDATION_PLAYER", + "apply_limits", + "build_manifest", + "main", + "render_toml", + "scan_guitar_techs", + "scan_guitarset", + "summarise_coverage", +] + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tabvision/tabvision/eval/metrics.py b/tabvision/tabvision/eval/metrics.py index 92fd24f..d30042a 100644 --- a/tabvision/tabvision/eval/metrics.py +++ b/tabvision/tabvision/eval/metrics.py @@ -164,9 +164,81 @@ def _cluster_by_gap(events: Sequence[TabEvent], gap_s: float) -> list[list[TabEv return clusters +@dataclass(frozen=True) +class EventF1Result: + """Onset-only or onset+pitch F1 over two ``TabEvent`` sequences. + + Mirrors the structure of :class:`TabF1Result` but represents the + looser matchers used to track audio-side performance independent + of string/fret assignment. 
+ """ + + precision: float + recall: float + f1: float + true_positives: int + false_positives: int + false_negatives: int + + +def event_f1( + predicted: Sequence[TabEvent], + gold: Sequence[TabEvent], + *, + match_pitch: bool = True, + onset_tolerance_s: float = 0.05, +) -> EventF1Result: + """F1 over predicted-vs-gold events on onset (optionally + pitch). + + With ``match_pitch=False`` this is onset F1 (SPEC §1.4 line 1). + With ``match_pitch=True`` (default) it is pitch F1 (SPEC §1.4 line 2). + String / fret agreement is ignored — that is what :func:`tab_f1` is for. + """ + pred_sorted = sorted(predicted, key=lambda t: t.onset_s) + gold_sorted = sorted(gold, key=lambda t: t.onset_s) + gold_used = [False] * len(gold_sorted) + tp = 0 + fp = 0 + for p in pred_sorted: + best_j = -1 + best_dt = onset_tolerance_s + 1e-9 + for j, g in enumerate(gold_sorted): + if gold_used[j]: + continue + if match_pitch and g.pitch_midi != p.pitch_midi: + continue + dt = abs(g.onset_s - p.onset_s) + if dt <= onset_tolerance_s and dt < best_dt: + best_j = j + best_dt = dt + if best_j >= 0: + gold_used[best_j] = True + tp += 1 + else: + fp += 1 + fn = sum(1 for used in gold_used if not used) + precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0 + recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0 + f1 = ( + 2 * precision * recall / (precision + recall) + if (precision + recall) > 0 + else 0.0 + ) + return EventF1Result( + precision=precision, + recall=recall, + f1=f1, + true_positives=tp, + false_positives=fp, + false_negatives=fn, + ) + + __all__ = [ - "TabF1Result", "ChordAccuracyResult", - "tab_f1", + "EventF1Result", + "TabF1Result", "chord_instance_accuracy", + "event_f1", + "tab_f1", ] diff --git a/tabvision/tabvision/eval/parsers/__init__.py b/tabvision/tabvision/eval/parsers/__init__.py new file mode 100644 index 0000000..656e8a8 --- /dev/null +++ b/tabvision/tabvision/eval/parsers/__init__.py @@ -0,0 +1,31 @@ +"""Annotation parsers — uniform interface for source-specific tab 
labels. + +Each parser module exposes: + +- ``FORMAT_NAME``: the string key that appears in + ``Manifest.clip.annotation_format`` (added in Phase 0 to support + multi-source composite eval). +- ``parse(annotation_path, cfg) -> list[TabEvent]``: pure function; + no I/O outside the file at ``annotation_path``. + +Submodule imports below trigger registration in +:mod:`tabvision.eval.parsers.registry`. +""" + +# Built-in parsers — importing them registers their FORMAT_NAME. +from tabvision.eval.parsers import guitar_techs_midi, guitarset_jams # noqa: F401 +from tabvision.eval.parsers.registry import ( + ParserFn, + clear_parsers, + get_parser, + list_parsers, + register_parser, +) + +__all__ = [ + "ParserFn", + "clear_parsers", + "get_parser", + "list_parsers", + "register_parser", +] diff --git a/tabvision/tabvision/eval/parsers/guitar_techs_midi.py b/tabvision/tabvision/eval/parsers/guitar_techs_midi.py new file mode 100644 index 0000000..69b0cbd --- /dev/null +++ b/tabvision/tabvision/eval/parsers/guitar_techs_midi.py @@ -0,0 +1,84 @@ +"""Guitar-TECHS 6-track MIDI annotation parser. + +Per arXiv:2501.03720 §3, Guitar-TECHS distributes one MIDI file per +clip with six instrument tracks, each carrying the notes for one +guitar string. The default ordering is low E → high E, matching the +:class:`tabvision.types.GuitarConfig` ``tuning_midi`` convention +(low E = ``string_idx`` 0). + +If a particular Guitar-TECHS release uses a different track ordering, +pass ``track_to_string`` to ``parse`` directly; manifest-level support +for parser arguments is deferred to a later phase. +""" + +from __future__ import annotations + +from pathlib import Path + +from tabvision.eval.parsers.registry import register_parser +from tabvision.types import GuitarConfig, TabEvent + +FORMAT_NAME = "guitar_techs_midi" + +DEFAULT_TRACK_TO_STRING: tuple[int, ...] 
= (0, 1, 2, 3, 4, 5) +"""Track-index → ``string_idx`` mapping; default = identity (low E first).""" + + +def parse( + midi_path: str | Path, + cfg: GuitarConfig | None = None, + *, + track_to_string: tuple[int, ...] = DEFAULT_TRACK_TO_STRING, +) -> list[TabEvent]: + """Parse Guitar-TECHS MIDI into v1 :class:`TabEvent` gold notes. + + Pitch ``p`` on the track mapped to string ``s`` is assigned + ``fret = p - cfg.tuning_midi[s]``. Notes that would imply a fret + below ``cfg.capo`` or above ``cfg.max_fret`` are dropped. + """ + try: + import pretty_midi # noqa: PLC0415 + except ImportError as exc: # pragma: no cover - skip path + raise ImportError( + "guitar_techs_midi parser requires pretty_midi. Install with: " + "pip install -e 'tabvision[audio-highres]'" + ) from exc + + if cfg is None: + cfg = GuitarConfig() + + midi = pretty_midi.PrettyMIDI(str(midi_path)) + + out: list[TabEvent] = [] + for track_index, instrument in enumerate(midi.instruments): + if track_index >= len(track_to_string): + break + string_idx = track_to_string[track_index] + if not 0 <= string_idx < cfg.n_strings: + continue + + open_pitch = cfg.tuning_midi[string_idx] + for note in instrument.notes: + pitch_midi = int(note.pitch) + fret = pitch_midi - open_pitch + if fret < cfg.capo or fret > cfg.max_fret: + continue + out.append( + TabEvent( + onset_s=float(note.start), + duration_s=float(max(0.0, note.end - note.start)), + string_idx=string_idx, + fret=fret, + pitch_midi=pitch_midi, + confidence=1.0, + ) + ) + + out.sort(key=lambda ev: (ev.onset_s, ev.string_idx, ev.fret)) + return out + + +register_parser(FORMAT_NAME, parse) + + +__all__ = ["DEFAULT_TRACK_TO_STRING", "FORMAT_NAME", "parse"] diff --git a/tabvision/tabvision/eval/parsers/guitarset_jams.py b/tabvision/tabvision/eval/parsers/guitarset_jams.py new file mode 100644 index 0000000..566d2cb --- /dev/null +++ b/tabvision/tabvision/eval/parsers/guitarset_jams.py @@ -0,0 +1,18 @@ +"""GuitarSet JAMS annotation parser. 
+ +Wraps the existing :func:`tabvision.eval.guitarset_audio.parse_guitarset_jams` +under the uniform parser interface so composite-eval dispatch can route +``annotation_format = "guitarset_jams"`` clips here. +""" + +from __future__ import annotations + +from tabvision.eval.guitarset_audio import parse_guitarset_jams as parse +from tabvision.eval.parsers.registry import register_parser + +FORMAT_NAME = "guitarset_jams" + +register_parser(FORMAT_NAME, parse) + + +__all__ = ["FORMAT_NAME", "parse"] diff --git a/tabvision/tabvision/eval/parsers/registry.py b/tabvision/tabvision/eval/parsers/registry.py new file mode 100644 index 0000000..99a29de --- /dev/null +++ b/tabvision/tabvision/eval/parsers/registry.py @@ -0,0 +1,69 @@ +"""Annotation-parser registry. + +Each annotation source (GuitarSet JAMS, Guitar-TECHS 6-track MIDI, EGDB +GuitarPro, etc.) gets a parser module that registers itself here on +import. Composite-eval dispatch then routes by +``Manifest.clip.annotation_format`` to the registered parser. + +This file is import-side-effect free: the registry is empty at first +import. Built-in parsers are registered by ``parsers/__init__.py`` +importing their submodules. +""" + +from __future__ import annotations + +from collections.abc import Callable +from pathlib import Path + +from tabvision.types import GuitarConfig, TabEvent + +ParserFn = Callable[[str | Path, GuitarConfig | None], list[TabEvent]] +"""``(annotation_path, cfg) -> list[TabEvent]``. ``cfg`` may be ``None``.""" + + +_PARSERS: dict[str, ParserFn] = {} + + +def register_parser(format_name: str, fn: ParserFn) -> None: + """Register ``fn`` as the parser for ``format_name``. + + Raises ``ValueError`` if ``format_name`` is already registered. + """ + if format_name in _PARSERS: + raise ValueError( + f"Parser already registered for format {format_name!r}; " + f"call clear_parsers() first if this is intentional." 
+ ) + _PARSERS[format_name] = fn + + +def get_parser(format_name: str) -> ParserFn: + """Look up the parser for ``format_name``. + + Raises ``KeyError`` with the list of known formats if not registered. + """ + if format_name not in _PARSERS: + known = ", ".join(sorted(_PARSERS)) or "(none registered)" + raise KeyError( + f"Unknown annotation format: {format_name!r}. Known: {known}." + ) + return _PARSERS[format_name] + + +def list_parsers() -> list[str]: + """Return the sorted list of registered format names.""" + return sorted(_PARSERS) + + +def clear_parsers() -> None: + """Remove all registered parsers. For tests only.""" + _PARSERS.clear() + + +__all__ = [ + "ParserFn", + "clear_parsers", + "get_parser", + "list_parsers", + "register_parser", +] diff --git a/tabvision/tests/integration/test_composite_eval_smoke.py b/tabvision/tests/integration/test_composite_eval_smoke.py new file mode 100644 index 0000000..63faa13 --- /dev/null +++ b/tabvision/tests/integration/test_composite_eval_smoke.py @@ -0,0 +1,486 @@ +"""Integration smoke tests for the composite-eval harness (Phase 0).""" + +from __future__ import annotations + +import json +from pathlib import Path + +import pytest + +from tabvision.eval.composite import ( + Predictor, + run_composite_eval, +) +from tabvision.types import SessionConfig, TabEvent + +# Standard tuning open pitches for derived MIDI. +_OPEN_PITCH = (40, 45, 50, 55, 59, 64) + + +def _write_jams( + path: Path, + notes: list[tuple[float, float, int, int]], +) -> None: + """Write a minimal GuitarSet-style JAMS at ``path``. + + Each ``notes`` tuple is ``(onset_s, duration_s, string_idx, fret)``. 
+ """ + by_string: dict[int, list[dict[str, float]]] = {} + for onset, duration, string_idx, fret in notes: + midi = _OPEN_PITCH[string_idx] + fret + by_string.setdefault(string_idx, []).append( + {"time": float(onset), "duration": float(duration), "value": float(midi)} + ) + payload = { + "annotations": [ + { + "namespace": "note_midi", + "annotation_metadata": {"data_source": str(string_idx)}, + "data": data, + } + for string_idx, data in sorted(by_string.items()) + ] + } + path.write_text(json.dumps(payload), encoding="utf-8") + + +def _tab_event(onset: float, duration: float, string_idx: int, fret: int) -> TabEvent: + return TabEvent( + onset_s=onset, + duration_s=duration, + string_idx=string_idx, + fret=fret, + pitch_midi=_OPEN_PITCH[string_idx] + fret, + confidence=1.0, + ) + + +def _write_manifest( + manifest_path: Path, + entries: list[dict[str, str]], +) -> None: + """Build a TOML manifest from a list of clip-dict entries.""" + lines: list[str] = [] + for entry in entries: + lines.append("[[clips]]") + for key, value in entry.items(): + lines.append(f'{key} = "{value}"') + lines.append("") + manifest_path.write_text("\n".join(lines), encoding="utf-8") + + +def _make_predictor(gold_by_path: dict[str, list[TabEvent]]) -> Predictor: + """Return a predictor that echoes gold for each known path.""" + + def predict(media_path: Path, session: SessionConfig) -> list[TabEvent]: + del session + key = str(media_path) + if key not in gold_by_path: + raise KeyError(f"unknown media path in test: {key}") + return list(gold_by_path[key]) + + return predict + + +def _shifted_predictor(gold_by_path: dict[str, list[TabEvent]]) -> Predictor: + """Return a predictor that shifts every event to a different string with the same pitch.""" + + def predict(media_path: Path, session: SessionConfig) -> list[TabEvent]: + del session + gold = gold_by_path[str(media_path)] + out: list[TabEvent] = [] + for event in gold: + for candidate_string in range(6): + if candidate_string == 
event.string_idx: + continue + fret = event.pitch_midi - _OPEN_PITCH[candidate_string] + if 0 <= fret <= 24: + out.append( + TabEvent( + onset_s=event.onset_s, + duration_s=event.duration_s, + string_idx=candidate_string, + fret=fret, + pitch_midi=event.pitch_midi, + confidence=event.confidence, + ) + ) + break + return out + + return predict + + +def _build_two_tier_manifest(tmp_path: Path) -> tuple[Path, dict[str, list[TabEvent]]]: + """Two clips in clean_acoustic_strummed + one in clean_acoustic_single_line. + + Returns (manifest_path, gold_by_media_path). + """ + # Mid-range pitches so the shifted_predictor in tests below can find a + # legal alternate string (low pitches like low-E fret 3 can only live on + # string 0; shifting them yields no prediction). + clips = [ + ( + "guitarset-strum-01", + "clean_acoustic_strummed", + [(0.0, 0.5, 0, 7), (0.0, 0.5, 1, 7), (0.0, 0.5, 2, 7)], + ), + ( + "guitarset-strum-02", + "clean_acoustic_strummed", + [(1.0, 0.4, 3, 5), (1.5, 0.4, 4, 5)], + ), + ( + "guitarset-single-01", + "clean_acoustic_single_line", + [(0.0, 0.2, 2, 5), (0.5, 0.2, 2, 7), (1.0, 0.2, 2, 9)], + ), + ] + + gold_by_path: dict[str, list[TabEvent]] = {} + entries: list[dict[str, str]] = [] + for clip_id, tier, notes in clips: + jams_path = tmp_path / f"{clip_id}.jams" + media_path = tmp_path / f"{clip_id}.wav" + media_path.write_bytes(b"") # zero-byte placeholder; predictor doesn't read it + _write_jams(jams_path, notes) + gold_by_path[str(media_path)] = [ + _tab_event(o, d, s, f) for (o, d, s, f) in notes + ] + entries.append( + { + "id": clip_id, + "tier": tier, + "source": "GuitarSet", + "split": "validation", + "media_path": str(media_path), + "annotation_path": str(jams_path), + "annotation_format": "guitarset_jams", + } + ) + + manifest_path = tmp_path / "composite.toml" + _write_manifest(manifest_path, entries) + return manifest_path, gold_by_path + + +def test_perfect_predictor_yields_pass_on_both_tiers(tmp_path: Path) -> None: + manifest_path, 
gold_by_path = _build_two_tier_manifest(tmp_path) + predictor = _make_predictor(gold_by_path) + + report = run_composite_eval( + manifest_path, + predictor=predictor, + bootstrap_n=500, + bootstrap_seed=42, + ) + + assert set(report.tiers) == { + "clean_acoustic_strummed", + "clean_acoustic_single_line", + } + for tier, tier_report in report.tiers.items(): + assert tier_report.tab_f1.statistic == pytest.approx(1.0), ( + f"tier {tier} should be perfect with echo predictor" + ) + assert tier_report.onset_f1.statistic == pytest.approx(1.0) + assert tier_report.pitch_f1.statistic == pytest.approx(1.0) + + +def test_acceptance_helper_classifies_pass_gap_fail(tmp_path: Path) -> None: + manifest_path, gold_by_path = _build_two_tier_manifest(tmp_path) + report = run_composite_eval( + manifest_path, + predictor=_make_predictor(gold_by_path), + bootstrap_n=500, + ) + + targets = { + "clean_acoustic_strummed": 0.90, + "clean_acoustic_single_line": 0.85, + "clean_electric": 0.87, # not in manifest + } + statuses = report.tab_f1_acceptance(targets) + assert statuses["clean_acoustic_strummed"] == "pass" + assert statuses["clean_acoustic_single_line"] == "pass" + assert statuses["clean_electric"] == "missing" + + +def test_shifted_predictor_populates_wrong_position_bucket(tmp_path: Path) -> None: + """Every prediction same-pitch different-string → fills wrong_position_same_pitch.""" + manifest_path, gold_by_path = _build_two_tier_manifest(tmp_path) + predictor = _shifted_predictor(gold_by_path) + + report = run_composite_eval( + manifest_path, + predictor=predictor, + bootstrap_n=500, + ) + + strum = report.tiers["clean_acoustic_strummed"].errors + # All predictions are pitch-correct but position-wrong: zero correct, + # all in the wrong_position bucket. 
+ assert strum.correct == 0 + assert strum.wrong_position_same_pitch > 0 + assert strum.pitch_off == 0 + assert strum.missed_onset == 0 + + +def test_train_clips_skipped_by_default(tmp_path: Path) -> None: + """A train-split clip should not appear in per_clip results.""" + jams_path = tmp_path / "train.jams" + media_path = tmp_path / "train.wav" + media_path.write_bytes(b"") + _write_jams(jams_path, [(0.0, 0.2, 0, 0)]) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "train-01", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "train", + "media_path": str(media_path), + "annotation_path": str(jams_path), + "annotation_format": "guitarset_jams", + } + ], + ) + + report = run_composite_eval( + manifest_path, + predictor=_make_predictor({}), + bootstrap_n=100, + ) + + assert report.per_clip == [] + assert report.tiers == {} + + +def test_explicit_train_split_includes_train_clips(tmp_path: Path) -> None: + jams_path = tmp_path / "train.jams" + media_path = tmp_path / "train.wav" + media_path.write_bytes(b"") + notes = [(0.0, 0.2, 0, 0)] + _write_jams(jams_path, notes) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "train-01", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "train", + "media_path": str(media_path), + "annotation_path": str(jams_path), + "annotation_format": "guitarset_jams", + } + ], + ) + + gold = {str(media_path): [_tab_event(o, d, s, f) for (o, d, s, f) in notes]} + report = run_composite_eval( + manifest_path, + predictor=_make_predictor(gold), + splits=("train",), + bootstrap_n=100, + ) + + assert len(report.per_clip) == 1 + assert report.per_clip[0].clip_id == "train-01" + + +def test_rejects_manifest_with_fail_issues(tmp_path: Path) -> None: + """Missing required field (annotation_format) should block the eval.""" + jams_path = tmp_path / "clip.jams" + media_path = tmp_path / "clip.wav" + 
media_path.write_bytes(b"") + _write_jams(jams_path, [(0.0, 0.2, 0, 0)]) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "clip-no-format", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "validation", + "media_path": str(media_path), + "annotation_path": str(jams_path), + # annotation_format intentionally omitted + } + ], + ) + + with pytest.raises(ValueError, match="fail-severity"): + run_composite_eval( + manifest_path, + predictor=_make_predictor({}), + bootstrap_n=100, + ) + + +def test_unknown_parser_format_raises(tmp_path: Path) -> None: + """A manifest referencing an unregistered parser should raise KeyError at dispatch.""" + jams_path = tmp_path / "clip.jams" + media_path = tmp_path / "clip.wav" + media_path.write_bytes(b"") + _write_jams(jams_path, [(0.0, 0.2, 0, 0)]) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "weird", + "tier": "clean_acoustic_single_line", + "source": "Unknown", + "split": "validation", + "media_path": str(media_path), + "annotation_path": str(jams_path), + "annotation_format": "non_existent_format", + } + ], + ) + + with pytest.raises(KeyError, match="non_existent_format"): + run_composite_eval( + manifest_path, + predictor=_make_predictor({}), + bootstrap_n=100, + ) + + +def test_data_root_substitution_uses_env_var( + tmp_path: Path, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """$TABVISION_DATA_ROOT in paths is expanded via env var when no override.""" + data_root = tmp_path / "data" + data_root.mkdir() + jams_path = data_root / "clip.jams" + media_path = data_root / "clip.wav" + media_path.write_bytes(b"") + _write_jams(jams_path, [(0.0, 0.2, 0, 0)]) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "with-root", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "validation", + "media_path": 
"$TABVISION_DATA_ROOT/clip.wav", + "annotation_path": "$TABVISION_DATA_ROOT/clip.jams", + "annotation_format": "guitarset_jams", + } + ], + ) + + monkeypatch.setenv("TABVISION_DATA_ROOT", str(data_root)) + gold = {str(media_path): [_tab_event(0.0, 0.2, 0, 0)]} + + report = run_composite_eval( + manifest_path, + predictor=_make_predictor(gold), + bootstrap_n=100, + ) + + assert len(report.per_clip) == 1 + + +def test_data_root_substitution_uses_function_arg( + tmp_path: Path, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """``annotation_root`` arg overrides the env var.""" + real_root = tmp_path / "real" + real_root.mkdir() + jams_path = real_root / "clip.jams" + media_path = real_root / "clip.wav" + media_path.write_bytes(b"") + _write_jams(jams_path, [(0.0, 0.2, 0, 0)]) + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "rooted", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "validation", + "media_path": "$TABVISION_DATA_ROOT/clip.wav", + "annotation_path": "$TABVISION_DATA_ROOT/clip.jams", + "annotation_format": "guitarset_jams", + } + ], + ) + + monkeypatch.setenv("TABVISION_DATA_ROOT", "/nonexistent") + gold = {str(media_path): [_tab_event(0.0, 0.2, 0, 0)]} + + report = run_composite_eval( + manifest_path, + predictor=_make_predictor(gold), + media_root=str(real_root), + annotation_root=str(real_root), + bootstrap_n=100, + ) + + assert len(report.per_clip) == 1 + + +def test_per_clip_metrics_include_error_decomposition(tmp_path: Path) -> None: + """Each ClipEvalResult should carry the six-bucket decomposition.""" + manifest_path, gold_by_path = _build_two_tier_manifest(tmp_path) + report = run_composite_eval( + manifest_path, + predictor=_make_predictor(gold_by_path), + bootstrap_n=100, + ) + + for clip_result in report.per_clip: + # Echo predictor → all gold notes should be correct + assert clip_result.errors.correct == clip_result.n_gold + assert clip_result.errors.total_loss == 
0 + + +def test_clip_with_no_gold_or_predictions(tmp_path: Path) -> None: + """Empty-gold clip should not break aggregation; F1 is 0 by convention.""" + jams_path = tmp_path / "empty.jams" + jams_path.write_text(json.dumps({"annotations": []}), encoding="utf-8") + media_path = tmp_path / "empty.wav" + media_path.write_bytes(b"") + + manifest_path = tmp_path / "composite.toml" + _write_manifest( + manifest_path, + [ + { + "id": "empty-clip", + "tier": "clean_acoustic_single_line", + "source": "GuitarSet", + "split": "validation", + "media_path": str(media_path), + "annotation_path": str(jams_path), + "annotation_format": "guitarset_jams", + } + ], + ) + + report = run_composite_eval( + manifest_path, + predictor=_make_predictor({str(media_path): []}), + bootstrap_n=100, + ) + + assert len(report.per_clip) == 1 + assert report.per_clip[0].tab.f1 == 0.0 diff --git a/tabvision/tests/unit/test_bootstrap_ci.py b/tabvision/tests/unit/test_bootstrap_ci.py new file mode 100644 index 0000000..0b71ca7 --- /dev/null +++ b/tabvision/tests/unit/test_bootstrap_ci.py @@ -0,0 +1,111 @@ +"""Tests for the bootstrap-CI helper (Phase 0).""" + +from __future__ import annotations + +import numpy as np +import pytest + +from tabvision.eval.bootstrap import BootstrapResult, bootstrap_ci + + +def test_returns_bootstrap_result_type(): + r = bootstrap_ci([0.5, 0.6, 0.7]) + assert isinstance(r, BootstrapResult) + assert r.n_observations == 3 + assert r.n_bootstrap == 10_000 + assert r.confidence == 0.95 + + +def test_deterministic_with_seed(): + values = [0.10, 0.50, 0.90, 0.60, 0.30, 0.80] + r1 = bootstrap_ci(values, seed=42) + r2 = bootstrap_ci(values, seed=42) + assert r1.statistic == r2.statistic + assert r1.lower == r2.lower + assert r1.upper == r2.upper + + +def test_different_seeds_produce_different_intervals(): + values = [0.10, 0.50, 0.90, 0.60, 0.30, 0.80] + r1 = bootstrap_ci(values, seed=42) + r2 = bootstrap_ci(values, seed=43) + # CI endpoints may coincide on small data; require at 
least one to differ. + assert (r1.lower != r2.lower) or (r1.upper != r2.upper) + + +def test_single_observation_has_zero_width_ci(): + r = bootstrap_ci([0.85]) + assert r.statistic == pytest.approx(0.85) + assert r.lower == r.statistic == r.upper + assert r.n_observations == 1 + assert r.n_bootstrap == 0 + + +def test_rejects_empty_values(): + with pytest.raises(ValueError, match="at least one observation"): + bootstrap_ci([]) + + +@pytest.mark.parametrize("bad_conf", [0.0, 1.0, -0.1, 1.5]) +def test_rejects_bad_confidence(bad_conf): + with pytest.raises(ValueError, match="confidence"): + bootstrap_ci([0.5, 0.6], confidence=bad_conf) + + +def test_rejects_zero_bootstrap(): + with pytest.raises(ValueError, match="n_bootstrap"): + bootstrap_ci([0.5, 0.6], n_bootstrap=0) + + +def test_accepts_numpy_array(): + arr = np.array([0.1, 0.5, 0.9]) + r = bootstrap_ci(arr) + assert r.statistic == pytest.approx(0.5) + assert r.n_observations == 3 + + +def test_custom_statistic(): + """Verify a non-mean statistic is honored.""" + values = [1.0, 2.0, 3.0, 4.0, 5.0] + r_median = bootstrap_ci(values, statistic=np.median, seed=0) + r_mean = bootstrap_ci(values, statistic=np.mean, seed=0) + # On this small sample they may coincide; correctness check is that + # statistic is honored, not that they differ. + assert r_median.statistic == pytest.approx(3.0) + assert r_mean.statistic == pytest.approx(3.0) + + +def test_lower_le_statistic_le_upper(): + values = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8] + r = bootstrap_ci(values, seed=7) + assert r.lower <= r.statistic <= r.upper + + +def test_ci_brackets_known_normal_mean(): + """Coverage check: 95% CI should contain the true mean in roughly 95% of trials. + + Bootstrap percentile intervals are asymptotic — allow generous slack + so this isn't flaky. We require >= 88% coverage on a low-trial run + (200 trials, n_obs=80, n_bootstrap=500) for speed. 
+ """ + rng = np.random.default_rng(0) + n_trials = 200 + n_obs = 80 + true_mean = 0.85 + sigma = 0.05 + hits = 0 + for trial in range(n_trials): + sample = rng.normal(true_mean, sigma, n_obs) + r = bootstrap_ci(sample, seed=trial, n_bootstrap=500) + if r.lower <= true_mean <= r.upper: + hits += 1 + coverage = hits / n_trials + assert coverage >= 0.88, f"bootstrap coverage {coverage:.3f} below 0.88" + + +def test_zero_variance_input_collapses_ci(): + """If every observation is identical, the CI is a point.""" + r = bootstrap_ci([0.5] * 10, seed=42) + assert r.statistic == pytest.approx(0.5) + assert r.lower == pytest.approx(0.5) + assert r.upper == pytest.approx(0.5) diff --git a/tabvision/tests/unit/test_composite_report_formatting.py b/tabvision/tests/unit/test_composite_report_formatting.py new file mode 100644 index 0000000..3a74b97 --- /dev/null +++ b/tabvision/tests/unit/test_composite_report_formatting.py @@ -0,0 +1,197 @@ +"""Smoke tests for the composite-eval markdown formatters (Phase 0).""" + +from __future__ import annotations + +from pathlib import Path + +import pytest + +from tabvision.eval.bootstrap import BootstrapResult +from tabvision.eval.composite import ( + DEFAULT_TIER_TARGETS, + ClipEvalResult, + CompositeReport, + TierReport, + format_baseline_markdown, + format_decomposition_markdown, +) +from tabvision.eval.error_decomposition import ErrorDecomposition +from tabvision.eval.manifest import ManifestValidation +from tabvision.eval.metrics import EventF1Result, TabF1Result + + +def _bootstrap(value: float, lower: float, upper: float) -> BootstrapResult: + return BootstrapResult( + statistic=value, + lower=lower, + upper=upper, + n_observations=20, + n_bootstrap=10_000, + confidence=0.95, + ) + + +def _event_f1(value: float) -> EventF1Result: + return EventF1Result( + precision=value, + recall=value, + f1=value, + true_positives=10, + false_positives=1, + false_negatives=1, + ) + + +def _tab_f1(value: float) -> TabF1Result: + return 
TabF1Result( + precision=value, + recall=value, + f1=value, + true_positives=10, + false_positives=1, + false_negatives=1, + ) + + +def _clip(tier: str, source: str, tab_value: float) -> ClipEvalResult: + return ClipEvalResult( + clip_id=f"{source}-{tier}-x", + tier=tier, + source=source, + n_gold=12, + n_predicted=11, + onset=_event_f1(0.95), + pitch=_event_f1(0.92), + tab=_tab_f1(tab_value), + errors=ErrorDecomposition( + correct=10, wrong_position_same_pitch=1, missed_onset=1 + ), + ) + + +def _report(tmp_path: Path) -> CompositeReport: + per_clip = [ + _clip("clean_acoustic_strummed", "GuitarSet", 0.92), + _clip("clean_acoustic_strummed", "GuitarSet", 0.94), + _clip("clean_acoustic_single_line", "GuitarSet", 0.62), + _clip("clean_acoustic_single_line", "Guitar-TECHS", 0.71), + ] + tiers = { + "clean_acoustic_strummed": TierReport( + tier="clean_acoustic_strummed", + n_clips=2, + n_gold_total=24, + onset_f1=_bootstrap(0.95, 0.93, 0.97), + pitch_f1=_bootstrap(0.92, 0.90, 0.94), + tab_f1=_bootstrap(0.93, 0.91, 0.95), + errors=ErrorDecomposition(correct=20, wrong_position_same_pitch=2), + ), + "clean_acoustic_single_line": TierReport( + tier="clean_acoustic_single_line", + n_clips=2, + n_gold_total=24, + onset_f1=_bootstrap(0.95, 0.92, 0.98), + pitch_f1=_bootstrap(0.92, 0.90, 0.95), + tab_f1=_bootstrap(0.665, 0.55, 0.78), # mean 0.665 < 0.85 target, so fail rather than gap
+ errors=ErrorDecomposition( + correct=10, wrong_position_same_pitch=10, missed_onset=4 + ), + ), + } + validation = ManifestValidation( + manifest_path=str(tmp_path / "manifest.toml"), + passed=True, + clip_count=4, + clip_ids=["a", "b", "c", "d"], + present_tiers=["clean_acoustic_single_line", "clean_acoustic_strummed"], + missing_tiers=["clean_electric", "distorted_electric"], + items=[], + ) + return CompositeReport( + manifest_path=str(tmp_path / "manifest.toml"), + manifest_validation=validation, + per_clip=per_clip, + tiers=tiers, + bootstrap_n=10_000, + bootstrap_seed=42, + onset_tolerance_s=0.05, + ) + + +def test_baseline_markdown_has_required_sections(tmp_path: Path) -> None: + md = format_baseline_markdown(_report(tmp_path)) + + assert "## Per-tier results" in md + assert "## Per-source breakdown" in md + assert "## Methodology" in md + for tier in DEFAULT_TIER_TARGETS: + assert tier in md + + +def test_baseline_markdown_status_column(tmp_path: Path) -> None: + """The status column must categorise as pass / gap / fail / missing.""" + md = format_baseline_markdown(_report(tmp_path)) + + # clean_acoustic_strummed: lower_95 = 0.91 >= 0.90 target → pass + strum_row = next( + line for line in md.split("\n") if line.startswith("| clean_acoustic_strummed") + ) + assert "| pass |" in strum_row + + # clean_acoustic_single_line: mean=0.665 < 0.85 → fail + single_row = next( + line for line in md.split("\n") if line.startswith("| clean_acoustic_single_line") + ) + assert "| fail |" in single_row + + # clean_electric: tier not in report → missing + electric_row = next(line for line in md.split("\n") if line.startswith("| clean_electric")) + assert "| missing |" in electric_row + + +def test_baseline_markdown_methodology_includes_settings(tmp_path: Path) -> None: + md = format_baseline_markdown( + _report(tmp_path), + backend_label="highres", + position_prior_label="guitarset-v1", + eval_harness_sha="deadbeef", + ) + assert "`highres`" in md + assert 
"`guitarset-v1`" in md + assert "`deadbeef`" in md + assert "Bootstrap: N=10,000" in md + assert "Onset tolerance: 50 ms" in md + + +def test_decomposition_markdown_has_aggregate_and_per_tier(tmp_path: Path) -> None: + md = format_decomposition_markdown(_report(tmp_path)) + + assert "## Aggregate (all tiers)" in md + assert "## Per-tier breakdown" in md + # Bucket names should appear in the aggregate table + for bucket in ( + "correct", + "wrong_position_same_pitch", + "pitch_off", + "timing_only", + "missed_onset", + "extra_detection", + ): + assert bucket in md + + +def test_decomposition_markdown_aggregates_per_clip(tmp_path: Path) -> None: + """Aggregate row should sum per-clip decompositions, not duplicate per-tier.""" + md = format_decomposition_markdown(_report(tmp_path)) + # 4 clips × 10 correct each = 40 + aggregate_section = md.split("## Per-tier breakdown")[0] + assert "| correct | 40 |" in aggregate_section + + +@pytest.mark.parametrize( + "tier", + list(DEFAULT_TIER_TARGETS), +) +def test_default_targets_cover_all_required_tiers(tier: str) -> None: + assert tier in DEFAULT_TIER_TARGETS + assert 0.0 < DEFAULT_TIER_TARGETS[tier] <= 1.0 diff --git a/tabvision/tests/unit/test_error_decomposition.py b/tabvision/tests/unit/test_error_decomposition.py new file mode 100644 index 0000000..3db377e --- /dev/null +++ b/tabvision/tests/unit/test_error_decomposition.py @@ -0,0 +1,257 @@ +"""Tests for the Tab F1 error-decomposition module (Phase 0).""" + +from __future__ import annotations + +import pytest + +from tabvision.eval.error_decomposition import ( + ErrorDecomposition, + aggregate_decompositions, + decompose_errors, +) +from tabvision.types import TabEvent + + +def _ev(onset: float, string_idx: int, fret: int, *, pitch: int | None = None) -> TabEvent: + """Convenience: TabEvent with default duration, confidence, and derived pitch.""" + # Standard tuning open pitches: low E to high E. 
+ open_pitches = (40, 45, 50, 55, 59, 64) + pitch_midi = pitch if pitch is not None else open_pitches[string_idx] + fret + return TabEvent( + onset_s=onset, + duration_s=0.1, + string_idx=string_idx, + fret=fret, + pitch_midi=pitch_midi, + confidence=1.0, + ) + + +def test_perfect_match_all_correct() -> None: + gold = [_ev(0.0, 0, 0), _ev(0.5, 2, 5), _ev(1.0, 4, 3)] + pred = list(gold) + + r = decompose_errors(pred, gold) + + assert r.correct == 3 + assert r.total_loss == 0 + assert r.wrong_position_same_pitch == 0 + assert r.missed_onset == 0 + assert r.extra_detection == 0 + + +def test_wrong_position_same_pitch_bucket() -> None: + """E4 (MIDI 64) on high-E open vs MIDI 64 on G string fret 9: same pitch, different position.""" + gold = [_ev(0.0, 5, 0, pitch=64)] # high E open, MIDI 64 + pred = [_ev(0.0, 3, 9, pitch=64)] # MIDI 64 placed at G string (index 3) fret 9 — same pitch + + r = decompose_errors(pred, gold) + + assert r.correct == 0 + assert r.wrong_position_same_pitch == 1 + assert r.pitch_off == 0 + + +def test_pitch_off_bucket() -> None: + """Onset matches strictly but the predicted pitch is wrong.""" + gold = [_ev(0.0, 0, 0, pitch=40)] + pred = [_ev(0.01, 0, 1, pitch=41)] # onset within tolerance, but wrong pitch + + r = decompose_errors(pred, gold) + + assert r.pitch_off == 1 + assert r.correct == 0 + assert r.wrong_position_same_pitch == 0 + + +def test_timing_only_bucket() -> None: + """Correct position + pitch, but onset just outside strict tolerance, within extended.""" + gold = [_ev(0.0, 0, 0)] + pred = [_ev(0.10, 0, 0)] # 100 ms off — outside strict (50 ms), within extended (150 ms) + + r = decompose_errors(pred, gold) + + assert r.timing_only == 1 + assert r.correct == 0 + assert r.missed_onset == 0 + + +def test_missed_onset_bucket() -> None: + """Gold event with no predicted event nearby at all.""" + gold = [_ev(0.0, 0, 0)] + pred: list[TabEvent] = [] + + r = decompose_errors(pred, gold) + + assert r.missed_onset == 1 + assert r.extra_detection == 0 + 
+ +def test_extra_detection_bucket() -> None: + """Predicted event with no gold event nearby at all.""" + gold: list[TabEvent] = [] + pred = [_ev(0.0, 0, 0)] + + r = decompose_errors(pred, gold) + + assert r.extra_detection == 1 + assert r.missed_onset == 0 + + +def test_predicted_far_from_gold_yields_missed_and_extra() -> None: + """Far-apart events should bucket as missed + extra, not pair up.""" + gold = [_ev(0.0, 0, 0)] + pred = [_ev(10.0, 0, 0)] + + r = decompose_errors(pred, gold) + + assert r.missed_onset == 1 + assert r.extra_detection == 1 + assert r.correct == 0 + + +def test_mixed_buckets() -> None: + """A mixed scenario across all buckets at once.""" + gold = [ + _ev(0.0, 0, 0), # correct match + _ev(0.5, 5, 0, pitch=64), # wrong-position match (MIDI 64 placed elsewhere) + _ev(1.0, 2, 5, pitch=55), # pitch_off (pred at wrong position with wrong pitch) + _ev(1.5, 3, 7), # timing_only (pred is 100 ms late) + _ev(2.0, 4, 3), # missed_onset + ] + pred = [ + _ev(0.01, 0, 0), # → correct + _ev(0.51, 2, 9, pitch=64), # → wrong_position_same_pitch + _ev(1.01, 0, 3), # → pitch_off (low E fret 3 → MIDI 43, ≠ gold's 55) + _ev(1.60, 3, 7), # → timing_only (100 ms late) + # Nothing near gold[4] at 2.0 → missed_onset + _ev(5.0, 0, 0), # → extra_detection (far from any gold) + ] + + r = decompose_errors(pred, gold) + + assert r.correct == 1 + assert r.wrong_position_same_pitch == 1 + assert r.pitch_off == 1 + assert r.timing_only == 1 + assert r.missed_onset == 1 + assert r.extra_detection == 1 + + +def test_share_of_loss_sums_to_one() -> None: + r = ErrorDecomposition( + correct=10, + wrong_position_same_pitch=3, + pitch_off=2, + timing_only=1, + missed_onset=2, + extra_detection=2, + ) + shares = r.share_of_loss() + assert sum(shares.values()) == pytest.approx(1.0) + assert shares["wrong_position_same_pitch"] == pytest.approx(3 / 10) + + +def test_share_of_loss_zero_when_no_loss() -> None: + r = ErrorDecomposition(correct=5) + shares = r.share_of_loss() + assert 
all(v == 0.0 for v in shares.values()) + + +def test_total_gold_excludes_extra_detection() -> None: + r = ErrorDecomposition( + correct=10, wrong_position_same_pitch=2, pitch_off=1, missed_onset=3, extra_detection=5 + ) + # total_gold = correct + wrong_pos + pitch_off + timing_only + missed_onset + assert r.total_gold == 16 + # total_predicted = correct + wrong_pos + pitch_off + timing_only + extra_detection + assert r.total_predicted == 18 + + +def test_aggregate_decompositions_sums_bucketwise() -> None: + a = ErrorDecomposition(correct=5, wrong_position_same_pitch=2) + b = ErrorDecomposition(correct=10, missed_onset=3, extra_detection=1) + agg = aggregate_decompositions([a, b]) + assert agg.correct == 15 + assert agg.wrong_position_same_pitch == 2 + assert agg.missed_onset == 3 + assert agg.extra_detection == 1 + assert agg.pitch_off == 0 + + +def test_aggregate_empty_returns_zeros() -> None: + agg = aggregate_decompositions([]) + assert agg == ErrorDecomposition() + assert agg.total_loss == 0 + + +def test_rejects_invalid_tolerances() -> None: + with pytest.raises(ValueError, match="onset_tolerance_s"): + decompose_errors([], [], onset_tolerance_s=0.0) + with pytest.raises(ValueError, match=">="): + decompose_errors([], [], onset_tolerance_s=0.1, timing_extended_tolerance_s=0.05) + + +def test_each_pred_matches_at_most_one_gold() -> None: + """Two gold events at the same time should not both claim one pred.""" + gold = [_ev(0.0, 0, 0), _ev(0.0, 0, 0)] + pred = [_ev(0.0, 0, 0)] + + r = decompose_errors(pred, gold) + + assert r.correct == 1 + assert r.missed_onset == 1 + assert r.extra_detection == 0 + + +def test_greedy_picks_closest_onset() -> None: + """When multiple same-position preds are within tolerance, the closest-by-onset wins.""" + gold = [_ev(0.0, 0, 0)] + pred = [_ev(0.04, 0, 0), _ev(0.01, 0, 0)] # both within 50 ms; 0.01 is closer + + r = decompose_errors(pred, gold) + + assert r.correct == 1 + assert r.extra_detection == 1 + + +def 
test_chord_cluster_priority_pitch_over_onset() -> None: + """Multi-gold same-onset chord: matcher should pair by pitch, not by onset proximity. + + Two gold events at the same onset with different pitches, paired + with two preds whose pitches match the gold (but whose on-the-wire + ordering doesn't). Onset-only greediness would mis-pair them and + inflate ``pitch_off``. The priority-based matcher must pair on + pitch. + """ + gold = [ + _ev(0.0, 0, 0, pitch=40), # low E + _ev(0.0, 1, 2, pitch=47), # A string fret 2 + ] + pred = [ + # Different on-the-wire order: pitch=47 first. + _ev(0.01, 1, 2, pitch=47), # → matches gold[1] (correct) + _ev(0.01, 0, 0, pitch=40), # → matches gold[0] (correct) + ] + + r = decompose_errors(pred, gold) + + assert r.correct == 2 + assert r.pitch_off == 0 + assert r.wrong_position_same_pitch == 0 + + +def test_chord_cluster_priority_falls_back_to_position_match_then_pitch() -> None: + """When one pred has the right position and another has the right pitch, + the same-position match wins for ``correct`` accounting. 
+ """ + gold = [_ev(0.0, 0, 0, pitch=40)] + pred = [ + # Closer in onset, but shares neither pitch nor position with gold + _ev(0.005, 5, 0, pitch=64), # noise; nothing in common + _ev(0.020, 0, 0, pitch=40), # exact match; further in onset + ] + + r = decompose_errors(pred, gold) + + assert r.correct == 1 # picked the same-position match even though it's further diff --git a/tabvision/tests/unit/test_eval_manifest.py b/tabvision/tests/unit/test_eval_manifest.py index 7810ce1..bad81d4 100644 --- a/tabvision/tests/unit/test_eval_manifest.py +++ b/tabvision/tests/unit/test_eval_manifest.py @@ -55,7 +55,8 @@ def test_manifest_validation_is_json_serializable_and_sorted(tmp_path: Path) -> source = "EGDB" split = "test" media_path = "$TABVISION_DATA_ROOT/egdb/b.wav" -annotation_path = "$TABVISION_DATA_ROOT/egdb/b.jams" +annotation_path = "$TABVISION_DATA_ROOT/egdb/b.gp5" +annotation_format = "egdb_gp" [[clips]] id = "a" @@ -64,6 +65,7 @@ def test_manifest_validation_is_json_serializable_and_sorted(tmp_path: Path) -> split = "validation" media_path = "$TABVISION_DATA_ROOT/guitarset/a.wav" annotation_path = "$TABVISION_DATA_ROOT/guitarset/a.jams" +annotation_format = "guitarset_jams" """.strip() + "\n", encoding="utf-8", @@ -78,3 +80,112 @@ def test_manifest_validation_is_json_serializable_and_sorted(tmp_path: Path) -> assert payload["present_tiers"] == ["clean_acoustic_strummed", "distorted_electric"] assert payload["passed"] is True assert tomllib.loads(manifest.read_text(encoding="utf-8"))["clips"][0]["id"] == "b" + + +def test_annotation_format_is_required(tmp_path: Path) -> None: + """Phase 0: every clip must declare its parser dispatch key.""" + manifest = tmp_path / "manifest.toml" + manifest.write_text( + """ +[[clips]] +id = "missing-format" +tier = "clean_acoustic_strummed" +source = "GuitarSet" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/guitarset/a.wav" +annotation_path = "$TABVISION_DATA_ROOT/guitarset/a.jams" +""".strip() + + "\n", + encoding="utf-8", + ) + + result = 
validate_manifest(manifest) + + assert not result.passed + assert any( + item.code == "MISSING_ANNOTATION_FORMAT" and item.severity == "fail" + for item in result.items + ) + + +def test_synthetic_source_blocked_in_test_split(tmp_path: Path) -> None: + """Cross-contamination guard: synthetic-source clip in test split is rejected.""" + manifest = tmp_path / "manifest.toml" + manifest.write_text( + """ +[[clips]] +id = "synth-in-test" +tier = "clean_electric" +source = "synthtab/electric" +split = "test" +media_path = "$TABVISION_DATA_ROOT/synthtab/x.wav" +annotation_path = "$TABVISION_DATA_ROOT/synthtab/x.json" +annotation_format = "synthtab_json" +""".strip() + + "\n", + encoding="utf-8", + ) + + result = validate_manifest(manifest) + + assert not result.passed + failures = [ + item + for item in result.items + if item.code == "SYNTHETIC_IN_EVAL_SPLIT" and item.severity == "fail" + ] + assert len(failures) == 1 + assert failures[0].clip_id == "synth-in-test" + + +def test_synthetic_source_blocked_in_validation_split(tmp_path: Path) -> None: + manifest = tmp_path / "manifest.toml" + manifest.write_text( + """ +[[clips]] +id = "synth-in-validation" +tier = "clean_electric" +source = "DadaGP/render-001" +split = "validation" +media_path = "$TABVISION_DATA_ROOT/dadagp/x.wav" +annotation_path = "$TABVISION_DATA_ROOT/dadagp/x.json" +annotation_format = "dadagp_json" +""".strip() + + "\n", + encoding="utf-8", + ) + + result = validate_manifest(manifest) + + failures = [ + item + for item in result.items + if item.code == "SYNTHETIC_IN_EVAL_SPLIT" and item.severity == "fail" + ] + assert len(failures) == 1 + assert failures[0].clip_id == "synth-in-validation" + + +def test_synthetic_source_allowed_in_train_split(tmp_path: Path) -> None: + """Synthetic data is permitted as training material (per design plan §4.2).""" + manifest = tmp_path / "manifest.toml" + manifest.write_text( + """ +[[clips]] +id = "synth-in-train" +tier = "clean_electric" +source = "synthtab/electric" 
+split = "train" +media_path = "$TABVISION_DATA_ROOT/synthtab/x.wav" +annotation_path = "$TABVISION_DATA_ROOT/synthtab/x.json" +annotation_format = "synthtab_json" +""".strip() + + "\n", + encoding="utf-8", + ) + + result = validate_manifest(manifest) + + assert not any( + item.code == "SYNTHETIC_IN_EVAL_SPLIT" for item in result.items + ) diff --git a/tabvision/tests/unit/test_manifest_builder.py b/tabvision/tests/unit/test_manifest_builder.py new file mode 100644 index 0000000..5f011f7 --- /dev/null +++ b/tabvision/tests/unit/test_manifest_builder.py @@ -0,0 +1,397 @@ +"""Tests for the composite-eval manifest builder (Phase 0).""" + +from __future__ import annotations + +import json +import tomllib +from pathlib import Path + +import pytest + +from tabvision.eval.manifest import validate_manifest +from tabvision.eval.manifest_builder import ( + ClipEntry, + apply_limits, + build_manifest, + render_toml, + scan_guitar_techs, + scan_guitarset, + summarise_coverage, +) + + +def _make_guitarset_layout( + root: Path, + tracks: list[tuple[str, dict | None]], +) -> None: + """Build a fake GuitarSet directory at ``root``. + + Each ``tracks`` tuple is ``(track_id, jams_payload)``. Pass payload + ``None`` to write the JAMS but omit the audio file (simulates a + half-present clip that the scanner should skip). The audio file is + a zero-byte placeholder when payload is not ``None``. 
+ """ + annotation_dir = root / "annotation" + audio_dir = root / "audio_mono-mic" + annotation_dir.mkdir(parents=True, exist_ok=True) + audio_dir.mkdir(parents=True, exist_ok=True) + for track_id, payload in tracks: + jams_path = annotation_dir / f"{track_id}.jams" + jams_path.write_text(json.dumps(payload or {"annotations": []}), encoding="utf-8") + if payload is not None: + (audio_dir / f"{track_id}_mic.wav").write_bytes(b"") + + +def test_scan_guitarset_classifies_comp_and_solo(tmp_path: Path) -> None: + _make_guitarset_layout( + tmp_path, + [ + ("05_Rock1-90-C#_comp", {"annotations": []}), + ("05_Funk1-114-Ab_solo", {"annotations": []}), + ], + ) + + entries = scan_guitarset(tmp_path) + + by_id = {entry.id: entry for entry in entries} + assert by_id["guitarset/05_Rock1-90-C#_comp"].tier == "clean_acoustic_strummed" + assert by_id["guitarset/05_Funk1-114-Ab_solo"].tier == "clean_acoustic_single_line" + for entry in entries: + assert entry.source == "GuitarSet" + assert entry.annotation_format == "guitarset_jams" + + +def test_scan_guitarset_assigns_validation_split_for_player_05(tmp_path: Path) -> None: + _make_guitarset_layout( + tmp_path, + [ + ("00_Rock1-90-C#_comp", {"annotations": []}), + ("05_Rock1-90-C#_comp", {"annotations": []}), + ], + ) + + entries = scan_guitarset(tmp_path) + + by_id = {entry.id: entry for entry in entries} + assert by_id["guitarset/00_Rock1-90-C#_comp"].split == "train" + assert by_id["guitarset/05_Rock1-90-C#_comp"].split == "validation" + + +def test_scan_guitarset_skips_when_audio_missing(tmp_path: Path) -> None: + """A JAMS without matching audio is skipped silently.""" + _make_guitarset_layout( + tmp_path, + [ + ("05_OnlyAnnot-90-A_comp", None), # JAMS present, no audio + ], + ) + assert scan_guitarset(tmp_path) == [] + + +def test_scan_guitarset_skips_unrecognised_suffix(tmp_path: Path) -> None: + """Tracks without _comp or _solo suffix are skipped.""" + _make_guitarset_layout( + tmp_path, + [ + ("05_OddTrackId-90-A_other", 
{"annotations": []}), + ], + ) + assert scan_guitarset(tmp_path) == [] + + +def test_scan_guitarset_returns_empty_for_missing_root(tmp_path: Path) -> None: + assert scan_guitarset(tmp_path / "nonexistent") == [] + + +def test_scan_guitarset_returns_empty_for_partial_layout(tmp_path: Path) -> None: + """Root with annotation/ but no audio_mono-mic/ returns empty.""" + (tmp_path / "annotation").mkdir() + assert scan_guitarset(tmp_path) == [] + + +def test_scan_guitar_techs_returns_empty_stub(tmp_path: Path) -> None: + """Guitar-TECHS scanner is a stub until the dataset is acquired.""" + assert scan_guitar_techs(tmp_path) == [] + + +def _entry(clip_id: str, tier: str = "clean_acoustic_strummed") -> ClipEntry: + return ClipEntry( + id=clip_id, + tier=tier, + source="GuitarSet", + split="validation", + media_path=f"/data/{clip_id}.wav", + annotation_path=f"/data/{clip_id}.jams", + annotation_format="guitarset_jams", + ) + + +def test_apply_limits_caps_per_tier_deterministically() -> None: + entries = [ + _entry("a", "clean_acoustic_strummed"), + _entry("b", "clean_acoustic_strummed"), + _entry("c", "clean_acoustic_strummed"), + _entry("d", "clean_acoustic_single_line"), + _entry("e", "clean_acoustic_single_line"), + ] + + capped = apply_limits(entries, max_clips_per_tier=2) + + # 2 per tier, sorted by id within each tier + ids = [entry.id for entry in capped] + assert ids == ["a", "b", "d", "e"] + + +def test_apply_limits_applies_total_after_per_tier() -> None: + entries = [ + _entry("a", "clean_acoustic_strummed"), + _entry("b", "clean_acoustic_strummed"), + _entry("c", "clean_acoustic_single_line"), + ] + + capped = apply_limits(entries, max_clips_per_tier=2, total_limit=2) + + assert [entry.id for entry in capped] == ["a", "b"] + + +def test_apply_limits_with_no_caps_preserves_all_sorted() -> None: + entries = [_entry("b"), _entry("a"), _entry("c")] + out = apply_limits(entries) + assert [entry.id for entry in out] == ["a", "b", "c"] + + +def 
test_render_toml_round_trips_via_tomllib() -> None: + entries = [ + _entry("a", "clean_acoustic_strummed"), + _entry("b", "clean_acoustic_single_line"), + ] + text = render_toml(entries) + parsed = tomllib.loads(text) + assert len(parsed["clips"]) == 2 + by_id = {clip["id"]: clip for clip in parsed["clips"]} + assert by_id["a"]["tier"] == "clean_acoustic_strummed" + assert by_id["a"]["annotation_format"] == "guitarset_jams" + + +def test_render_toml_is_byte_stable() -> None: + """Same entries → same bytes, regardless of input order.""" + entries_in_order_a = [_entry("z"), _entry("a"), _entry("m")] + entries_in_order_b = [_entry("a"), _entry("m"), _entry("z")] + assert render_toml(entries_in_order_a) == render_toml(entries_in_order_b) + + +def test_render_toml_emits_header_when_provided() -> None: + text = render_toml([_entry("a")], header_comment="hello world") + assert text.startswith("# hello world\n") + + +def test_render_toml_rewrites_paths_under_data_root(tmp_path: Path) -> None: + """media/annotation paths under data_root become $TABVISION_DATA_ROOT/.""" + data_root = tmp_path / "datasets" + data_root.mkdir() + entry = ClipEntry( + id="clip-x", + tier="clean_acoustic_strummed", + source="GuitarSet", + split="validation", + media_path=str((data_root / "guitarset" / "audio.wav").resolve()), + annotation_path=str((data_root / "guitarset" / "ann.jams").resolve()), + annotation_format="guitarset_jams", + ) + text = render_toml([entry], data_root=data_root) + assert '"$TABVISION_DATA_ROOT/guitarset/audio.wav"' in text + assert '"$TABVISION_DATA_ROOT/guitarset/ann.jams"' in text + # Paths NOT under data_root should be untouched. 
+    assert "/datasets/" not in text  # absolute prefix is gone
+
+
+def test_render_toml_leaves_paths_outside_data_root_alone(tmp_path: Path) -> None:
+    data_root = tmp_path / "datasets"
+    data_root.mkdir()
+    other = tmp_path / "elsewhere" / "x.wav"
+    other.parent.mkdir(parents=True)
+    other.write_bytes(b"")
+    entry = ClipEntry(
+        id="clip-x",
+        tier="clean_acoustic_strummed",
+        source="GuitarSet",
+        split="validation",
+        media_path=str(other.resolve()),
+        annotation_path=str(other.resolve()),
+        annotation_format="guitarset_jams",
+    )
+    text = render_toml([entry], data_root=data_root)
+    assert "$TABVISION_DATA_ROOT" not in text
+    assert str(other.resolve()) in text
+
+
+def test_render_toml_with_no_data_root_is_unchanged(tmp_path: Path) -> None:
+    """Backward-compat: omitting data_root keeps current absolute-path output."""
+    entry = ClipEntry(
+        id="clip-x",
+        tier="clean_acoustic_strummed",
+        source="GuitarSet",
+        split="validation",
+        media_path="/some/abs/path.wav",
+        annotation_path="/some/abs/path.jams",
+        annotation_format="guitarset_jams",
+    )
+    text = render_toml([entry], data_root=None)
+    assert "/some/abs/path.wav" in text
+    assert "$TABVISION_DATA_ROOT" not in text
+
+
+def test_summarise_coverage_reports_per_tier_and_per_split() -> None:
+    entries = [
+        _entry("a", "clean_acoustic_strummed"),
+        _entry("b", "clean_acoustic_strummed"),
+        _entry("c", "clean_acoustic_single_line"),
+    ]
+    summary = summarise_coverage(entries)
+    assert "Total clips: 3" in summary
+    assert "clean_acoustic_strummed: 2 clips" in summary
+    assert "clean_acoustic_single_line: 1 clips" in summary
+
+
+def test_build_manifest_skips_missing_roots(tmp_path: Path) -> None:
+    """Missing GuitarSet root → empty result, no exception."""
+    entries = build_manifest(guitarset_root=tmp_path / "nope")
+    assert entries == []
+
+
+def test_build_manifest_splits_filter(tmp_path: Path) -> None:
+    """``splits=('validation',)`` should keep only player-05 clips."""
+    _make_guitarset_layout(
+        tmp_path / "guitarset",
+        [
+            ("00_Rock1-90-C#_comp", {"annotations": []}),  # train
+            ("05_Funk1-114-Ab_solo", {"annotations": []}),  # validation
+        ],
+    )
+
+    train_only = build_manifest(
+        guitarset_root=tmp_path / "guitarset",
+        splits=("train",),
+    )
+    validation_only = build_manifest(
+        guitarset_root=tmp_path / "guitarset",
+        splits=("validation",),
+    )
+    both = build_manifest(guitarset_root=tmp_path / "guitarset")
+
+    assert {entry.id for entry in train_only} == {"guitarset/00_Rock1-90-C#_comp"}
+    assert {entry.id for entry in validation_only} == {
+        "guitarset/05_Funk1-114-Ab_solo"
+    }
+    assert len(both) == 2
+
+
+def test_build_manifest_emits_synthetic_train_clip_ok(tmp_path: Path) -> None:
+    """Training-split synthetic clips should pass the in-builder guard."""
+    # Use a custom ClipEntry-yielding scanner via the public function
+    entries = [
+        ClipEntry(
+            id="synthetic-train-01",
+            tier="distorted_electric",
+            source="synthtab/electric",
+            split="train",
+            media_path="/data/x.wav",
+            annotation_path="/data/x.json",
+            annotation_format="synthtab_json",
+        ),
+    ]
+    # The guard should be a no-op for train split; verify via apply_limits roundtrip.
+    out = apply_limits(entries, max_clips_per_tier=1)
+    assert len(out) == 1
+
+
+def test_main_writes_manifest_and_passes_validation(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    """End-to-end: build_composite_manifest builds → manifest validates."""
+    _make_guitarset_layout(
+        tmp_path / "guitarset",
+        [
+            (
+                "05_Rock1-90-C#_comp",
+                {
+                    "annotations": [
+                        {
+                            "namespace": "note_midi",
+                            "annotation_metadata": {"data_source": "0"},
+                            "data": [
+                                {"time": 0.0, "duration": 0.5, "value": 40},
+                            ],
+                        }
+                    ]
+                },
+            ),
+            (
+                "05_Funk1-114-Ab_solo",
+                {
+                    "annotations": [
+                        {
+                            "namespace": "note_midi",
+                            "annotation_metadata": {"data_source": "0"},
+                            "data": [
+                                {"time": 1.0, "duration": 0.5, "value": 45},
+                            ],
+                        }
+                    ]
+                },
+            ),
+        ],
+    )
+    output = tmp_path / "composite.toml"
+
+    from tabvision.eval.manifest_builder import main
+
+    rc = main(
+        [
+            "--guitarset",
+            str(tmp_path / "guitarset"),
+            "--output",
+            str(output),
+        ]
+    )
+
+    assert rc == 0
+    assert output.is_file()
+    captured = capsys.readouterr()
+    assert "Wrote 2 clips" in captured.out
+    assert "Manifest validation passed." in captured.out
+
+    # The emitted manifest should itself validate cleanly.
+    validation = validate_manifest(output)
+    assert validation.passed
+
+
+def test_main_requires_at_least_one_root(tmp_path: Path) -> None:
+    """Without --guitarset / --guitar-techs, the CLI exits with usage error."""
+    from tabvision.eval.manifest_builder import main
+
+    with pytest.raises(SystemExit) as excinfo:
+        main(["--output", str(tmp_path / "x.toml")])
+    assert excinfo.value.code == 2
+
+
+def test_main_returns_1_when_no_clips_discovered(
+    tmp_path: Path, capsys: pytest.CaptureFixture[str]
+) -> None:
+    """Specifying a path with no matching data → rc=1, no output file."""
+    output = tmp_path / "composite.toml"
+    from tabvision.eval.manifest_builder import main
+
+    rc = main(
+        [
+            "--guitarset",
+            str(tmp_path / "empty"),
+            "--output",
+            str(output),
+        ]
+    )
+
+    assert rc == 1
+    assert not output.exists()
+    captured = capsys.readouterr()
+    assert "No clips discovered" in captured.out
diff --git a/tabvision/tests/unit/test_parser_guitar_techs_midi.py b/tabvision/tests/unit/test_parser_guitar_techs_midi.py
new file mode 100644
index 0000000..34f109c
--- /dev/null
+++ b/tabvision/tests/unit/test_parser_guitar_techs_midi.py
@@ -0,0 +1,161 @@
+"""Tests for the Guitar-TECHS MIDI parser (Phase 0)."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+pretty_midi = pytest.importorskip("pretty_midi")
+
+from tabvision.eval.parsers import get_parser  # noqa: E402
+from tabvision.eval.parsers.guitar_techs_midi import (  # noqa: E402
+    DEFAULT_TRACK_TO_STRING,
+    parse,
+)
+from tabvision.types import GuitarConfig  # noqa: E402
+
+
+def _make_midi(tmp_path: Path, *tracks_of_notes: list[tuple[int, float, float]]) -> Path:
+    """Build a multi-track MIDI fixture.
+
+    Each positional arg is a list of ``(pitch, start, end)`` tuples for
+    one track. Pass an empty list to create an empty track.
+    """
+    midi = pretty_midi.PrettyMIDI()
+    for notes in tracks_of_notes:
+        instrument = pretty_midi.Instrument(program=24)  # acoustic guitar
+        for pitch, start, end in notes:
+            instrument.notes.append(
+                pretty_midi.Note(velocity=80, pitch=pitch, start=start, end=end)
+            )
+        midi.instruments.append(instrument)
+    midi_path = tmp_path / "clip.mid"
+    midi.write(str(midi_path))
+    return midi_path
+
+
+def test_track_zero_maps_to_low_e_string(tmp_path: Path) -> None:
+    """Track 0 should carry low-E notes (string_idx 0, MIDI 40 → fret 0)."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 0.0, 0.5)],
+        [],
+        [],
+        [],
+        [],
+        [],
+    )
+
+    events = parse(midi_path)
+
+    assert len(events) == 1
+    assert events[0].string_idx == 0
+    assert events[0].fret == 0
+    assert events[0].pitch_midi == 40
+
+
+def test_per_string_pitch_to_fret_derivation(tmp_path: Path) -> None:
+    """Pitch minus open-string MIDI gives the fret for each string."""
+    # Standard tuning MIDI: (40, 45, 50, 55, 59, 64) — low E .. high E.
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 0.00, 0.10)],  # track 0 (E2) → fret 0
+        [(50, 0.10, 0.20)],  # track 1 (A2 + 5 semitones) → fret 5
+        [(55, 0.20, 0.30)],  # track 2 (D3 + 5 semitones) → fret 5
+        [(62, 0.30, 0.40)],  # track 3 (G3 + 7 semitones) → fret 7
+        [(64, 0.40, 0.50)],  # track 4 (B3 + 5 semitones) → fret 5
+        [(76, 0.50, 0.60)],  # track 5 (high E + 12) → fret 12
+    )
+
+    events = parse(midi_path)
+
+    by_string = {ev.string_idx: ev.fret for ev in events}
+    assert by_string == {0: 0, 1: 5, 2: 5, 3: 7, 4: 5, 5: 12}
+
+
+def test_drops_notes_outside_fret_range(tmp_path: Path) -> None:
+    """Notes that imply fret < 0 or > max_fret are skipped silently."""
+    # MIDI 35 < open low-E (40) → fret -5, drop.
+    # MIDI 90 > 40+24 → fret 50, drop.
+    midi_path = _make_midi(
+        tmp_path,
+        [(35, 0.0, 0.1), (90, 0.5, 0.6)],
+        [], [], [], [], [],
+    )
+
+    assert parse(midi_path) == []
+
+
+def test_events_sorted_by_onset(tmp_path: Path) -> None:
+    """Output is sorted by ``(onset_s, string_idx, fret)`` regardless of input order."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 2.00, 2.10), (40, 0.00, 0.10)],
+        [], [], [], [], [],
+    )
+
+    events = parse(midi_path)
+    assert [ev.onset_s for ev in events] == [0.0, 2.0]
+
+
+def test_capo_filters_below_capo_fret(tmp_path: Path) -> None:
+    """``cfg.capo`` raises the lower-bound for accepted frets."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 0.0, 0.1), (42, 0.1, 0.2)],
+        [], [], [], [], [],
+    )
+
+    cfg = GuitarConfig(capo=3)
+    events = parse(midi_path, cfg)
+    # MIDI 40 → fret 0 < capo 3, dropped. MIDI 42 → fret 2 < 3, dropped.
+    assert events == []
+
+
+def test_extra_tracks_beyond_six_are_ignored(tmp_path: Path) -> None:
+    """If a MIDI has > 6 tracks, only the first 6 are read."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 0.0, 0.1)],
+        [], [], [], [], [],
+        [(40, 0.0, 0.1)],  # 7th track — outside the mapping
+    )
+
+    events = parse(midi_path)
+    assert len(events) == 1
+    assert events[0].string_idx == 0
+
+
+def test_custom_track_to_string_mapping(tmp_path: Path) -> None:
+    """A reversed mapping should put track 0's notes on high E."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(64, 0.0, 0.1)],
+        [], [], [], [], [],
+    )
+
+    reversed_map: tuple[int, ...] = (5, 4, 3, 2, 1, 0)
+    events = parse(midi_path, track_to_string=reversed_map)
+
+    assert len(events) == 1
+    assert events[0].string_idx == 5
+    assert events[0].fret == 0
+
+
+def test_default_mapping_is_identity() -> None:
+    assert DEFAULT_TRACK_TO_STRING == (0, 1, 2, 3, 4, 5)
+
+
+def test_dispatch_via_registry(tmp_path: Path) -> None:
+    """End-to-end: parser is reachable via the composite-eval dispatch path."""
+    midi_path = _make_midi(
+        tmp_path,
+        [(40, 0.0, 0.1)],
+        [], [], [], [], [],
+    )
+    parser = get_parser("guitar_techs_midi")
+    assert parser is parse
+
+    events = parser(midi_path, None)
+    assert len(events) == 1
diff --git a/tabvision/tests/unit/test_parsers_registry.py b/tabvision/tests/unit/test_parsers_registry.py
new file mode 100644
index 0000000..a661f91
--- /dev/null
+++ b/tabvision/tests/unit/test_parsers_registry.py
@@ -0,0 +1,85 @@
+"""Tests for the annotation-parser registry (Phase 0)."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tabvision.eval.parsers import (
+    clear_parsers,
+    get_parser,
+    list_parsers,
+    register_parser,
+)
+from tabvision.eval.parsers.registry import _PARSERS as _GLOBAL_PARSERS
+
+
+@pytest.fixture
+def isolated_registry():
+    """Save + restore the registry around tests that mutate it."""
+    saved = dict(_GLOBAL_PARSERS)
+    yield
+    clear_parsers()
+    _GLOBAL_PARSERS.update(saved)
+
+
+def test_builtin_parsers_registered_on_import():
+    """The package import should auto-register at least GuitarSet JAMS."""
+    parsers = list_parsers()
+    assert "guitarset_jams" in parsers
+
+
+def test_get_parser_returns_callable():
+    parser = get_parser("guitarset_jams")
+    assert callable(parser)
+
+
+def test_get_parser_raises_keyerror_with_known_formats_listed():
+    with pytest.raises(KeyError) as excinfo:
+        get_parser("nonexistent_format")
+    assert "guitarset_jams" in str(excinfo.value)
+
+
+def test_register_parser_rejects_duplicate(isolated_registry):
+    def fake_parser(path, cfg=None):
+        return []
+
+    with pytest.raises(ValueError, match="already registered"):
+        register_parser("guitarset_jams", fake_parser)
+
+
+def test_register_then_get_roundtrip(isolated_registry):
+    def fake_parser(path, cfg=None):
+        return []
+
+    register_parser("fake_format", fake_parser)
+    assert get_parser("fake_format") is fake_parser
+    assert "fake_format" in list_parsers()
+
+
+def test_dispatch_via_registry_parses_jams(tmp_path: Path):
+    """End-to-end: composite-eval dispatch path runs through the registry."""
+    payload = {
+        "annotations": [
+            {
+                "namespace": "note_midi",
+                "annotation_metadata": {"data_source": "0"},
+                "data": [
+                    {"time": 0.10, "duration": 0.25, "value": 42},
+                ],
+            }
+        ]
+    }
+    jams_path = tmp_path / "clip.jams"
+    jams_path.write_text(json.dumps(payload), encoding="utf-8")
+
+    parser = get_parser("guitarset_jams")
+    events = parser(jams_path, None)
+
+    assert len(events) == 1
+    assert events[0].string_idx == 0
+    assert events[0].pitch_midi == 42
+    # Low E = MIDI 40, so MIDI 42 on string 0 → fret 2.
+    assert events[0].fret == 2
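Review note on the fret arithmetic these parser tests exercise: each expected fret is the MIDI pitch minus the open-string pitch of the target string, and a note is dropped when that fret falls below the capo or above the maximum fret. The sketch below restates that rule in isolation so the expected values can be checked by hand. It is not the code in `tabvision.eval.parsers.guitar_techs_midi`. The names `fret_for` and `OPEN_STRING_MIDI` are hypothetical, and the `max_fret=24` default is an assumption inferred from the "40+24" test comment.

```python
from __future__ import annotations

# Standard-tuning open-string MIDI pitches, low E to high E
# (the same tuple quoted in the test comment).
OPEN_STRING_MIDI = (40, 45, 50, 55, 59, 64)


def fret_for(
    pitch_midi: int,
    string_idx: int,
    capo: int = 0,
    max_fret: int = 24,  # assumed ceiling; the real default lives in GuitarConfig
) -> int | None:
    """Return the fret for a pitch on a string, or None if it is out of range."""
    fret = pitch_midi - OPEN_STRING_MIDI[string_idx]
    if fret < capo or fret > max_fret:
        return None  # mirrors the "skipped silently" behaviour the tests expect
    return fret


if __name__ == "__main__":
    # Same pitches as test_per_string_pitch_to_fret_derivation.
    pitches = (40, 50, 55, 62, 64, 76)
    print([fret_for(p, s) for s, p in enumerate(pitches)])
```

With these assumptions, the six pitches from the derivation test give frets 0, 5, 5, 7, 5, 12. MIDI 35 and MIDI 90 on the low E string both return `None`, as does MIDI 42 with `capo=3`.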