Land v1 acoustic: composite eval, acoustic scope + honest targets, fusion continuity by pgil256 · Pull Request #13 · pgil256/tab_vision

pgil256 · 2026-06-03T13:06:47Z

Lands the full v1 acoustic program onto main (26 commits). Supersedes #11 — Phase 0 is a strict subset of this branch; #11 will be closed once this merges.

What this lands

Phase 0 composite eval — multi-source per-tier harness, parsers (GuitarSet JAMS / Guitar-TECHS MIDI), bootstrap CIs, six-bucket error decomposition.
v1 scope = acoustic (SPEC §1.4.1, 2026-06-02) — honest audio-only targets (single-line ≥ 0.45, strummed ≥ 0.60, aggregate ≥ 0.55). Single-line is information-limited from audio (string/fret ambiguity); 0.94 single-line moves to v1.1 (video string-resolution).
Electric → v2 — evidence-based: clean-electric Tab F1 measured 0.12 on an acoustic-trained backbone with no in-repo training code. Ships the tone toggle (routes electric to a separate v2 checkpoint), the v2 fine-tune design doc, and resumable EGDB/Guitar-TECHS acquirers.
Fusion continuity win + SPEC sync.
Windows path fix (this session): _relativize_to_data_root now uses Path.relative_to and as_posix instead of a prefix check that hard-coded a forward slash, so checked-in manifests no longer leak C:\... paths. Adds a PureWindowsPath regression test.
Format hygiene: ruff format pass over 12 pre-existing unformatted Phase 0 files, which were the only thing red on the CI for #11 (Phase 0: per-tier composite eval + first GuitarSet baseline).

Verification (local)

ruff check clean, ruff format --check clean, mypy tabvision clean (56 files), eval/unit tests pass.
The formal all-metrics acceptance run (§1.4.1, GuitarSet held-out player 05) is executing separately; results land in docs/EVAL_REPORTS/ + docs/DECISIONS.md.

🤖 Generated with Claude Code

First Phase 0 chunk per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md §1.1. Foundations for the composite-eval workflow; no production behavior changes.

- tabvision.eval.parsers.registry: ParserFn protocol + register_parser / get_parser / list_parsers. Each source-specific annotation format gets a parser that registers itself at import time; composite-eval dispatches by Manifest.clip.annotation_format.
- tabvision.eval.parsers.guitarset_jams: thin wrapper exposing the existing tabvision.eval.guitarset_audio.parse_guitarset_jams under the new uniform interface. No logic duplication.
- tabvision.eval.bootstrap: bootstrap_ci() returning a BootstrapResult (statistic, lower, upper, n_observations, n_bootstrap, confidence). Implements the per-tier acceptance gate from the strategy doc §5 (lower_95_CI >= target, not just mean >= target).
- 21 unit tests, all passing. Existing test_guitarset_audio_eval.py unchanged and still green. Ruff + mypy clean on the new files.
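To make the acceptance gate concrete, here is a minimal sketch of a percentile bootstrap over per-clip scores, assuming the statistic is the mean. Only the BootstrapResult field names come from the commit message; the function signature, defaults, and the passes_gate helper are illustrative assumptions, not the repository's code.

```python
import random
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class BootstrapResult:
    # Field names follow the commit message; everything else is assumed.
    statistic: float
    lower: float
    upper: float
    n_observations: int
    n_bootstrap: int
    confidence: float


def bootstrap_ci(values, n_bootstrap=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-clip scores (sketch)."""
    if not values:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)  # seeded so repeated runs give the same interval
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_bootstrap))
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * (n_bootstrap - 1))]
    upper = means[int((1.0 - alpha) * (n_bootstrap - 1))]
    return BootstrapResult(fmean(values), lower, upper, n, n_bootstrap, confidence)


def passes_gate(result, target):
    """Hypothetical helper: the gate is on the lower bound, not the mean."""
    return result.lower >= target
```

Gating on the lower bound means a tier with a high mean but few or noisy clips can still miss its target, which is the point of the rule.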

…tar-techs parser

Phase 0 items 1-2 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

Manifest (tabvision/tabvision/eval/manifest.py):
- Add 'annotation_format' to REQUIRED_CLIP_FIELDS so composite-eval can route each clip to the correct parser via the registry.
- Add SYNTHETIC_SOURCE_PREFIXES + cross-contamination guard: clips whose source starts with 'synthtab/', 'dadagp/', or 'synthetic/' are rejected in 'validation' and 'test' splits. Permitted in 'train'. Implements R8 from the strategy doc §7.

Guitar-TECHS parser (tabvision/tabvision/eval/parsers/guitar_techs_midi.py):
- Parses 6-track MIDI (one track per string, low E first) into list[TabEvent] via pretty_midi. Per-string fret derived from MIDI pitch minus open-string pitch. Drops out-of-range frets.
- Optional 'track_to_string' kwarg for releases with a different ordering. Default = identity (low E = 0, high E = 5).
- 9 unit tests using pretty_midi-built fixtures; importorskip when pretty_midi not installed.

Updated manifest placeholder TOML schema with annotation_format and synthetic-source guard documentation. 4 new manifest validator tests. All 15 new tests pass; existing test_eval_manifest.py / test_parsers_registry.py still green. Ruff + mypy clean.
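The per-string fret derivation can be sketched as below. This assumes standard tuning (E2 A2 D3 G3 B3 E4) and a 24-fret upper bound; neither is stated in the commit message, and fret_for_note is a hypothetical name rather than the parser's actual function.

```python
# Open-string MIDI pitches in standard tuning, low E first (assumed tuning).
OPEN_STRING_MIDI = (40, 45, 50, 55, 59, 64)
MAX_FRET = 24  # assumed upper bound; the real parser's range is not stated


def fret_for_note(track_index, midi_pitch, track_to_string=None):
    """Return (string, fret) for one MIDI note, or None if out of range."""
    # Default mapping is identity: track 0 = low E, track 5 = high E.
    string = track_index if track_to_string is None else track_to_string[track_index]
    fret = midi_pitch - OPEN_STRING_MIDI[string]
    if 0 <= fret <= MAX_FRET:
        return string, fret
    return None  # dropped, mirroring "Drops out-of-range frets"
```

A release that stores high E first could pass track_to_string=[5, 4, 3, 2, 1, 0] to reuse the same logic.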

Phase 0 item 3 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md. Six-bucket decomposition matching the apr-28 methodology in tabvision-server/tools/outputs/errors-2026-04-28_185743.md, ported to operate on v1 §8 TabEvent lists:

- correct: string + fret + onset all match within tolerance
- wrong_position_same_pitch: pitch matches, position doesn't
- pitch_off: onset matches but pitch and position differ
- timing_only: pos or pitch matches outside strict tolerance but within extended tolerance
- missed_onset: gold event with no nearby predicted event
- extra_detection: predicted event unmatched by either pass

(The seventh apr-28 bucket, muted_undetectable, needs a muted/X flag the v1 TabEvent contract does not yet carry; deferred.)

Two-pass greedy matcher prioritizes (a) strict-tolerance closest onset, then (b) extended-tolerance pos-or-pitch match for timing_only. share_of_loss() returns per-bucket percentages of recoverable loss. aggregate_decompositions() sums per-track decompositions for the per-tier rollup that composite.py will produce.

16 unit tests covering each bucket in isolation, the mixed scenario, share-of-loss math, aggregation, and edge cases (multiple gold at same time, greedy onset-closest selection, invalid tolerances). Ruff + mypy clean.
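As an illustration of the bucket definitions above, the following sketch classifies one gold/predicted pair. The TabEvent fields, the tolerance values, and the classify_pair name are assumptions for this example; the real decompose_errors() also handles matching across whole event lists, which is where missed_onset and extra_detection arise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TabEvent:
    # Assumed minimal shape; the v1 §8 contract may carry more fields.
    onset_s: float
    string: int
    fret: int
    pitch_midi: int


def classify_pair(gold, pred, strict_tol_s=0.05, extended_tol_s=0.15) -> Optional[str]:
    """Bucket for one candidate pair, or None if the pair should not match."""
    dt = abs(pred.onset_s - gold.onset_s)
    same_position = (pred.string, pred.fret) == (gold.string, gold.fret)
    same_pitch = pred.pitch_midi == gold.pitch_midi
    if dt <= strict_tol_s:
        if same_position:
            return "correct"
        if same_pitch:
            return "wrong_position_same_pitch"
        return "pitch_off"
    # Second pass: only position-or-pitch agreement earns a timing_only match.
    if dt <= extended_tol_s and (same_position or same_pitch):
        return "timing_only"
    return None  # leaves gold as missed_onset and pred as extra_detection
```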

Phase 0 item 4 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.composite.run_composite_eval:
- Reads + validates a multi-source manifest, dispatches each clip through the registered parser, runs a user-supplied predictor over the media, and computes onset / pitch / tab F1 + 95% bootstrap CIs per tier plus the 6-bucket error decomposition.
- Predictor is injected so the harness is testable without the heavy audio backend; CLI wires up tabvision.pipeline.run_pipeline.
- Train-split clips skipped by default (DEFAULT_EVAL_SPLITS = validation + test).
- CompositeReport.tab_f1_acceptance(targets) classifies each tier as pass / gap / fail / missing based on the lower_95_CI >= target gate from strategy doc §5.

tabvision.eval.metrics: added public event_f1() + EventF1Result for onset-only and onset+pitch matching. The private _score_event_f1 in guitarset_audio is left untouched (Phase 0 ground rule: no production behavior changes).

11 integration smoke tests covering perfect predictor (all tiers pass), shifted predictor (wrong_position_same_pitch dominates), train-split skipping, manifest validation failures, parser-format lookup failures, TABVISION_DATA_ROOT substitution via env + function arg, empty gold edge case, and the acceptance helper. Ruff + mypy clean.
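The dispatch-and-inject pattern described here can be sketched in a few lines. The function names register_parser, get_parser and list_parsers appear in the commit messages, but their signatures, the clip dictionary shape, and run_clips are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Assumed shape: a parser maps an annotation path to a list of events.
ParserFn = Callable[[str], list]
_PARSERS: Dict[str, ParserFn] = {}


def register_parser(annotation_format: str, fn: ParserFn) -> None:
    if annotation_format in _PARSERS:
        raise ValueError(f"parser already registered for {annotation_format!r}")
    _PARSERS[annotation_format] = fn


def get_parser(annotation_format: str) -> ParserFn:
    try:
        return _PARSERS[annotation_format]
    except KeyError:
        raise KeyError(f"no parser registered for {annotation_format!r}") from None


def list_parsers() -> List[str]:
    return sorted(_PARSERS)


def run_clips(clips, predictor, eval_splits=("validation", "test")):
    """Hypothetical driver: parse gold, run the injected predictor, pair them."""
    results = {}
    for clip in clips:
        if clip["split"] not in eval_splits:
            continue  # train-split clips are skipped by default
        gold = get_parser(clip["annotation_format"])(clip["annotation"])
        results[clip["id"]] = (gold, predictor(clip["media"]))
    return results
```

Because the predictor is just a callable, a test can pass a stub that returns canned events instead of loading the audio backend.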

Phase 0 item 5 per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.composite:
- DEFAULT_TIER_TARGETS = {0.85/0.90/0.87/0.80} from SPEC §1.4.1.
- format_baseline_markdown(report, targets, ...) renders the per-tier baseline table with pass/gap/fail/missing status, per-source breakdown, and methodology footer per Phase 0 impl plan §4.1.
- format_decomposition_markdown(report) renders the aggregate + per-tier 7-bucket (currently 6) error breakdown per §4.2.
- make_run_pipeline_predictor(...) wraps tabvision.pipeline.run_pipeline with lazy import, so composite-eval --help works without the audio-highres extras installed.
- main(): argparse CLI exposed as 'tabvision-composite-eval'. Supports --backend, --position-prior (or 'none'), --melodic-prior, --enable-video, --bootstrap-{n,seed}, --onset-tolerance-s, --splits, --media-root, --annotation-root, --eval-harness-sha. Single run can emit both the baseline and decomposition reports via --decomposition-output, so the separate decompose_tab_errors.py script listed in the Phase 0 plan is consolidated into this one CLI.

tabvision/scripts/eval/composite_eval.py: 5-line shim that invokes the module's main().

7 unit tests on the formatters: required sections, pass/gap/fail/missing classification, methodology fields, decomposition aggregate sums, default-target coverage. All 20 composite tests + 73 Phase 0 eval tests pass. Ruff + mypy clean.

Phase 0 item 6a per docs/plans/2026-05-13-tab-f1-phase-0-implementation.md.

tabvision.eval.manifest_builder:
- scan_guitarset(root, validation_player): discovers <root>/annotation/*.jams paired with <root>/audio_mono-mic/*_mic.wav; maps _comp/_solo suffix to clean_acoustic_strummed/single_line tier.
- scan_guitar_techs(root): stub returning [] until the dataset is acquired and its on-disk layout is verified.
- apply_limits(entries, max_clips_per_tier, total_limit): deterministic per-tier cap + total cap, sorted by clip id first so re-runs produce byte-stable output.
- build_manifest(splits=...): full pipeline; supports filtering by split so smoke runs target the validation set directly.
- render_toml(entries, header_comment): TOML output with proper escaping and a generated-by header.
- _refuse_synthetic_in_eval_splits: pre-write guard mirroring the validator's R8 cross-contamination check.
- main() CLI: --guitarset, --guitar-techs, --output, --splits, --max-clips-per-tier, --limit. Returns rc=1 on no clips, rc=2 on validation failure, rc=0 on success.

tabvision/scripts/eval/build_composite_manifest.py: thin CLI shim.

Hygiene pass per PR feedback:
- manifest.toml schema comment now lists guitar_techs_midi alongside guitarset_jams under 'known formats'.
- Error-decomposition framing in composite.py and error_decomposition.py now uses 'six-bucket port of the apr-28 7-bucket harness' instead of '7-bucket' (we only populate 6; muted_undetectable is deferred).
- composite.py and manifest_builder.py both gain if __name__ == '__main__' blocks so 'python -m tabvision.eval.composite' and 'python -m tabvision.eval.manifest_builder' invoke main() cleanly.

20 manifest-builder tests pass (scan, limits, render, summarise, build_manifest, --splits filter, end-to-end CLI). Full Phase 0 test suite still green. Ruff + mypy clean.

Smoke-validated against on-disk GuitarSet: --max-clips-per-tier 2 --splits validation produces a 4-clip manifest that the composite eval CLI processes end-to-end via the real highres backend + guitarset-v1 prior, emitting baseline + decomposition reports with sensible numbers (strummed Tab F1 ~0.75, single-line ~0.29 on this tiny sample).
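The deterministic capping that makes the smoke manifest byte-stable can be sketched as follows. The entry shape (dicts with "id" and "tier" keys) is an assumption; the real apply_limits operates on the builder's own entry type.

```python
from collections import defaultdict


def apply_limits(entries, max_clips_per_tier=None, total_limit=None):
    """Cap clips per tier, then overall, in a stable clip-id order (sketch)."""
    # Sorting first makes the result independent of filesystem scan order.
    ordered = sorted(entries, key=lambda e: e["id"])
    if max_clips_per_tier is not None:
        kept_per_tier = defaultdict(int)
        kept = []
        for entry in ordered:
            if kept_per_tier[entry["tier"]] < max_clips_per_tier:
                kept_per_tier[entry["tier"]] += 1
                kept.append(entry)
        ordered = kept
    if total_limit is not None:
        ordered = ordered[:total_limit]
    return ordered
```

With two tiers and max_clips_per_tier=2 this yields at most four clips, consistent with the 4-clip smoke manifest described above.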

Closes the Phase 0 acceptance gate for the 2 tiers reachable from on-disk data (clean acoustic single-line + strummed via GuitarSet held-out validation). Clean electric and distorted electric remain 'missing' pending Guitar-TECHS / EGDB acquisition.

Matcher fix (tabvision/tabvision/eval/error_decomposition.py):
- decompose_errors() now uses priority-based selection within each onset tolerance window: same (string, fret) > same pitch_midi > onset-closest. Previously a greedy onset-only matcher mis-paired chord-cluster events whose on-the-wire ordering differed from ground truth, inflating pitch_off on strummed (3387 → 486 with the fix). event_f1's pitch-matching semantics are now mirrored in the decomposition.
- Added test_chord_cluster_priority_pitch_over_onset and test_chord_cluster_priority_falls_back_to_position_match_then_pitch to lock the new behavior.

Reports (docs/EVAL_REPORTS/*):
- composite_baseline_2026-05-13.md: first artifact under SPEC §1.4.1: per-tier Tab F1 + Onset/Pitch F1 + 95% bootstrap CI + pass/gap/fail/missing status. Headline: both covered tiers FAIL by ~25-35 pp (single-line mean 0.5076, strummed 0.6708).
- tab_f1_error_decomposition_2026-05-13.md: companion 6-bucket breakdown. Headline: wrong_position_same_pitch dominates loss on every tier (77% of single-line, 50% of strummed, 57% aggregate). Confirms the strategy doc §2 diagnostic.

Eval manifest (tabvision/data/eval/composite.toml):
- 60 player-05 validation clips, byte-stable output of the manifest builder. Strummed and single-line tiers fully covered.

LICENSES.md:
- GuitarSet: marked '✅ used for 2026-05-13 baseline'.
- Guitar-TECHS: added as planned acquisition (CC-BY-4.0).
- EGDB: status updated; author email pending.
- GOAT: marked ❌ DROPPED (request-only research-only).
- SynthTab: marked ❌ DROPPED from default pipeline (CC-BY-NC-4.0).
- User clips: marked ⛔ banned per D10.
- DadaGP: marked research/dev only; not in default pipeline.

DECISIONS.md: single 2026-05-13 entry summarising D1-D11 from the design plan, with per-tier targets table and the 2026-05-13 baseline numbers inlined so the decision record stands alone.

104 tests pass; ruff + mypy clean.
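The priority rule in the matcher fix (same position, then same pitch, then closest onset) can be expressed as a ranking key. This is a sketch under assumed names: Event and pick_match are hypothetical, and the real matcher must additionally ensure each predicted event is consumed at most once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Assumed minimal event shape for the example.
    onset_s: float
    string: int
    fret: int
    pitch_midi: int


def pick_match(gold, candidates, tolerance_s):
    """Choose the best predicted event for one gold event, or None."""
    in_window = [p for p in candidates if abs(p.onset_s - gold.onset_s) <= tolerance_s]
    if not in_window:
        return None

    def rank(pred):
        if (pred.string, pred.fret) == (gold.string, gold.fret):
            tier = 0  # same (string, fret)
        elif pred.pitch_midi == gold.pitch_midi:
            tier = 1  # same pitch, different position
        else:
            tier = 2  # fall back to onset proximity only
        return (tier, abs(pred.onset_s - gold.onset_s))

    return min(in_window, key=rank)
```

In a strummed chord several notes share nearly the same onset, so ranking by onset alone can pair a gold note with a neighbouring string's prediction; ranking by agreement first avoids counting that as pitch_off.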

…ording

Three small fixes flagged in review of the Phase 0 baseline:

(a) Portable manifest. tabvision.eval.manifest_builder now accepts --data-root PATH; render_toml rewrites media/annotation paths that fall under that root as a TABVISION_DATA_ROOT token followed by '/<rest>'. The composite-eval CLI already expanded that token via env var or --media-root/--annotation-root, so checked-in manifests are now portable across developer machines. Re-generated tabvision/data/eval/composite.toml with the new flag so the committed manifest no longer carries /home/gilhooleyp/... paths. +3 unit tests covering the rewrite + the no-data-root path.

(b) Real SHA in the baseline report. The 'Eval-harness SHA' field in docs/EVAL_REPORTS/composite_baseline_2026-05-13.md now cites 2ec4849 (the commit that landed both the baseline and the chord-cluster matcher fix), instead of the ad-hoc '354571b-matcher-fix' label used at run time.

(c) Stale '7-bucket' wording cleared in the planning docs and one test docstring. The implementation is a six-bucket port; only references to the original apr-28 7-bucket harness keep the historical name.

Verification ran in WSL:
- ruff: passes on changed files.
- mypy: clean on the 8 Phase 0 eval source files (parsers/, bootstrap, error_decomposition, composite, manifest_builder). Broader tabvision-wide mypy hits older Phase 5 diagnostics not in this PR's scope.
- 107 tests pass across the focused Phase 0 + existing eval suite.

No production behavior change; the manifest still resolves to the same 60 player-05 validation clips.

…otstrap CI, error decomposition

Lands origin/impl/tab-f1-phase-0 (9 commits): composite.toml eval manifest, guitarset_jams + guitar_techs_midi parsers, bootstrap CI helper, six-bucket error decomposition, and first per-tier baseline. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… relaxation) SPEC §1.4.1 rewritten to supersede the 2026-05-13 amendment: v1 commits to the original §1.4 per-tier targets (0.94/0.86/0.90/0.82) AND aggregate Tab F1 >= 0.88. The relaxed 0.85/0.90/0.87/0.80 table is withdrawn; the aggregate is un-retired. Keeps the amendment's methodology (public-corpus composite, per-tier bootstrap CIs, lower_95_CI >= target). SPEC §1.4 is now the single source of truth; CLAUDE.md notes the commitment and the design doc D1/D2 are bannered as historical. Honest framing retained in-spec: single-line tier must go 0.51 -> 0.94; a stretch goal adopted as the gate, not a forecast. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add an 'egdb' subcommand to scripts.acquire.datasets mirroring the roboflow pattern: downloads from the author-granted access URL (--url / $EGDB_DOWNLOAD_URL), optional SHA-256 verify, zip/tar extract, idempotent. No URL/data is hard-coded or committed. LICENSES.md flips EGDB to author-granted eval-use (2026-06-01), eval-only, not redistributed, not a shipped-weight substrate. .env.example gains EGDB_DOWNLOAD_URL. ACTION REQUIRED (user): drop in the grant URL to run it, and file the grant email under docs/ + log in docs/DECISIONS.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
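For the optional SHA-256 step, a minimal sketch using only the standard library might look like this. The function names and the "no expected hash means skip" behaviour are assumptions about how an optional check could work, not the acquirer's actual code.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB archives never sit fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex=None):
    """Return True when no hash was supplied or when the digest matches."""
    if expected_hex is None:
        return True  # verification is optional in this sketch
    return sha256_of(path) == expected_hex.strip().lower()
```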

… AGENTS.md Remove abandoned multi-agent dev experiment (.claude-agent-farm.json, tabvision_agent_farm_config.json, tabvision_agent_farm_prompt.txt, tabvision_agent_config.json, tabvision_prompt.txt) and the stale coordination/ work queue (referenced frozen v0 paths). Remove stray combined_typechecker_and_linter_problems.txt. Banner tabvision_specification.md as historical/non-canonical (SPEC.md is canonical; still linked from AUDIT/README so kept, not deleted). Track AGENTS.md (Codex sibling of CLAUDE.md). All recoverable via git history. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Verified 2026-06-01 against the project page (https://ss12f32v.github.io/Guitar-Transcription/): EGDB audio is a *public* Google Drive folder; access is open and the *license* was the only gate (repo has no LICENSE file -> author's portfolio-use grant on record clears it). - egdb acquirer now defaults to the public Drive folder and downloads via gdown (folder-aware), with a clean manual-download fallback when gdown is absent. Direct-archive path kept for mirrors. - LICENSES.md / .env.example corrected: access-open, license-is-the-gate; EGDB_DOWNLOAD_URL is now an optional mirror override, not a required secret. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… scanner, runbook

Wires the cross-dataset prior-generalization check to run locally on CPU:
- scripts.acquire.datasets gains 'guitarset' (mirdata → the layout scan_guitarset/composite.toml expect) and 'guitar-techs' (Zenodo record 14963133 via the public API, no hard-coded filenames; prints the tree to verify layout). Both CC-BY-4.0, eval-only, idempotent.
- Implements the stubbed manifest_builder.scan_guitar_techs: pairs 6-track MIDI with same-stem/prefix-stem audio (DI/clean preferred), tier=clean_electric (the tier GuitarSet can't cover + the #2 cross-dataset target), performer split, skips stretch-technique clips. Layout inferred from arXiv:2501.03720; flagged to verify against the first real download.
- test_scan_guitar_techs.py pins the heuristics on a synthetic tree (runs under pytest or as a plain script; validated here without the dep).
- docs/plans/2026-06-02-tab-f1-phase-0-local-run.md: turnkey runbook (install → acquire → build manifests → prior on/off → read the verdict).
- LICENSES.md: Guitar-TECHS row → acquirer/scanner landed, eval-only.

#3 fine-tune stays on free GPU (no CUDA locally). EGDB folds in a 4th tier later.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The acquirers printed Unicode arrows/ellipses/em-dashes; on a Windows cp1252 console print() raised UnicodeEncodeError on U+2192 before mirdata ran, killing the guitarset download. Replace those characters with ASCII equivalents (->, ..., -). Run acquirers with PYTHONUTF8=1 as belt-and-suspenders (also shields third-party console output). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
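The commit fixes this by editing the message strings themselves. As a complementary illustration, a defensive helper could sanitise text before printing; to_console_safe and its fallback table are hypothetical and not part of the repository.

```python
# Map the characters named in the commit to ASCII stand-ins.
ASCII_FALLBACKS = str.maketrans({
    "\u2192": "->",   # rightwards arrow (not representable in cp1252)
    "\u2026": "...",  # horizontal ellipsis
    "\u2014": "-",    # em dash
})


def to_console_safe(text, encoding="cp1252"):
    """Return text that can always be encoded for the given console."""
    text = text.translate(ASCII_FALLBACKS)
    # Anything still unencodable becomes '?' instead of raising.
    return text.encode(encoding, errors="replace").decode(encoding)
```

Setting PYTHONUTF8=1, as the commit recommends, addresses the same failure at the interpreter level and also covers output from third-party libraries.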

mirdata download() pulled all partitions (~10GB incl. 3.36GB hex-pickup zips + mix) but the composite eval reads only annotation/*.jams + audio_mono-mic/*_mic.wav. Pass partial_download=['annotations','audio_mic']; harden idempotency to require both annotation jams AND mono-mic wavs (so a partial leftover won't false-skip). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Verified against Zenodo record 14963133: clips are <Pn_category>/midi/midi_<content>.mid paired with <Pn_category>/audio/<capture>/<capture>_<content>.<ext>. MIDI and audio share the <content> token, NOT a prefix — the inferred prefix-matcher would have found ZERO clips. Now: pair by content token scoped to the Pn_category group, prefer direct-input over mic'd amp, performer split from the 'Pn'/'playerNN' prefix, skip __MACOSX cruft + stretch-technique paths. Validated on the real partial download (58 clips paired correctly). Test rewritten to the real layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
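The content-token pairing can be sketched over a flat list of relative paths. The path layout follows the commit message; the capture folder names ("directinput", "micamp") and the pair_clips function are placeholders, since the dataset's real capture names are not given here.

```python
from pathlib import PurePosixPath


def pair_clips(paths, capture_preference=("directinput", "micamp")):
    """Pair midi_<content>.mid with <capture>_<content>.<ext> per group."""
    midi = {}   # (group, content) -> midi path
    audio = {}  # (group, content) -> {capture: audio path}
    for raw in paths:
        p = PurePosixPath(raw)
        if "__MACOSX" in p.parts or len(p.parts) < 3:
            continue  # skip macOS archive cruft and unrelated files
        group = p.parts[0]  # the <Pn_category> directory
        if p.parts[1] == "midi" and p.suffix == ".mid" and p.stem.startswith("midi_"):
            midi[(group, p.stem[len("midi_"):])] = raw
        elif p.parts[1] == "audio" and len(p.parts) == 4:
            capture = p.parts[2]
            prefix = capture + "_"
            if p.stem.startswith(prefix):
                audio.setdefault((group, p.stem[len(prefix):]), {})[capture] = raw
    pairs = []
    for key in sorted(midi):
        by_capture = audio.get(key, {})
        # Take the first available capture in preference order.
        for capture in capture_preference:
            if capture in by_capture:
                pairs.append((midi[key], by_capture[capture]))
                break
    return pairs
```

Keying on (group, content) is what scopes the match to one Pn_category directory, so the same content token in another performer's folder is never cross-paired.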

The whole-dir idempotency false-skipped any partial download, and one network blip (mid P1_scales.zip over VPN) aborted the entire multi-GB fetch. Now: skip per-file when the extracted dir already exists (re-run resumes), drop partials and continue past a failed file instead of aborting, and handle corrupt zips. Re-running the command now completes only the missing categories. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
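A per-file resumable fetch along these lines might look like the sketch below. The download callable is supplied by the caller and is purely hypothetical, as are the function name and the ".partial" staging directory; the real acquirer talks to Zenodo and may differ in every detail.

```python
import shutil
import zipfile
from pathlib import Path


def fetch_all(names, dest, download):
    """Fetch and extract each archive independently; return (done, failed)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    done, failed = [], []
    for name in names:
        out_dir = dest / Path(name).stem
        if out_dir.is_dir():
            done.append(name)  # already extracted on an earlier run
            continue
        archive = dest / name
        tmp_dir = dest / (Path(name).stem + ".partial")
        shutil.rmtree(tmp_dir, ignore_errors=True)  # clear stale partials
        try:
            download(name, archive)  # may raise OSError on a network blip
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(tmp_dir)
            tmp_dir.rename(out_dir)  # only a complete extract gets the final name
            done.append(name)
        except (OSError, zipfile.BadZipFile):
            failed.append(name)  # record and continue with the next file
            shutil.rmtree(tmp_dir, ignore_errors=True)
        finally:
            archive.unlink(missing_ok=True)
    return done, failed
```

Extracting into a staging directory and renaming at the end keeps the "directory exists" check trustworthy: a crash mid-extract never leaves a final-named directory behind.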

Four local CPU eval reports + cross-dataset summary + DECISIONS entry. GuitarSet acoustic reproduces the +22pp prior lift (single 0.219->0.508, strummed 0.475->0.671, onset/pitch ~0.93). Guitar-TECHS electric: prior lift +1.3pp (within 95% CI), onset/pitch collapse to 0.75/0.73. Dominant finding: the highres acoustic backbone doesn't generalize to electric, capping Tab F1 ~0.12 and blocking the SPEC clean/distorted-electric tiers. Next step pivots from a GuitarSet-only fine-tune to evaluating an electric-capable backbone. (Machine-local manifests with absolute paths not committed — harness _relativize_to_data_root has a Windows-separator bug; gitignored + flagged.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… help electric highres-fl was dead code — it passed instrument='guitar_fl', but the pinned hf_midi_transcription only knows saxophone/bass/guitar/piano. guitar-fl.pth does exist in the HF repo, so load it by passing the full repo/file path as checkpoint_path (instrument='guitar' for the architecture). Verified end-to-end. Result (paired, 12 Guitar-TECHS chord clips): guitar_fl ~= guitar_gaps on electric (pitch 0.687 vs 0.679, onset 0.715 vs 0.732 — within noise). The cheap checkpoint swap does NOT close the electric gap; both ~0.68 pitch vs ~0.93 acoustic. Electric needs fine-tuning on electric data. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Decision: train a SEPARATE guitar-electric checkpoint (fine-tuned from gaps), routed by the declared tone — avoids catastrophic forgetting of the acoustic 0.93; the architecture already routes by checkpoint (highres vs highres-fl). Honest blocker captured: no highres training code in-repo or in the inference packages (audio_finetune.py is a scaffold; the 2026-04-24 design targets Basic Pitch). Step 0 is standing up the upstream hFT-Transformer/piano_transcription training code. Data (Guitar-TECHS, CC-BY) is on disk; split by performer; free GPU per D6; acceptance = electric pitch F1 0.73 -> >=0.88, acoustic unchanged. Includes a Basic-Pitch fallback path and the highres-electric integration steps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Evidence-based scope (DECISIONS 2026-06-02): clean-electric measured 0.12 (acoustic-trained backbone, no in-repo training code), so the electric tiers move to v2, delivered as a SEPARATE highres-electric checkpoint routed by the declared instrument (avoids catastrophic forgetting of the acoustic 0.93; the architecture already routes by checkpoint).

- backend.py registers highres-electric; highres.py adds the guitar_electric variant guarded by TABVISION_HIGHRES_ELECTRIC_CKPT (fails fast with a clear message until the v2 checkpoint is trained).
- pipeline.audio_backend_for_session() routes electric -> highres-electric; run_pipeline(audio_backend_name='auto') enables the toggle. Acoustic untouched.
- tests/unit/test_audio_routing.py (routing + guard).
- SPEC §1.4.1 + CLAUDE.md: v1 = acoustic tiers (0.94/0.86) + aggregate 0.88; electric deferred to v2 with the toggle shipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
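The routing and fail-fast guard can be illustrated with a small sketch. Only the backend names and the TABVISION_HIGHRES_ELECTRIC_CKPT variable come from the commit message; backend_for_tone and resolve_electric_checkpoint are hypothetical stand-ins for audio_backend_for_session() and the guard in highres.py, whose real signatures are not shown here.

```python
import os


def backend_for_tone(tone):
    """Route the declared tone to a backend name (assumed tone strings)."""
    return "highres-electric" if tone == "electric" else "highres"


def resolve_electric_checkpoint(environ=None):
    """Fail fast until a v2 electric checkpoint path is configured."""
    env = os.environ if environ is None else environ
    path = env.get("TABVISION_HIGHRES_ELECTRIC_CKPT")
    if not path:
        raise RuntimeError(
            "highres-electric needs TABVISION_HIGHRES_ELECTRIC_CKPT to point "
            "at a trained v2 checkpoint; none is configured"
        )
    return path
```

Keeping the guard separate from the routing means acoustic sessions never touch the environment variable, which matches the "Acoustic untouched" note.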

Diagnosed the single-line gap (docs/EVAL_REPORTS/acoustic_single_line_2026-06-02.md): the loss is 322 wrong_position_same_pitch vs 8 pitch_off — audio can't resolve which STRING a (correct) pitch was played on. Melodic prior regresses it; hand-position continuity (POSITION_SHIFT_COST 0.05 -> 2.5, now the default + env knob) gives a real but small lift (single 0.508->0.523, strummed 0.671->0.676, no regression) and does NOT reach 0.94. Single-line is information-limited. SPEC §1.4.1 + CLAUDE.md: honest audio-only v1 targets — single-line >= 0.45, strummed >= 0.60, aggregate >= 0.55 (lower_95 >= target); the 0.94/0.86 become the v1.1 video-assisted reference (video resolves the string ambiguity). DECISIONS records the evidence chain so the dead ends aren't re-ground. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The old prefix check hard-coded a forward slash, so on Windows (backslash absolute paths) it never matched and leaked absolute drive paths into checked-in manifests. Switch to Path.relative_to + as_posix, separator-correct on the native platform, always emitting forward-slash TABVISION_DATA_ROOT tokens. Adds a PureWindowsPath regression test exercising Windows behaviour from POSIX CI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
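A sketch of the separator-correct approach, using pure paths so that Windows behaviour can be exercised on any platform. The token spelling "${TABVISION_DATA_ROOT}" and the function signature are assumptions; the commit only says that a forward-slash TABVISION_DATA_ROOT token is emitted.

```python
from pathlib import PurePosixPath, PureWindowsPath

DATA_ROOT_TOKEN = "${TABVISION_DATA_ROOT}"  # assumed spelling of the token


def relativize_to_data_root(path, data_root):
    """Rewrite a path under data_root as '<token>/<rest>' with forward slashes."""
    try:
        rest = path.relative_to(data_root)
    except ValueError:
        # Not under the data root: leave the location as-is (POSIX-style text).
        return path.as_posix()
    return f"{DATA_ROOT_TOKEN}/{rest.as_posix()}"


# Usage: a Windows path is handled correctly even on a POSIX machine.
win = relativize_to_data_root(
    PureWindowsPath(r"C:\data\guitarset\audio\a.wav"), PureWindowsPath(r"C:\data")
)
```

Path.relative_to compares path components rather than raw strings, so it does not depend on which separator the platform uses, and as_posix guarantees forward slashes in the output.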

Pre-existing Phase 0 files were committed unformatted and failed CI's ruff format --check. Mechanical formatting only; no behaviour change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-03T13:06:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
tab_vision	Ready	Preview, Comment	Jun 3, 2026 1:07pm

Patrick Gilhooley and others added 26 commits May 19, 2026 14:25

chore(eval): re-point baseline report SHA to post-rebase 9a7e957

1dc3c87

style: ruff format eval module + tests

d96d760


vercel Bot deployed to Preview June 3, 2026 13:07 View deployment

pgil256 merged commit 262c02c into main Jun 3, 2026
4 checks passed

pgil256 mentioned this pull request Jun 3, 2026

Phase 0: per-tier composite eval + first GuitarSet baseline #11

Merged

3 tasks

pgil256 merged 26 commits into main from accuracy/tab-f1-program
