Add AlphaNum precision/recall/F-score metrics by JakobHavtorn · Pull Request #56 · corticph/bewer

Jakob Drachmann Havtorn (JakobHavtorn) · 2026-05-28T19:02:39Z

Summary

Addresses SR-2628 by adding AlphaNumP / AlphaNumR / AlphaNumF — precision, recall, and F-score over alphanumerical entities, initialisms, and acronyms (via upper-case terms, mixed-case terms, alphanumerical terms, digit-prefixed terms). No vocabulary input required.
Mirrors the existing KTP / KTR / KTF / _KTStats architecture but swaps the trie/vocabulary lookup for a stateless regex helper (bewer.preprocessing.regex_match.match_token_regex) over Token.raw, so detection is case-preserving and works on any dataset out of the box.
Default Unicode-aware pattern (two branches) catches MRI, mmHg, HbA1c, CH3, iPhone, mRNA, ΔG, μM, β2, 5G, 3D, … while correctly rejecting ordinary capitalised words (Patient, Hello, The) and ordinals across English / French / Dutch (1st, 1er, 1e) — no exclusion list. Override via pattern=….
TP/FN/FP plumbing reuses _KTStats's alignment-based classification; uses normalized=False Levenshtein so case mismatches (e.g. ref MRI vs hyp mri) are surfaced as FN + FP rather than silently passing.
148 new tests; full suite still passes (1000 / 1000, 0 regressions); pre-commit clean.
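To make the single-token examples above concrete, here is a minimal sketch of a "case signal" predicate. It is a simplified stand-in written with plain string methods, not the PR's actual default pattern. The name `has_case_signal` and the exact rules are assumptions chosen only to reproduce the accept/reject examples listed in the summary; it does not handle hyphenated compounds.

```python
def has_case_signal(token: str) -> bool:
    """Hypothetical approximation of the single-token rule (not the PR's regex)."""
    # Single characters and tokens with punctuation are out of scope here.
    if len(token) < 2 or not token.isalnum():
        return False
    # An uppercase letter after the first character: MRI, mmHg, iPhone, 5G.
    if any(ch.isupper() for ch in token[1:]):
        return True
    has_digit = any(ch.isdigit() for ch in token)
    has_letter = any(ch.isalpha() for ch in token)
    if not (has_digit and has_letter):
        return False  # Patient, Hello, The
    # Ordinal-like shape: leading digits followed only by lowercase letters.
    i = 0
    while i < len(token) and token[i].isdigit():
        i += 1
    tail = token[i:]
    ordinal_like = i > 0 and tail.isalpha() and tail.islower()
    return not ordinal_like


accepted = ["MRI", "mmHg", "HbA1c", "CH3", "iPhone", "mRNA", "5G", "3D", "\u03b22"]
rejected = ["Patient", "Hello", "The", "1st", "1er", "1e", "X"]
print([t for t in accepted if not has_case_signal(t)])  # tokens wrongly rejected
print([t for t in rejected if has_case_signal(t)])      # tokens wrongly accepted
```

Because the check is purely structural, shouted words such as THE or STOP also pass it, which matches the trade-off discussed in the later commits.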

Linear

Introduces AlphaNumP, AlphaNumR, and AlphaNumF — precision, recall, and F-score over alphanumerical entities (initialisms, acronyms, chemical/unit notation, mixed-case medical/brand terms, digit-prefixed entities) detected via a Unicode-aware regex predicate over case-preserving tokens. No vocabulary input required; mirrors the existing KTP/KTR/KTF/_KTStats architecture but swaps the trie/vocab lookup for a stateless regex helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
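As a rough illustration of how the three metrics relate to TP/FN/FP counts, the sketch below computes precision, recall, and F-score from a small counts container. The `Counts` class, the function names, and the convention of returning 0.0 on an empty denominator are assumptions for this example; the PR's `_AlphaNumStats` may organise this differently.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for entity-level counts."""
    tp: int = 0
    fn: int = 0
    fp: int = 0


def precision(c: Counts) -> float:
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0  # assumed convention for 0/0


def recall(c: Counts) -> float:
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0  # assumed convention for 0/0


def f_score(c: Counts, beta: float = 1.0) -> float:
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# ref "MRI showed HbA1c" vs hyp "mri showed HbA1c":
# HbA1c is a TP; the case mismatch on MRI is counted as one FN plus one FP,
# following the behaviour described in the PR summary.
counts = Counts(tp=1, fn=1, fp=1)
print(precision(counts), recall(counts), f_score(counts))  # 0.5 0.5 0.5
```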

linear-code · 2026-05-28T19:02:43Z

SR-2628

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Codecov Comments Bot (codecov-commenter) · 2026-05-28T19:04:26Z

Codecov Report

❌ Patch coverage is 92.09622% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.90%. Comparing base (a7727e3) to head (c49376d).

Files with missing lines	Patch %	Lines
src/bewer/preprocessing/regex_match.py	83.72%	3 Missing and 4 partials ⚠️
src/bewer/metrics/alphanum_r.py	87.23%	4 Missing and 2 partials ⚠️
src/bewer/metrics/alphanum_f.py	90.69%	3 Missing and 1 partial ⚠️
src/bewer/metrics/alphanum_p.py	91.48%	3 Missing and 1 partial ⚠️
src/bewer/metrics/_alphanum_stats.py	98.13%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   88.56%   88.90%   +0.33%     
==========================================
  Files          47       52       +5     
  Lines        2782     3073     +291     
  Branches      342      372      +30     
==========================================
+ Hits         2464     2732     +268     
- Misses        237      252      +15     
- Partials       81       89       +8
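The figures in the report can be cross-checked with a little arithmetic. The sketch below assumes that coverage is hits divided by lines, and that the patch only adds lines, so that the line and hit deltas equal the patch size and the covered patch lines. Both are assumptions about how the report was computed, not something the report states.

```python
def pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of lines hit."""
    return 100.0 * hits / lines


base_hits, base_lines = 2464, 2782   # main
head_hits, head_lines = 2732, 3073   # this PR

base = pct(base_hits, base_lines)
head = pct(head_hits, head_lines)

# Assumption: the patch is purely additive, so deltas describe the patch.
patch_lines = head_lines - base_lines   # 291
patch_hits = head_hits - base_hits      # 268
patch = pct(patch_hits, patch_lines)

# Per-file (missing, partial) pairs from the table above.
per_file = [(3, 4), (4, 2), (3, 1), (3, 1), (2, 0)]
uncovered = sum(m + p for m, p in per_file)

print(round(patch, 5))            # 92.09622
print(round(head, 2))             # 88.9
print(round(head - base, 2))      # 0.33
print(uncovered, patch_lines - patch_hits)  # 23 23
```

The per-file rows sum to 15 missing and 8 partial lines, which agrees with the +15 misses and +8 partials in the diff table.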


Replace the domain-flavoured examples (chemical/unit notation, medical/brand terms) with an explicit description of what the default regex actually matches. The metric is a general predicate over Token.raw, not a domain-specific detector, and the descriptions should reflect that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The default tokenizer splits on `-`, so multi-token entities like CT-scan and X-ray were previously only partially detected (just `CT`) or missed entirely (`X` alone fails the length-≥2 rule).

Extend the matcher to recognise hyphen-connected runs of tokens as compound candidates:

- ALPHANUM_DEFAULT_PATTERN: allow `-` in the body and expand the negative lookahead to reject "ordinary capitalised compounds" (init-cap or lowercase parts joined by hyphens), so Hello-World, up-to-date, e-mail, mother-in-law, state-of-the-art remain rejected while CT-scan, X-ray, T-cell, pre-MRI, non-COVID, 5-HT, vitamin-D, MRI-CT, pre-COVID-19, Hello-MRI all match.
- match_token_regex: identify runs of consecutive tokens whose source-text gaps consist of only hyphens, try a compound fullmatch against the joined standardized substring, and return a multi-token slice when it matches. Fall back to per-token matching within the run otherwise.
- _AlphaNumStats: multi-token slices work naturally with the existing alignment-based TP/FN/FP classification. The slice spans the same alignment ops it would as multiple single-token slices, so any edit inside the compound makes it FN, and all-MATCH makes it TP.

Tests added: 57 new cases.

- regex-level: compound match/reject/FP at fullmatch
- helper-level: multi-token slicing, fallback, spaces disabling compounds, three-part compounds, mixed text
- stats-level: perfect match, case lost, hyphen dropped (still TP), partial loss, spurious in hyp, ordinary capitalised compound stays at 0 entities

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
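The run-finding step in this commit can be sketched as follows. This is an illustration only: the `\w+` tokenizer is a stand-in for the library's tokenizer, and `hyphen_runs` and `joined` are hypothetical names, not the signature of `match_token_regex`. The idea is to group consecutive tokens whose source-text gap is non-empty and made up solely of hyphens, then recover the joined substring that a compound pattern would be matched against.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the source text


def token_spans(text: str) -> List[Span]:
    """Stand-in tokenizer: word characters only, so `-` and spaces split tokens."""
    return [m.span() for m in re.finditer(r"\w+", text)]


def hyphen_runs(text: str, spans: List[Span]) -> List[List[int]]:
    """Group token indices into runs connected only by hyphens."""
    runs: List[List[int]] = []
    for i, (start, _end) in enumerate(spans):
        if i > 0:
            gap = text[spans[i - 1][1]:start]
            if gap and set(gap) == {"-"}:
                runs[-1].append(i)  # extend the current run
                continue
        runs.append([i])  # start a new run
    return runs


def joined(text: str, spans: List[Span], run: List[int]) -> str:
    """Source substring covered by a run, hyphens included."""
    return text[spans[run[0]][0]:spans[run[-1]][1]]


text = "CT-scan and X-ray today"
spans = token_spans(text)
runs = hyphen_runs(text, spans)
print(runs)                                    # [[0, 1], [2], [3, 4], [5]]
print([joined(text, spans, r) for r in runs])  # ['CT-scan', 'and', 'X-ray', 'today']
```

A space in the gap breaks the run, so "CT scan" yields two single-token runs, in line with the "spaces disabling compounds" test case mentioned above.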

T-shirt, D-day, A-frame, S-curve are structurally identical to X-ray and are correctly identified by the regex as case-distinctive. Same for shouted words like THE, STOP, NO. They aren't "documented false positives"; they're true positives of the metric as defined ("token contains a case signal"). Reframe accordingly:

- Fold T-shirt/D-day/A-frame/S-curve into the compound matches list; remove the separate "documented false positives" compound test.
- Fold THE/STOP/NO into the single-token matches list; remove the separate single-token "documented false positives" test.
- Strip the apologetic "Note: ..." prefix from AlphaNumP's description on the case-sensitivity caveat (it's a property of the metric, not a limitation framed against the user's expectations).

No regex or matching behaviour changes, only test grouping and copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two behavior changes:

1. Hyphen structure is now enforced strictly. Previously, ref CT-scan vs hyp "CT scan" counted as TP because tokens [CT, scan] were identical on both sides: the alignment is token-level and the hyphen is in the source-text gap. Now a multi-token compound ref match requires the corresponding hyp tokens to also be hyphen-connected; otherwise it is FN. Symmetric on the hyp side: a hyp compound with no matching hyphen structure in ref counts as FP (hyphen invented). Plumbed via a new public helper tokens_are_hyphen_connected in regex_match.py and a hyp_token_idx-based check in _AlphaNumStats._ref_match_classification and fp_alignments.
2. Any token (or compound) containing a Greek letter is treated as an entity, regardless of case. Adds a third branch to ALPHANUM_DEFAULT_PATTERN: (?=.*\p{Greek})[\p{L}\d][-\p{L}\d]*. So α, β, μ, μg, α-helix, β-blocker, γδ now all match. They were previously rejected because they had no uppercase letter and no digit; the new branch is independent of those conditions.

Tests updated:

- test_compound_hyphen_dropped_keeps_tp → test_compound_hyphen_dropped_is_fn
- New test_hyp_invents_hyphen_is_fp covering the symmetric FP case
- New TestAlphaNumStatsGreekTokens covering single Greek letter, μg, and Greek substitution in hyp
- α-helix / β-blocker moved from compound rejects to compound matches
- γ-radiation added to compound matches
- Single Greek tokens (α, β, γ, δ, μ, Δ, Ω) moved from rejects into a new test_greek_tokens_match block

All 1069 tests pass; pre-commit clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
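The Greek branch relies on `\p{Greek}`, which the standard-library `re` module does not support. As a rough stdlib-only illustration of the same idea, the sketch below flags a token when any of its characters has a Unicode name starting with "GREEK". The helper name `contains_greek` is hypothetical, and the name-prefix test is only an approximation of the Greek script property, not the PR's implementation.

```python
import unicodedata


def contains_greek(token: str) -> bool:
    """Approximate \\p{Greek}: any character whose Unicode name starts with GREEK."""
    return any(unicodedata.name(ch, "").startswith("GREEK") for ch in token)


examples = [
    "\u03b1",          # α  GREEK SMALL LETTER ALPHA
    "\u03bcg",         # μg GREEK SMALL LETTER MU + g
    "\u03b3\u03b4",    # γδ
    "\u03b1-helix",    # α-helix
    "MRI",             # no Greek letters
    "\u00b5g",         # µg with MICRO SIGN, which is not a Greek-script character
]
for tok in examples:
    print(repr(tok), contains_greek(tok))
```

One consequence worth noting: text that encodes the micro prefix as U+00B5 (MICRO SIGN) rather than U+03BC (Greek mu) would not be caught by a Greek-script check like this one.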

THE, HELLO, STOP, NO, OK all match the regex's case-distinctive rule (≥2 uppercase letters, not init-cap-word shape) but are semantically just shouted ordinary words, not abbreviations. Re-add the test to explicitly label them as documented false positives, making the trade-off legible rather than silently mixed into the main matches list. T-shirt, D-day, A-frame, S-curve remain in the compound matches list (true positives: structurally identical to X-ray, indistinguishable without a vocabulary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 28, 2026 19:02

Copilot started reviewing on behalf of Jakob Drachmann Havtorn (JakobHavtorn) May 28, 2026 19:02 View session

Copilot AI reviewed May 28, 2026

Jakob Drachmann Havtorn (JakobHavtorn) and others added 5 commits May 28, 2026 22:49

Jakob Drachmann Havtorn (JakobHavtorn) wants to merge 6 commits into main from add-alphanum-metrics-sr-2628


3 participants
