Skip to content

convert: Brainstorm-E split migrators (treatment→manipulation, ontology_table_row→observations)#149

Merged
stevevanhooser merged 28 commits into
V2from
claude/optimistic-ritchie-f06jvs
Jul 4, 2026
Merged

convert: Brainstorm-E split migrators (treatment→manipulation, ontology_table_row→observations)#149
stevevanhooser merged 28 commits into
V2from
claude/optimistic-ritchie-f06jvs

Conversation

@stevevanhooser

Copy link
Copy Markdown
Contributor

#2 of the migration work — the migration code for the Brainstorm E split, implementing the specs added to did-schema (V_epsilon/conversions/from_did_v1/treatment.md + new ontology_table_row.md). Companion schema PR: Waltham-Data-Science/DID-schema#62.

Engine (v1_to_v2)

  • New TargetVersion option (default 'V_delta'behaviour unchanged).
  • 1 → N support: the per-class migrator output is normalized to a cell of bodies; each is padded (ensureClassBlocks), optionally validated, and counted independently. The historical 1→1 path is preserved exactly.
  • Under TargetVersion='V_epsilon', a class with a split migrator under +did2/+convert/+migrators_e is routed there instead.

Migrators (+migrators_e)

  • treatment.m — 1→1 branch dispatch → injection / bath / procedural_manipulation / temperature_manipulation / environmental_manipulation, with the Dab string_valuetarget_structure edge case and out-of-tier routing (quarantine-with-reason) for non-manipulation rows (DOB, experiment time).
  • ontology_table_row.m — genuine 1 → N: each row → a scalar/categorical observation property class (or a generic escape hatch), value placed in the declaring block (shape mixin for scalars; categorical_concept for inherited categoricals; the concrete class for the two overriders).

Branch/term resolution is a keyword/CURIE-prefix heuristic seed; the authoritative per-term tables are finalized in discovery mode against real corpora (the report-only loop the conversion docs describe).

Tests (testMigratorsE.m)

Synthetic, Validate=false (matching testMigrators): thermal/drug/environmental/Dab/not-a-manipulation treatment cases; ontology_table_row 1→N fan-out + per-row unique ids; and a backward-compat guard that the default target leaves treatment unchanged.

Honest status / what's left

  • ⚠️ Authored without a local MATLAB/Octave — this relies on CI for verification. If anything fails to parse or a test trips, I'll iterate (I'm subscribed to this PR).
  • how will we handle searching other databases? #3 (corpus discovery against V_epsilon) is a deliberate follow-up: the existing corpus tests validate against V_delta/stable, but the E classes live in V_epsilon/draft. To run the corpuses through the E split end-to-end, CI needs DID_SCHEMA_PATH pointed at V_epsilon with the draft E classes reachable by the schema cache (promote to stable, or load draft). That's the one coordinated decision; I can wire an opt-in DID_RUN_EPSILON_CORPUS discovery job once it's settled.
  • The default V_delta pipeline and its green corpus CI are untouched by this PR.

🤖 Generated with Claude Code


Generated by Claude Code

…e_row→obs)

#2 of the migration work — the migration CODE for the Brainstorm E split,
implementing the specs in did-schema V_epsilon/conversions/from_did_v1/.

Engine (v1_to_v2): adds a TargetVersion option (default 'V_delta', behaviour
unchanged) and 1→N support. Under TargetVersion 'V_epsilon', a class with a
split migrator under +did2/+convert/+migrators_e is routed there and may
return several bodies; each is padded, optionally validated, and counted
independently. Fully backward compatible — the default path still runs the
existing per-class migrators 1→1, so the V_delta corpus pipeline is untouched.

Migrators (+migrators_e):
- treatment.m: 1→1 branch dispatch to injection / bath / procedural_ /
  temperature_ / environmental_manipulation, with the Dab string_value→
  target_structure edge case and out-of-tier (quarantine-with-reason) routing
  for non-manipulation rows. Branch resolution is a keyword/CURIE-prefix seed;
  the per-term table is finalized in discovery mode.
- ontology_table_row.m: 1→N — each row → a scalar/categorical observation
  property class (or generic escape hatch), value placed in the declaring
  block (shape mixin for scalars / categorical_concept for inherited
  categoricals / the concrete class for the overriders).

Tests (testMigratorsE.m): synthetic, Validate=false (matching testMigrators) —
thermal/drug/environmental/Dab/not-a-manipulation treatment cases, the
ontology_table_row 1→N fan-out + per-row unique ids, and a backward-compat
guard that the default target leaves treatment unchanged.

Not local-verifiable (no MATLAB in the authoring env) — relies on CI. Corpus
discovery against V_epsilon (#3) is a follow-up that needs the E draft classes
reachable by the schema cache (DID_SCHEMA_PATH → V_epsilon); see PR body.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Comment thread src/did/+did2/+convert/+migrators_e/ontology_table_row.m Fixed
claude added 5 commits June 19, 2026 11:39
The compressed multi-line if/elseif chain with a trailing end (line 232)
tripped the MATLAB Code Analyzer keyword/end alignment check. Expand to
standard multi-line form; no behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Wires the corpus migration test (#3) to run against the V_epsilon E classes,
isolated from the green V_delta corpus run:

- runCorpusDiscovery: add a TargetVersion option (default 'V_delta'),
  forwarded to v1_to_v2.
- testCorpusEpsilon.m: opt-in (DID_RUN_EPSILON_CORPUS) discovery run of the
  Dab corpus through TargetVersion='V_epsilon' (Dab's treatment rows, incl.
  the optogenetic-tetanus "Target Location" idiom, exercise the split).
  Skips cleanly in the default V_delta run via assumeFail.
- test-code.yml: after the default test run, assemble a combined V_epsilon
  stable+draft schema dir (the cache loads one flat dir; tier class names are
  disjoint), point DID_SCHEMA_PATH at it, and run testCorpusEpsilon with
  DID_RUN_EPSILON_CORPUS=1. Own step env, so the V_delta job is untouched.

Discovery mode: quarantine (e.g. the expected missing-time_reference until
synthesis lands) is reported, not failed; only errors fail the step. The
report artifact drives the next iteration of the dispatch/synthesis work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Adding options.TargetVersion to the arguments block requires  in the
function signature too (MATLAB MismatchBetweenBlockAndLine). This also broke
the existing corpus tests that call this helper. No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…elta)

Per "replacing V_delta with V_epsilon": the standard test run now validates
against V_epsilon and the corpus discovery migrates through the Brainstorm-E
split, instead of running V_epsilon as an isolated opt-in side step.

- test-code.yml: assemble a combined V_epsilon stable+draft schema dir (the
  cache loads one flat dir; tier class names are disjoint) and point the main
  DID_SCHEMA_PATH at it. Removed the separate opt-in epsilon step.
- runCorpusDiscovery: default TargetVersion is now V_epsilon, so all corpus
  tests (Dab, B, JH, PRED, 20211116, Soph) exercise the split as the main test.
- v1_to_v2: stamp document_class.schema_version = V_epsilon on outputs when
  TargetVersion is V_epsilon.
- Removed testCorpusEpsilon.m and the DID_RUN_EPSILON_CORPUS gate (redundant).

Non-corpus unit tests are unaffected: they set their own fixtures schema path
and run Validate=false, so they do not read DID_SCHEMA_PATH. v1_to_v2's own
default TargetVersion stays V_delta so the per-class V_delta migrator unit
tests (testMigrators/testConvertV1ToV2) remain valid; corpus migration opts
into V_epsilon via runCorpusDiscovery.

Discovery mode: quarantine (e.g. the expected missing-time_reference until
synthesis lands) is reported, not failed. Local-dev default schema path
(cache defaultSchemaPath) is unchanged; a multi-tier cache loader is the
follow-up that removes the assemble-a-dir step.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…ore empty time_reference)

The Brainstorm-E split migrators now attach a real time_reference instead of an
empty slot (the prior cause of quarantine for behavioral/treatment rows):

- Each migrated interaction depends_on a session_relative_reference document
  (ordinal, relation='during', anchored to the source's base.session_id) minted
  with did.ido.unique_id(). 'during' is the honest universal fallback (the event
  happened within the session); 'at_end_of' is reserved for known-terminal
  cases, not asserted blanket.
- treatment: now 1 -> 2 (manipulation + anchor). ontology_table_row: 1 -> N+1
  (N observations sharing one session anchor).
- Fixes invalid row ids: ontology_table_row observations now get fresh
  did.ido.unique_id() base ids (the previous "<id>_row%02d" was not a valid
  did_uid and would have quarantined).

Updates testMigratorsE expectations for the added anchor doc (+ asserts the
anchor relation is 'during'). Authored without local MATLAB — CI-verified.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Comment thread src/did/+did2/+convert/+migrators_e/ontology_table_row.m Fixed
Comment thread src/did/+did2/+convert/+migrators_e/treatment.m Fixed
claude added 3 commits June 20, 2026 00:10
- treatment.m: drop the stale %#ok<AGROW> suppression (the line isn't in a
  loop, so no AGROW warning is generated).
- ontology_table_row.m: remove the now-unused rowIndex argument threaded
  through migrateRow/makeScalarObservation/makeCategoricalObservation/
  startObservation (row ids now come from did.ido.unique_id()).

No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…s report

From the V_epsilon corpus discovery (B 0, 20211116 275, Dab 7859, JH 41504):

1. ontology_table_row "has no rows to migrate" (~47K, the dominant cause):
   the real v1 layout is parallel char fields (names / variable_names /
   ontology_nodes / data), not a 'rows' array, so extractRows found nothing
   and threw. Now falls back to migrating the document unchanged as an
   ontology_table_row (the class still exists in V_epsilon, so it validates)
   instead of quarantining. Splitting that char-field layout into per-row
   observations is a follow-up.

2. target_structure "must be a struct (named composite type ontology_term)"
   (~105): the field is an ARRAY of ontology_term (a struct array), but the
   migrators emitted a cell array. Emit struct arrays
   (empty: struct('node',{},'name',{}); populated: struct('node',..,'name',..))
   in both treatment.m and ontology_table_row.m.

Remaining quarantine is mostly non-E drift (stimulus_bath mixture,
stimulus_response_scalar_parameters_basic block, subject_group has no V_epsilon
class) plus a handful of unresolved-branch treatments (curator review).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…olumn observations

extractRows now parses the actual v1 ontology_table_row form (per the schema):
parallel comma-separated names / variable_names / ontology_nodes plus a data
struct keyed by variable_names. One document is one table ROW; each COLUMN
becomes one observation (identity = ontology_node + name; value = data.<var>).
Columns with no usable value (missing key, [], '', NaN) are skipped. The
synthetic rows-array / single-row shapes still work; the unparseable-fallback
(migrate unchanged as ontology_table_row) remains for anything else.

This converts the ~47K "has no rows to migrate" quarantine into actual
per-column observations (the dominant E-split gap from the corpus report).

Adds testMigratorsE cases for the char-field split and empty-value skipping.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Comment thread src/did/+did2/+convert/+migrators_e/ontology_table_row.m Fixed
Comment thread src/did/+did2/+convert/+migrators_e/ontology_table_row.m Fixed
claude added 4 commits June 20, 2026 14:49
No .codespellrc to ignore-list, so rename the variables. No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
The extractRows rewrite's old_string didn't include the original function's
closing end, leaving a duplicate end after splitCSV (Code Analyzer parse error
at END, line 290; the 'function might be unused' alert was its downstream
symptom). Removed the extra end; the file is balanced (the apparent off-by-one
in keyword counts is the arguments block).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
The dominant corpus quarantine ("No schema file for class
generic_categorical_observation" 41,095 in JH; session_relative_reference;
generic_scalar_observation.value placement) was not migrator bugs — the
workflow checked out did-schema at ref: main, where the Brainstorm-E classes
do not yet exist. They live on the matching did-schema branch (DID-schema#62).

Point the did-schema checkout at ${{ github.head_ref || 'main' }} so the
V_epsilon corpus discovery validates against the actual E schema. Revert to
main once DID-schema#62 lands.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
stimulus_bath migrates to a `bath` (pharmacological_manipulation), but the bath
needs its subject (the stimulator element's subject) and an epoch_bounded_
reference time anchor (the stimulator's epoch) -- both require following
stimulus_element_id into the session/element graph. A manipulation must be
emitted complete, so the whole bath is assembled in ndi.migrate.local, not
per-document. The per-doc converter now defers via
did2:convert:needsSessionContext, so the discovery report reads "migrate via
ndi.migrate.local" instead of the misleading "mixture missing" (the prior
V_delta-block-placement fallback). Adds a test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Comment thread src/did/+did2/+convert/+migrators_e/stimulus_bath.m Fixed
Comment thread src/did/+did2/+convert/+migrators_e/stimulus_bath.m Fixed
claude added 13 commits June 21, 2026 22:09
…ays-error pattern

The prior 'error then unreachable assignment' tripped 'return value might be
unset' + 'input unused'. Use ~ for the ignored input and assign the required
output before the error (suppressing the dead assignment), so the stub is
analyzer-clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
isAlreadyVDelta -> isAlreadyTarget(body, targetVersion): a body already tagged
with the configured TargetVersion (V_delta or V_epsilon) short-circuits the
migration loop and is just padded (ensureClassBlocks) + validated, not
re-migrated. Default 'V_delta' preserves existing behaviour exactly.

This is what lets ndi.migrate.local feed NDI-assembled V_epsilon bodies (the
bath + epoch_bounded_reference from stimulusBathToBath) back through v1_to_v2
to get the superclass chain rebuilt and validated. Adds a short-circuit test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Follows the V_epsilon schema flatten (categorical_concept.value now
placement: concrete_class). ontology_table_row's categorical path no
longer branches on whether the class is an overrider; it writes the
bound term uniformly into body.(className).value. This clears the
~47K "categorical_concept.value missing" corpus quarantine, since the
inherited value now lives in each observation's own block.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Clears the JH subject_group corpus quarantine (353 docs): subject_group
is deprecated in V_epsilon, so the per-document E migrator folds it into a
subject flagged is_group (optional legacy group_name/description map to
local_identifier/description). Membership -> group_assignment is
relational and stays an NDI-layer follow-up (like stimulus_bath -> bath).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
20211116 was the lone corpus still migrating to V_delta via bespoke test
code. Convert it to the shared did2.unittest.helpers.runCorpusDiscovery
driver (same as B / Dab / JH), which targets V_epsilon and reuses the
common schema-path install/teardown and report writer. This puts every
corpus on the same V_epsilon target so its remaining quarantine is
measured on the same footing as the others.

Discovery mode is unchanged (no zero-quarantine assertion); the report is
the deliverable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
v1 documents carried a property block for every class in their
hierarchy. V_epsilon introduces abstract / fieldless parent classes
(new -- the previous schema had none), which contribute no block to an
instance. The inherited empty block (e.g. stimulus_response_scalar_
parameters: {} on a stimulus_response_scalar_parameters_basic doc) rode
through migration untouched and tripped the strict undeclared-top-level-
block validator.

In ensureClassBlocks, after manufacturing the contributing blocks, remove
any top-level block that names a chain class which contributes no block
AND is an empty struct. Only EMPTY blocks are dropped; a non-empty one
signals real data a migrator must place, so it is left to fail loudly
rather than be silently discarded. General fix (one place) rather than a
per-class migrator, so every v1 abstract-parent-empty-block case clears.

Regression-checked by the 20211116 corpus discovery job (its 273
"undeclared top-level block stimulus_response_scalar_parameters" should
drop to ~0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…r; enable Soph corpus

Three remaining deprecated-class folds (no migrator existed, so they would
quarantine under V_epsilon):
- treatment_drug   -> injection (kind: drug);  mixture from mixture_table
- virus_injection  -> injection (kind: virus); virus+dilution in the mixture
- treatment_transfer -> biological_transfer;   recipient->subject_id,
  donor carried as donor_id, method->procedure/kind
Each is a 1 -> 2 emit (the manipulation + the shared session_relative_
reference anchor subject_interaction requires) and uses the correct
{chemical, amount} mixture record shape. Tests added to testMigratorsE.

Also enable the large Soph corpus (~446 MB, ~101k docs) in test-code.yml
(DID_RUN_SOPH_TEST=1) so its V_epsilon quarantine is surfaced for the
discovery pass; this is opt-in and meant to be gated back off once Soph
coverage is characterised.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
Soph discovery is characterised (0 quarantine at V_epsilon), so drop the
DID_RUN_SOPH_TEST=1 enable from test-code.yml -- the ~446 MB / ~101k-doc
download should not run on every push. The test remains opt-in (skips
cleanly) and can be re-enabled on demand.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
The treatment -> injection/bath builders predated schema validation
(treatment isn't in the validated corpora) and had three mismatches that
would quarantine the moment a real treatment-drug/bath row appeared:
- mixture records used {agent, concentration}; the schema is
  {chemical, amount}.
- injection/bath were given a `notes` field neither class declares
  (strict undeclared-field failure).
- injection omitted its required `volume`.

Align both builders with the injection/bath schemas: mixture as
{chemical, amount}, injection carries kind/volume(blank)/route/
target_structure, bath carries kind/location. notes is dropped (no field
on these classes). Matches the {chemical, amount} shape the new
treatment_drug / virus_injection migrators already use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…ting)

To curate the authoritative treatment / ontology_table_row routing tables
from real data, the discovery run now writes corpus-reports/<corpus>-
routing.json: for every migrated observation/manipulation, the identity
term (measured_property / applied_property / procedure / factor / entity /
first mixture chemical) aggregated to its routed class with counts. This
makes the heuristic routing AUDITABLE -- terms landing in generic_* are
unmatched, and a term under a surprising class is a mis-route. Best-effort
and wrapped so it never breaks the discovery summary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
The legacy did1 validator derived a document's superclass NAMES solely
from each superclasses entry's `.definition` path
($NDIDOCUMENTPATH/base.json -> base). did2-form documents (the shape every
v1->v2 migrator now emits, and increasingly what NDI writes) carry the
name directly as `.class_name` with no `.definition`, so the extraction
errored to empty and EVERY such document failed the superclasses check
with `("base" <=> "")` -- e.g. the NDI dataset/session ingestion suite on
the Vnext line.

Accept both forms: keep the `.definition` derivation, and when that
yields nothing, fall back to `.class_name`. Bound the deeper superclass
recursion by `superFullNames` (the `.definition` paths) so did2-form docs
-- which expose names but no paths -- skip the path-based recursion
cleanly instead of indexing past the end; the concrete class is still
validated and the superclass-name check has already passed.

Verified via the NDI dataset/session ingestion tests (they exercise this
legacy path); did2's own suite does not reach it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
The per-term routing report ships as a JSON artifact, but the artifact
isn't readable from the migration tooling we use to curate. Echo the
actionable breakdown to stdout so it lands in the CI log: every term
routed to a generic_* escape hatch (the UNMATCHED ones that need a minted
class or a routing rule), then the top routes overall. Data source for
curating the authoritative treatment / ontology_table_row routing tables.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
After the 1->N splits and class folds, run did2.validate.references over
the migrated batch and report dangling depends_on edges (orphan_count /
edges_examined, aggregated by source-class.edge_name). Surfaces any
reference the migration would break -- e.g. a split that didn't preserve a
referenced id, or an edge to a deferred/quarantined doc. Best-effort,
non-fatal (discovery mode); echoed to the CI log next to the routing
inventory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
claude added 2 commits June 23, 2026 12:44
…er")

Codespell read the catch variable `refErr` as a misspelling of "refer".
There's no .codespellrc ignore-list in the repo, so rename it (matching
the already-clean routingErr style). No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
…-routes (#1)

The routing inventory exposed heuristic false positives from substring
matching: "average amplitude"->age_observation ("aver-AGE-"), C. elegans
"encounter*"->litter_size ("en-COUNT-er"), "sampling rate"->heart_rate,
"number of samples"->litter_size, "growth duration"/"latency"->age,
"bacterial patch volume"->organ_volume.

Two changes:
- containsAny now matches at WORD BOUNDARIES (regexp \<...\>), so a keyword
  matches only as a whole word/phrase -- average !-> age, encounter !-> count.
- the scalar/categorical keyword lists are tightened to SPECIFIC,
  high-confidence phrases (body weight, body length, heart rate,
  respiration rate, blood pressure, litter size, cell count, life cycle
  stage, health status, ...) and the over-broad singletons (rate,
  frequency, count, number, volume, duration, latency, score, status,
  stage, behavior) are dropped. respiration_rate and cell_count get their
  own branches. Anything not confidently a known property falls to the
  generic_* escape hatch -- correct for the lab-specific corpus terms.

Existing testMigratorsE cases (weight, life cycle stage) still route as
before. treatment.m is intentionally left stem-based (rear->rearing) and
showed no mis-routes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ABEeDJ83r9Hq9PaiSije8o
@stevevanhooser stevevanhooser merged commit d6b1e17 into V2 Jul 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants