Add nearest-RT FID/MS matching for split-GC (no retention-index step) #3
Open
niklashoelter wants to merge 2 commits into
Split-GC runs split one injection post-column to both MS and Polyarc-FID, so FID and MS peaks share a retention-time axis and no longer need retention-index (RI) matching via an alkane standard. Add an rt-matching mode alongside the existing ri-matching:
- Injection.match_rt(rt, func, tolerance): nearest-retention-time analogue of match_ri. func maps the source (MS) rt to the expected (FID) rt (identity by default); tolerance defaults to 1 s. Excludes the internal standard and returns the real FID peak.
- Analysis: matching='ri'|'rt' option (default 'ri', back-compat) wired through calc_plate_yield/calc_plate_conv, with rt_func/rt_tolerance. The height-ratio tie-break is reused for both modes. Plate reshape is now layout-driven so non-8x12 plates (split-GC A1..K3) work.
- Analysis.constant_offset(b)/linear_drift(a,b): rt_func factories (t_fid = t_ms + b; t_fid = a*t_ms + b). constant_offset(0) is the default.
- examples/split_gc/plate_processing.py (+ splitgc_layout.csv): new example on the same FBS-FB-021-ALL.rslt data using matching='rt'.
- Remove stray DEBUG print in RI_Calibration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
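For readers unfamiliar with the approach, the nearest-RT matching and the two rt_func factories described in this commit can be sketched roughly as below. This is an illustration only, not the pygecko implementation: the Peak dataclass, the is_standard flag, and the free-function form of match_rt are hypothetical stand-ins, and retention times are assumed to be in minutes (so 1 s is 1/60).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Peak:
    """Hypothetical stand-in for a FID peak (not pygecko's Peak class)."""
    rt: float                  # retention time, assumed to be in minutes
    is_standard: bool = False  # assumed flag marking the internal standard


def constant_offset(b: float = 0.0) -> Callable[[float], float]:
    """rt_func factory: t_fid = t_ms + b."""
    return lambda t_ms: t_ms + b


def linear_drift(a: float, b: float) -> Callable[[float], float]:
    """rt_func factory: t_fid = a * t_ms + b."""
    return lambda t_ms: a * t_ms + b


def match_rt(
    peaks: Iterable[Peak],
    rt: float,
    func: Optional[Callable[[float], float]] = None,
    tolerance: float = 1 / 60,  # 1 s expressed in minutes
) -> Optional[Peak]:
    """Return the non-standard peak closest to func(rt), or None if no
    peak lies within the tolerance window."""
    if func is None:
        func = constant_offset(0.0)  # identity mapping by default
    expected = func(rt)
    # Drop the internal standard first, then keep peaks inside the window.
    candidates = [
        p for p in peaks
        if not p.is_standard and abs(p.rt - expected) <= tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p.rt - expected))


# Made-up retention times, for illustration only.
fid_peaks = [Peak(5.000, is_standard=True), Peak(5.012), Peak(5.040)]
hit = match_rt(fid_peaks, 5.000, func=constant_offset(0.010))
```

Keeping the MS-to-FID mapping in func means a drift model can be swapped in without changing the matcher itself.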
Validated examples/split_gc/plate_processing.py end-to-end on the real FBS-FB-021-ALL.rslt sequence (33 wells, A1..K3).
- pygecko/reaction/array.py: add Product_Array.get_product(pos). The plate engine (Analysis.__match_and_quantify) looks up the well analyte via layout.get_product(pos), which previously existed only on Reaction_Array. Product_Array stores product SMILES directly, so get_product returns the well entry; the analysis engine now works for product-only layouts.
- examples/split_gc/plate_processing.py: set RT_FUNC to Analysis.constant_offset(0.010). The measured FID-MS offset is a consistent +0.0094 mean / +0.010 median min (FID elutes later, matching the ~0.01 min splitter dead-volume). The offset centres the +-1 s match window and recovers two tolerance-edge wells (B3, F1) -> 29/33 matched. The 4 remaining misses are MS-side (E-row Decanenitrile has a weak/absent EI molecular ion; I2 a single-injection MS miss), not nearest-RT matching.
- Add validated reference outputs (yields CSV + plate heatmap PNG).
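The reasoning behind the constant offset can be illustrated with a small sketch: take the FID-minus-MS retention-time difference for already matched pairs, summarise it with mean and median, and check how centring the window changes a borderline match. The helper names and the retention times below are made up for illustration; they are not values from the FBS-FB-021-ALL run, and times are assumed to be in minutes.

```python
from statistics import mean, median
from typing import List, Tuple


def estimate_offset(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean, median) of t_fid - t_ms over matched (t_ms, t_fid) pairs."""
    deltas = [t_fid - t_ms for t_ms, t_fid in pairs]
    return mean(deltas), median(deltas)


def within_window(t_ms: float, t_fid: float, offset: float,
                  tolerance: float = 1 / 60) -> bool:
    """True if the FID peak lies within +-tolerance of t_ms + offset."""
    return abs(t_fid - (t_ms + offset)) <= tolerance


# Illustrative (t_ms, t_fid) pairs in minutes; NOT measured data.
pairs = [(4.000, 4.010), (6.500, 6.509), (9.250, 9.261)]
mean_offset, median_offset = estimate_offset(pairs)

# A hypothetical tolerance-edge peak: 0.020 min later on the FID.
# It falls outside an uncentred 1 s window, but inside once the
# window is centred on the expected FID time.
missed_without_offset = not within_window(7.000, 7.020, offset=0.0)
matched_with_offset = within_window(7.000, 7.020, offset=0.010)
```

With a symmetric window, a systematic positive offset uses up part of the tolerance on one side, which is why edge wells can drop out until the window is recentred.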
Summary
For the new split GC (one injection split post-column to both the MS and the Polyarc-FID), the FID and MS traces come from a single run and share one retention-time axis, so retention-index (RI) bridging — and its alkane-standard requirement — is unnecessary. This branch adds nearest-retention-time matching as a sibling to the existing RI path and validates it end-to-end on real data.
RI matching is kept as the default for the legacy two-machine workflow; nothing in that path changes.
What changed
- Injection.match_rt(rt, func, tolerance): RT analogue of match_ri. func maps the source (MS) retention time to the expected (FID) retention time (identity by default; a drift model plugs in later without touching the matcher), tolerance defaults to 1 s, the internal standard is excluded up front, and the real FID Peak is returned.
- Analysis gains matching='ri'|'rt' (default 'ri', fully back-compatible) on calc_plate_yield/calc_plate_conv, with rt_func/rt_tolerance. The height-ratio tie-break is reused for both modes, and the plate reshape is now layout-driven so non-8x12 plates (e.g. the A1..K3 split-GC sequence) work.
- Analysis.constant_offset(b=0.0) and Analysis.linear_drift(a, b) factories for rt_func.
- Product_Array.get_product(pos) so the plate engine works with product-only layouts (it previously required Reaction_Array).
- examples/split_gc/plate_processing.py (+ splitgc_layout.csv) on the FBS-FB-021-ALL.rslt data, using matching='rt'.
Validation (real FBS-FB-021-ALL data, 33 wells A1..K3)
- RT_FUNC = Analysis.constant_offset(0.010), which centres the +-1 s window and recovers two tolerance-edge wells -> 29/33 matched.
- The 4 remaining misses are MS-side: E-row Decanenitrile has a weak/absent EI molecular ion, so match_mol cannot find it, plus one further single-injection MS miss (I2).
How to switch modes
- Analysis.calc_plate_yield(ms_seq, fid_seq, layout, matching='rt', rt_func=Analysis.constant_offset(0.010), rt_tolerance=1/60); omit matching or pass matching='ri' (with ri_tolerance) for the legacy path.
- fid_injection.match_rt(ms_peak.rt, func=..., tolerance=...) in place of match_ri.
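Separately from mode switching, the layout-driven plate reshape mentioned under What changed can be pictured with the sketch below: the grid dimensions are derived from the well names in the layout instead of being fixed at 8x12. The function names and the dict-of-values input are hypothetical and not taken from pygecko; single-letter row labels are assumed.

```python
import re
from typing import Dict, List, Optional, Tuple

# Well names such as "A1" or "K3": one row letter followed by a column number.
WELL_PATTERN = re.compile(r"^([A-Z])(\d+)$")


def parse_well(pos: str) -> Tuple[int, int]:
    """Convert a well name like 'B3' into zero-based (row, column) indices."""
    m = WELL_PATTERN.match(pos)
    if m is None:
        raise ValueError(f"unrecognised well name: {pos!r}")
    return ord(m.group(1)) - ord("A"), int(m.group(2)) - 1


def reshape_to_plate(
    values: Dict[str, float], layout_wells: List[str]
) -> List[List[Optional[float]]]:
    """Arrange per-well values on a grid sized from the layout itself.

    Wells without a value (e.g. unmatched analytes) are left as None.
    """
    coords = [parse_well(w) for w in layout_wells]
    n_rows = max(r for r, _ in coords) + 1
    n_cols = max(c for _, c in coords) + 1
    grid: List[List[Optional[float]]] = [
        [None] * n_cols for _ in range(n_rows)
    ]
    for pos, (r, c) in zip(layout_wells, coords):
        grid[r][c] = values.get(pos)
    return grid


# An A1..K3 layout gives 11 rows x 3 columns = 33 wells.
wells = [f"{chr(ord('A') + r)}{c}" for r in range(11) for c in range(1, 4)]
# Placeholder yields, for illustration only.
grid = reshape_to_plate({"A1": 0.5, "K3": 0.9}, wells)
```

Deriving the shape from the layout keeps the same code path valid for a standard 8x12 plate and for the 11x3 split-GC sequence.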