Skip to content

Digestion enzyme test az#173

Open
cbielow wants to merge 20 commits into
cbielow:developfrom
TheClaervoyant:DigestionEnzymeTestAZ
Open

Digestion enzyme test az#173
cbielow wants to merge 20 commits into
cbielow:developfrom
TheClaervoyant:DigestionEnzymeTestAZ

Conversation

@cbielow

@cbielow cbielow commented May 27, 2026

Copy link
Copy Markdown
Owner

Description

update of #172
after updating base branch.

However, there are still too many weird changes in here.
@TheClaervoyant
Please rebase your branch or move stuff manually if need be, such that changes here are limited to DigestionEnzyme. ;)

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

Click to expand If your PR is failing you can check out
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.seqan.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

Click to expand
  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

timosachsenberg and others added 20 commits May 23, 2026 13:35
Lower the CLI floor on -bruker:dia_ms2_min_support from 1 to 0 so users
can run frame-aggregated MS2 centroiding without the 3x3 neighbor filter.
With centroid=true and min_support=0, every populated grid cell becomes a
local-maxima candidate instead of being dropped when isolated.

Also short-circuit the denoise step in DIAFrameAggregator when
min_support<=0 to skip the now-useless 8-probe hashmap lookup per cell.

Regression guard added: with the test fixture, min_support=0 retains
~6.7x more centroids (106.5M vs 15.9M peaks) than min_support=1, with
spectra still emitted as IM_CENTROIDED.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
install_library() adds opentims_cpp to the install-tree export
(OpenMSTargets via install(EXPORT)), but the build-tree export()
call in export_macros.cmake uses _OPENMS_EXPORT_TARGETS, which is
only populated by openms_register_export_target().

Without this call CMake errors at generate time:
  export called with target "OpenMS" which requires target
  "opentims_cpp" that is not in any export set.

Matches the pattern used by Evergreen and IsoSpec in their
own CMakeLists.txt files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tchContent path

Two bugs in the FetchContent branch that only manifest on CI (where
opentims is not installed system-wide):

1. BUILD_SHARED_LIBS=true is set in the top-level CMakeLists.txt
   (line 287) before cmake_findExternalLibs.cmake is included.
   Without an explicit override, opentims's CMakeLists.txt would
   produce libopentims_cpp.so instead of libopentims_cpp.a.  The
   shared library lands in _deps/opentims-build/ rather than the
   system library path, so the build-time linker fails with
   "undefined reference to TimsDataHandle" when linking test
   binaries against libOpenMS.so (libopentims_cpp.so's directory
   is not in the test's -L search path).  Fix: save/restore
   BUILD_SHARED_LIBS around FetchContent_MakeAvailable(opentims).

2. target_link_libraries(opentims_cpp PRIVATE sqlite3) links the
   bundled SQLite3 CMake target to the opentims_cpp STATIC library.
   For static library targets, CMake propagates PRIVATE deps into
   the link interface, so the export mechanism requires sqlite3 to
   be in the same export set.  This conflicts with SQLiteCpp's own
   sqlite3 export and causes a CMake generate error.  Fix: provide
   only the sqlite3 include directory (needed for compilation with
   OPENTIMS_LINK_SQLITE_STATICALLY); the sqlite3 symbols are
   resolved at final link time through OpenMS's SQLiteCpp dependency
   (SQLiteCpp PUBLIC-links sqlite3, so libsqlite3.a appears in
   libOpenMS.so's link command after libopentims_cpp.a).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…llback sentinel

FindOpentims.cmake sets INTERFACE_COMPILE_DEFINITIONS to "" on the manually
created UNKNOWN IMPORTED target to signal "found via header/library search,
sqlite linkage unknown".  The previous condition only checked MATCHES "NOTFOUND"
(which CMake sets when get_target_property finds no property), so it never fired
for the empty-string case — skipping the sqlite3 include/link injection for
system installs that lack a CMake config package.

Add OR _opentims_defs STREQUAL "" to catch this case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The centroid-or-copy branch for MS1 frame loading was duplicated across
four call sites (loadDIA_, loadDDA_, loadDIAStreaming, loadFrameMode_).
Extract a static loadMS1Spectrum(frame, spec, config, centroider) that
runs IM-centroiding when configured and falls back to frameToSpectrum()
otherwise. Each call site drops the local do_centroid bool and the
if/else branch.

Pure refactor: no behavior change. Existing BrukerTimsFile_test suite
(13 sections including all DIA/DDA integration paths) passes unchanged.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add three missing entries for changes merged after the 2026-04-15 sync:
- OpenMS#9154: BrukerTimsFile spectra marked as CENTROID
- OpenMS#9148/OpenMS#9147: system opentims install fallback before FetchContent
- OpenMS#9151: Bruker DIA MS2 centroiding without denoising (min_support=0)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor: rename DIAFrameAggregator to FrameAggregator

The aggregator is scope-agnostic (no SWATH assumptions inside; the MS2
scan-window filter lives in the outer loop, not the class). Renaming
prepares for reuse from the MS1 aggregation path.

Pure refactor: no behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: parameterize FrameAggregator m/z bin width

Lift MZ_BIN_WIDTH from static constexpr to a constructor parameter.
Default 0.02 Da preserves existing MS2 behavior; MS1 callers will
pass a smaller value (0.01 Da) tuned for timsTOF MS1 mass accuracy.

Pure refactor: no behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: guard FrameAggregator neighbor lookups against uint32 wrap at scan_id=0

finalize() and finalizeCentroided() compute scan_id + ds and mz_bin + dm
with negative offsets. When either axis is 0 and the offset is negative,
the unsigned arithmetic wraps to 0xFFFFFFFF and hits phantom cells.

Latent in MS2 (edge scans rare), but exposed at MS1 density where peaks
at scan_id == 0 are common. Fixed at all five negative-offset lookup
sites via a shared neighborKey() helper that returns std::nullopt on
wrap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: use contributing_frames count for skip_denoise in aggregator loops

Previously skip_denoise was based on the index span (hi - lo), which
could run denoising when neighbor frames existed in the range but
contributed no peaks after the SWATH-window filter. Now the outer
loop counts frames that actually added peaks to the grid, and uses
that as the denoise predicate.

Closes the pre-existing TODO from OpenMS#9151 and prepares the same logic
for reuse in the upcoming MS1 aggregation path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: add FrameCentroider::loadFrame(const double*) overload

The upcoming MS1 aggregation path sums per-peak intensities across
adjacent frames into FrameAggregator::Cell::intensity_sum (double).
For MS1 with N=2 and a 10^9-intensity base peak, the sum reaches
5*10^9 and overflows uint32_t. Internal storage is already float
(ImsPeak::intensity), so the new overload just casts double->float
at the entry point.

The existing uint32_t overload is untouched; MS2 callers see no change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add ms1_n_neighbors, ms1_min_support, ms1_max_rt_distance_sec

Plumb three new knobs for Bruker MS1 frame aggregation through the
Config struct and the FileConverter CLI. The library side is wired
up in follow-up commits; this commit is purely surface-level and
has no runtime effect (defaults 0/0.0 preserve existing behavior).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add DIA MS1 default config regression section

Locks in the "zero behavior change at defaults" guarantee for the
upcoming MS1 frame aggregation feature. Snapshots MS1 spectrum count,
peak count, and cumulative intensity with Config{} defaults, and
independently reproduces with the new knobs explicitly set to 0 /
0.0 — asserts both paths produce identical numbers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add RED test for DIA MS1 frame aggregation

Asserts that ms1_n_neighbors=1 produces an intensity-boosted MS1
output relative to the raw per-frame path. Fails until
loadAggregatedMS1Spectrum is implemented (next task).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: implement MS1 frame aggregation in Bruker DIA loader

Add loadAggregatedMS1Spectrum() helper that builds one aggregated MS1
spectrum per center frame from a sliding window of 2N+1 adjacent MS1
frames, using FrameAggregator with a 0.01 Da bin (finer than the MS2
default). Composes with FrameCentroider for within-frame IM centroiding
when ms1_centroid_mz_ppm/pct are set.

Wires the new helper into loadDIA_ via an explicit branch on
config.ms1_n_neighbors > 0, preserving the per-frame code path when
aggregation is disabled. Also adds FrameAggregator::reserve() to
pre-size the grid bucket array (MS1 grids reach ~300k cells per
center at 0.01 Da bins on dense diaPASEF MS1).

DIA MS1 aggregation test bounds re-tuned to [2.5, 3.1] — MS1 ion
populations are more stable across adjacent frames than MS2 fragments,
so 3-frame aggregation gives close to the naive 3x sum (empirical
2.99) rather than MS2's ~1.28x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: MS1 frame aggregation for Bruker DDA-PASEF loader

Extend the ms1_n_neighbors dispatch to loadDDA_ and add a DDA-specific
test section validating both the aggregation intensity boost and the
ms1_max_rt_distance_sec cap's ability to truncate the neighbor window
on irregular DDA frame cadences.

The STATUS line reports cap_reduction so future test-runners can
re-tune the cap if the fixture's MS1 cadence changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: MS1 frame aggregation in Bruker DIA streaming loader

Extend the ms1_n_neighbors dispatch to loadDIAStreaming so OpenSwath
consumers see an aggregated MS1 survey when requested. Preserves the
per-center-frame cadence invariant that SwathFileConsumer relies on.

Adds a streaming-specific test section validating spectrum count
equality with the raw MS1 frame count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: MS1 aggregation composition, min_support, RT cap, FRAME mode

Four new test sections exercising every leaf of the MS1 aggregation feature:

- DIA MS1 aggregation with centroiding test — confirms composition with
  FrameCentroider. Centroider caps output at MAX_CENTROID_PEAKS=10000
  per frame, so on dense MS1 surveys (~600k aggregated peaks/frame) the
  top 10k carry ≈0.32 of total intensity. Bounds [0.1, 1.0] accept this.

- DIA MS1 aggregation min_support test — confirms denoise works:
  drop_frac ≈ 0.52, intensity still 1.8× raw (signal peaks survive).

- DIA MS1 RT cap test — 0.5s cap at N=2 reduces intensity by 80%
  (DIA MS1 cadence ~1.5s; tight cap collapses the window to just the
  center frame, so intensity goes from 5x raw down to 1x raw).

- FRAME mode ignores ms1_n_neighbors test — FRAME export emits one
  spectrum per raw MS1 frame regardless of the aggregation knob.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: warn-and-ignore + partial-config warnings for MS1 aggregation

- loadFrames_: warn when ms1_n_neighbors > 0 in FRAME export mode,
  proceed per-frame (preserves FRAME's "raw frames" contract).
- warnPartialMS1AggregationConfig(): one-shot warnings for three cases:
  - ms1_min_support > 0 with ms1_n_neighbors == 0 (knob inert)
  - ms1_max_rt_distance_sec > 0 with ms1_n_neighbors == 0 (knob inert)
  - 2*N+1 > ms1_frame_count (window clamped at run edges)
  Called from loadDIA_, loadDDA_, and loadDIAStreaming after ms1_frame_ids
  is populated.

Mirrors the existing isCentroidingEnabled partial-config warn pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG entry for Bruker MS1 frame aggregation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ms1_centroid_max_peaks knob + per-load cap-hit summary

The FrameCentroider had a hardcoded MAX_CENTROID_PEAKS=10000 cap that
silently dropped up to 68% of intensity on aggregated MS1 surveys
(flagged by Codex review of PR OpenMS#9158). Exposing the cap as a knob and
raising the default to 100000 makes the loss recoverable.

Changes:
- New Config::ms1_centroid_max_peaks field (default 100000).
- New CLI option -bruker:ms1_centroid_max_peaks (default 100000, min 1).
- Plumbed through FileConverter::getBrukerConfig_.
- FrameCentroider::centroid() signature accepts max_peaks parameter;
  default constant renamed to DEFAULT_CENTROID_MAX_PEAKS (=10000).
- FrameCentroider tracks cap-hit count + max dropped-peak intensity
  per load; reportCapSummary() emits ONE summary WARN per load()
  (not one per spectrum). Warning only fires when dropped peaks are
  above noise floor (>200 intensity counts).
- reportCapSummary() called at the end of loadDIA_, loadDDA_,
  loadDIAStreaming, and loadFrames_ level-1 loops.

On the DIA test fixture: at the 100k default the cap is hit on ~97%
of spectra but only noise-floor peaks are dropped, so the summary
warning stays silent. Intensity retention jumps from 32% (at 10k)
to 70% (at 100k).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs,test: address CodeRabbit review comments on PR OpenMS#9158

All four comments were Minor but legitimate:

1. CHANGELOG used an ambiguous "ms1_centroid_mz_ppm/pct" shorthand.
   Spell out both identifiers explicitly, plus the new
   ms1_centroid_max_peaks knob.

2. FRAME mode test locked only spectrum count. A silent content
   corruption (e.g. aggregation partially running but writing to
   per-frame output) would pass. Now also asserts peak count and
   cumulative intensity equality between ref and agg runs.

3. DIA MS1 streaming aggregation test only asserted the cadence
   invariant (meta.nr_ms1_spectra), which is independent of whether
   ms1_n_neighbors is honored at all. Now also runs a non-aggregated
   streaming baseline and asserts total MS1 peaks(agg) > peaks(ref) —
   proof that aggregation actually ran. Also replaces swath_maps.front()
   with a proper m.ms1-flag lookup.

4. -bruker:ms1_min_support accepted any non-negative integer, but in
   a 3x3 grid with center excluded the max possible neighbors is 8.
   Added setMaxInt_(..., 8); values >8 are now rejected at the CLI
   boundary instead of silently dropping every peak.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… (OpenMS#9153)

LinearResamplerAlign is a strict superset of LinearResampler (adds
explicit raster alignment, ppm-spaced rasters, chromatogram support
and a linear-interpolation mode). Delete the narrower LinearResampler
class and redirect its callers (Tutorial_SavitzkyGolayFilter,
ImageCreator, DefaultParamHandlerDocumenter, pyOpenMS binding) to
LinearResamplerAlign.

Also add a runtime warning (verifySpacing_) when the configured
spacing is finer than the minimum distance between adjacent input
points, which can produce spurious peaks below the detector dead time.

Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Luis Jacob Keller <noreply@github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…MS#9155)

We want to use the submodule revision hash as the way to connect an
OpenMS release to a version of contrib.  By doing this we make
releasing OpenMS easier because we don't have to tag/release contrib.

Yes, I know this is slower for building on Windows.  But contrib is
going away in the next couple of months.  I put the contrib build in
its own job so that we don't hit run-time limits.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…workaround (OpenMS#9160)

* feat: Bruker DDA native ID + load_ms1 toggle

Two related BrukerTimsFile changes:

1. DDA MS2 native IDs change from "scan=<precursor_id>" to
   "frame=<frame_id> scan=<scan_begin> precursor=<precursor_id>".
   Matches the DIA MS2 convention and carries all three physical
   identifiers (source frame, TIMS isolation-window start, Bruker
   Precursors.Id). Unblocks downstream cross-referencing and is the
   format mzParser/Comet require a workaround for (handled in
   CometAdapter).

2. New Config::load_ms1 bool (default true) + -bruker:load_ms1 CLI
   toggle. When false, MS1 spectra are not emitted to the MSExperiment
   or streaming consumer; applies to loadDIA_, loadDDA_, loadDIAStreaming,
   and loadFrames_ level==1. Partial-config warning fires if
   load_ms1=false combined with ms1_n_neighbors>0 (inconsistent).

Two new test sections:
- "DDA native ID format test" — asserts native ID carries frame=,
  scan=, precursor= components and starts with frame=.
- "Bruker load_ms1=false test" — asserts zero MS1 and unchanged MS2
  count for DDA, DIA, and FRAME export modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(CometAdapter): workaround mzParser segfault on Bruker DDA mzML

Comet's bundled mzParser (MSToolkit/saxmzmlhandler.cpp) grep-matches
native IDs for "scan=" and uses atoi of the following digits as its
spectrum-index sort key. When Bruker DDA MS2 IDs of the form
"frame=F scan=S precursor=P" are written alongside MS1s that take
the counter path ("frame=F"), scanNum values can collide — and
cindex::compare() in MSToolkit/include/mzParser.h returns true for
equal elements, violating std::sort's strict-weak-ordering invariant.
The resulting UB manifests as a segfault during std::string shuffling
in the sort.

Workaround (upstream fix for cindex::compare should still be filed
against UWPR/Comet separately):

1. Before writing the temp mzML fed to Comet, rewrite every native ID
   to "index=N" (monotonic counter). mzParser routes these through
   its counter path — no sort, no UB. Original ID is preserved as a
   per-spectrum MetaValue.
2. Let Comet run normally; it emits pep.xml with the rewritten IDs.
3. After IM/FAIMS annotation (which operates on the rewritten IDs in
   both pep.xml and exp), translate PSM spectrum_reference values
   back to the original Bruker IDs via a rewritten→original map.

Also: default bruker_config.load_ms1=false when searching MS2. Comet
only needs MS2; loading MS1 just wastes memory and time. Can be
overridden via -bruker:load_ms1=true if the caller needs MS1 available
in `exp` for custom post-processing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG for Bruker DDA native ID + load_ms1 + Comet workaround

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: keep DDA native ID CV-compliant (MS:1002818)

CodeRabbit review flagged that "frame=F scan=S precursor=P" violates the
PSI-MS CV term MS:1002818 ("Bruker TDF nativeID format"), which mandates
exactly "frame=<FRAME_ID> scan=<SCAN_ID>" — and BrukerTimsFile advertises
that accession at BrukerTimsFile.cpp:1297-1298.

Revert to the strict pattern and move the Bruker Precursors.Id SQL key
out of the nativeID into a per-spectrum MetaValue "bruker_precursor_id".
Downstream code that needs the SQL key reads the MetaValue; the native
ID stays CV-compliant and mzParser-stable.

Test updated to assert the CV pattern (frame= prefix, no precursor= in
ID) and that bruker_precursor_id is present as a MetaValue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review on PR OpenMS#9160

- BrukerTimsFile::transform(): respect config.load_ms1 when computing the
  consumer's expected spectrum count, so setExpectedSize matches what the
  underlying loadFrames_/loadDIA_/loadDDA_ paths actually emit
- CometAdapter: drop the misleading claim that -bruker:load_ms1=true can
  override the forced MS1 skip (the option isn't registered here)
- BrukerTimsFile_test: brace single-line if per OpenMS coding guidelines

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bruker): correctness fixes from review of PR OpenMS#9160

Three confirmed correctness bugs surfaced during the PR review:

1. BrukerTimsFile::transform() computed the DIA expected-spectrum count
   as counts.ms2 * windows.size(), but each MS2 frame belongs to exactly
   ONE WindowGroup. Actual emit count is the per-group sum
   Σ (frames_in_group × windows_in_group), identical to loadDIA_'s
   total_work formula. With multiple WindowGroups the previous formula
   overcounted by a factor proportional to the number of groups.

2. DDA MS2 nativeID collisions. This reader aggregates all
   PasefFrameMsMsInfo rows sharing the same Precursors.Id (potentially
   across multiple frames) into ONE spectrum — a timsrust-style design
   pwiz does NOT use (pwiz emits per-mobility-scan, where (frame, scan)
   is inherently unique). Under our per-precursor aggregation,
   first_entry->(frame_id, scan_begin) is NOT guaranteed unique across
   distinct precursors (PASEF co-isolation with different IsolationMz,
   multi-cycle precursors colliding on their first entry). Restore the
   trailing "precursor=<P>" token to the nativeID. This matches pwiz's
   combined-mode convention ("merged=N frame=F scanStart=A scanEnd=B"
   under the same MS:1002818 accession) and OpenMS's own DIA path
   ("frame=F windowGroup=G scan=S"). The CV accession MS:1002818 stays;
   Comet/Sage dispatch on nativeID content, not the CV accession, so
   downstream compatibility is preserved. Precursors.Id remains
   duplicated as a typed MetaValue "bruker_precursor_id" for direct
   lookup. Test adds a uniqueness assertion on the full set of DDA MS2
   native IDs.

3. SpectrumNativeIDParser did not recognize MS:1002818 (accession-based
   overload returned -1; content-based overload didn't recognize the
   "frame=" prefix). Downstream tools relying on scan-number extraction
   (SpectrumLookup, USI, PepXMLFile, PercolatorInfile, IDMapper,
   ConsensusMapArrowExport) silently produced -1 or used the bare-int
   fallback for Bruker .d output. Add MS:1002818 to the "scan=" accession
   bucket and "frame=" to the prefix map; both route to the existing
   scan=(\d+) regex which picks up the scan_begin integer.

CometAdapter's existing comments at :725 and :853 already described the
"frame=F scan=S precursor=P" format and are now correct again without
modification. CHANGELOG updated to document the real format and the
downstream parser work. rationale for the nativeID format is documented
inline in BrukerTimsFile::loadDDA_.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ProSE): swap cal_lower/cal_upper to match signed-error convention

ProSEAlgorithm::runCalibrationPass_ was reflecting the calibration
window around zero. Given the (lower, upper) schema established at
ProSEAlgorithm.cpp:1615-1617 (theoretical ∈ [observed - lower,
observed + upper] ⇔ signed error e ∈ [-upper, +lower]), mapping the
intended window [shift - spread, shift + spread] requires:

  cal_lower = spread + shift  // was: spread - shift
  cal_upper = spread - shift  // was: spread + shift

The filter at lines 1624-1625 already uses the correct convention; only
the write-back at 1667-1668 was inverted.

Impact on the Bruker DDA fixture (PR OpenMS#9160's opentims integration test):
on a precursor error distribution biased +3.57 ppm with 6.02 ppm spread,
the bug made FragmentIndex accept e ∈ [-9.59, +2.44] (hard-capped,
opposite-sign) instead of [-2.44, +9.59]. 1% FDR PSMs went from 1441
(buggy) to 4537 (fixed) — uncalibrated baseline is 4373, so calibration
now correctly improves ID yield by ~4% instead of degrading it 3x.

Bug introduced in commit 7445cd6 (PR OpenMS#9132, "asymmetric precursor
tolerance window"). Pre-existing on develop; surfaced because PR OpenMS#9160
added a Bruker DDA integration test that exposes the signed precursor
bias through which this swap becomes visible.

Also adds floor-only PSM-count regression tripwires for the three DDA
pipelines (SimpleSearchEngine, ProSE, ProSE calibrated). Floors sit ~5%
below observed yields, catching real regressions while tolerating small
algorithmic drift. The calibrated floor specifically guards the swap
fix — a regression would drop PSMs back to ~1441 and trigger the floor.
The check is a pure-CMake -P script (file(READ) + string(REGEX MATCHALL)
since file(STRINGS) silently undercounts on large idXML files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bruker): tighten PSM-count floors to ~0.5% below current

Original floors were ~5% below observed counts, too loose to catch a
subtle regression (e.g., a partial nativeID or IM-annotation break that
drops 2% of PSMs). Tighten to ~0.5%:
  SimpleSearchEngine DDA: 3900 -> 4170  (current 4187)
  ProSE DDA:              4100 -> 4350  (current 4373)
  ProSE DDA calibrated:   4300 -> 4510  (current 4537)

If a legitimate change raises yield, the floor should be raised in the
same commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ProSE): align calibration-asymmetry assertion with fixed convention

The calibration asymmetry check in the "calibration preserves asymmetric
bias - normal case" section was matched to the previous buggy ordering
(cal_upper > cal_lower for positive bias). After fixing the swap in
runCalibrationPass_, the correct inequality is cal_lower > cal_upper:
under (lower, upper) where signed error e ∈ [-upper, +lower], a positive
bias widens +lower and tightens -upper.

The order-invariant downstream uses (std::min(cal.cal_lower, cal.cal_upper)
on line 1299) are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ProSE): strengthen calibration asymmetry-case assertions

Previous version used tautological checks (cal_lower < 20.0 is trivially
true because the code does std::min(raw, 20.0)) and a way-too-loose 5 ppm
tolerance on the shift, so it couldn't actually catch a regression of the
sign convention. Replace with:

- Direction check: shift > 0 and spread > 0 for this positive-biased fixture
- |shift| < spread precondition stated explicitly
- Load-bearing functional identities binding the writeback mapping:
    cal_lower = spread + shift  (upper endpoint of signed-e window)
    cal_upper = spread - shift  (|lower endpoint| of signed-e window)
  These directly encode the (lower, upper) <-> [-upper, +lower] convention
  and would fail simultaneously if the swap bug were re-introduced.
- Asymmetry check (cal_lower > cal_upper for positive bias) as a cross-check
  against mislabeled shift/spread producing consistent-but-wrong identities.
- Keep the "< 20, < 30" checks only as a witness that the std::min cap is
  inactive (so the identities above are unconstrained), NOT as primary
  behavior assertions.

Dropped the hardcoded shift=7.0 / spread=8.5 / cal_lower=15.5 / cal_upper=1.5
literals: the search pipeline doesn't include every synthetic shift in the
calibration set, so those specific numbers depend on search-side filtering
details. The functional identities are immune to that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add entry for contrib directory converted to git submodule (OpenMS#9155):
developers and packagers building from source now need
'git submodule update --init' to obtain vendored third-party libraries.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(ProSE): add -out_pin for Percolator input file output

Registers -out_pin <files> (per-input, like -out_idxml) that writes
Percolator .pin/.tsv files via PercolatorInfile::store(). Works
independently of -percolator_executable — users can produce .pin
files for external rescoring without running Percolator from within
ProSE.

The search algorithm already populates the extra_features metavalue
on ProteinIdentification::SearchParameters with the correct feature
set (score, fragment_error_median_ppm, matched_prefix/suffix_ions_
fraction). This commit just wires it through the TOPP tool's CLI
and output loop.

Closes item 4 in OpenMS#9108.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ProSE): mark -out_pin as advanced parameter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ProSE): address CodeRabbit review on -out_pin (OpenMS#9166)

Three bugs found by CodeRabbit, all in the .pin output block:

1. (Critical) min_charge parsed via String(sp.charges).toInt() on the
   full "2:5" string — throws ConversionError, silently preventing
   .pin output on every run. Fixed: split on ':' for both min and max.

2. (Major) Duplicate "score" column — hardcoded "score" + extra_features
   (which already contains "score") → malformed .pin header. Fixed:
   only insert "score" if not already present in extra_features.

3. (Minor) result.protein_ids.front() without empty guard. Fixed: check
   empty and log error before accessing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ProSE declared ions:add_a/b/c/x/y/z_ions parameters in its constructor
but updateMembers_() never read them. FragmentIndex received and used
them correctly (via setParameters in prepareContext), but
TheoreticalSpectrumGenerator in the scoring loop always used its
defaults (b/y only) — making the index and scorer inconsistent.

Fix: read the 6 ion toggles in updateMembers_() and forward them to
TheoreticalSpectrumGenerator alongside the existing add_first_prefix_ion
and add_metainfo settings. Users can now configure ETD-style searches
with -Search:ions:add_c_ions true -Search:ions:add_z_ions true
-Search:ions:add_b_ions false -Search:ions:add_y_ions false and have
both the index and scoring honor the setting.

Closes item 12 in OpenMS#9108.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y be A-Z, if an unknown character is recorded, it throws an Exception.
…EnzymeProtein. Created also Tests and whatnot.
@github-actions

Copy link
Copy Markdown

@cbielow The following names might be missing from the AUTHORS file:

  • Alen Saric

Please consider applying the following patch:

--- AUTHORS	2026-05-27 14:23:47.953841885 +0000
+++ AUTHORS.new	2026-05-27 14:23:52.817029772 +0000
@@ -6,6 +6,7 @@
 authors contributing to a specific piece of code, please refer to
 the authors tag in the respective file header.
  - Achal Bajpai
+ - Adithya Kammati
  - Aditya Muzumdar
  - Ahmed Khalil
  - Alen Saric
@@ -59,6 +60,7 @@
  - Johannes Veit
  - Johannes von Kleist
  - Johan Teleman
+ - Jonas Scheid
  - Joshua Charkow
  - Juliane Schmachtenberg
  - Julianus Pfeuffer
@@ -73,6 +75,7 @@
  - Lenny Kovac
  - Leon Bichmann
  - Leon Kuchenbecker
+ - LuBox2K
  - Lucia Espona
  - Luis Jacob Keller
  - Lukas Mueller

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants