
Hydrophobicity scales and profiling #1

Open
NoraHeese wants to merge 149 commits into TheClaervoyant:develop from NoraHeese:Hydrophobicity

@NoraHeese

Description

Added hydrophobicity scale data through a new function, getHydrophobicity(), on Residue. It returns the hydrophobicity value of an amino acid on a given scale.
Added the new class HydrophobicityProfile, which calculates hydrophobicity profiles for peptides.
It provides functions for calculating the GRAVY score, hydrophobicity profiles, windowed hydrophobicity profiles, and hydrophobic moments of a peptide.

For issue OpenMS#9005.

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI?

If your PR is failing, you can check out:
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.seqan.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you have opened a PR, try to minimize the number of pushes to it: every push triggers CI (automated builds and tests) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

github-actions Bot and others added 30 commits March 30, 2026 13:04
Add ProteomicsLFQ entry for PR OpenMS#9030:
- Bruker TimsTOF .d (BRUKER_TDF) input support with Biosaur2 seeding
- IM_PEAK data path with FWHM estimation and skip of incompatible steps
- New Seeding:algorithm parameter

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9039) (OpenMS#9042)

BOOST_PROCESS_USE_STD_FS was defined unconditionally but only exists
since Boost 1.78. On older versions (e.g. 1.74), boost::process still
uses boost::filesystem internally, causing undefined reference errors
because Boost::filesystem was never linked.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The minimum Boost version was bumped from 1.74 to 1.78 in commit
742e5e1 (fix: bump minimum Boost to 1.78 to fix boost::filesystem
link errors). BOOST_PROCESS_USE_STD_FS only exists since Boost 1.78;
older versions cause undefined reference errors.

Update the required dependencies line in AGENTS.md to reflect the
new minimum version requirement.

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add missing Build System entry for minimum Boost version bump to 1.78
(OpenMS#9042) to fix boost::filesystem link errors.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: add design spec for removing Qt from TOPP tools

Covers two-phase approach: replace Qt6::Core usage with std/OpenMS
utilities in 11 tools, then move 3 GUI-entangled tools to openms_gui.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation plan for removing Qt from TOPP tools

13 tasks covering Phase 1 (replace Qt6::Core in 11 tools) and Phase 2
(move 3 GUI tools to openms_gui, clean up CMake).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(Resampler): remove dead GUI include, ungate from WITH_GUI

Remove unused #include <OpenMS/VISUAL/MultiGradient.h> from Resampler.cpp,
move Resampler from TOPP_executables_with_GUIlib to TOPP_executables,
remove it from the ToolHandler GUI_tools exclusion list, and drop the
WITH_GUI guard around its TOPP test definitions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(MzMLSplitter): replace QFile::size() with std::filesystem::file_size()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(QCExtractor,QCShrinker): replace QFileInfo::baseName() with FileHandler::stripExtension()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(QCEmbedder): replace QFile/QByteArray with std::ifstream and Base64

Remove Qt includes (QByteArray, QFile, QString, QFileInfo) and replace with
OpenMS FileHandler/Base64 and std::ifstream/ostringstream for file reading
and base64 encoding. Use FileHandler::stripExtension + File::basename for
compound-extension-aware basename extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(IDRipper): replace QDir/QFileInfo with std::filesystem and File::absolutePath()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(AssayGeneratorMetaboSirius): replace QDirIterator with std::filesystem::directory_iterator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(OpenSwath*): replace QDir with File::absolutePath()

Replace QDir::absolutePath() and QFileInfo::baseName() usage in
OpenSwathFileSplitter and OpenSwathWorkflow with OpenMS-native
File::absolutePath() and FileHandler::stripExtension(File::basename()),
removing the Qt dependency from both TOPP tools.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(CometAdapter): replace QRegularExpression with std::regex

Remove Qt dependency on QRegularExpression, replacing it with std::regex
and std::remove for space stripping as part of Qt removal from TOPP tools.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
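
The space stripping mentioned here is commonly written with the erase-remove idiom. This is a minimal sketch of that idiom only; the function name `stripSpaces` is hypothetical, and the regular expressions used in CometAdapter are not reproduced.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `s` with all ' ' characters removed.
std::string stripSpaces(std::string s)
{
  // std::remove shifts the kept characters to the front and returns the new
  // logical end; erase() then drops the leftover tail.
  s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
  return s;
}
```

Calling std::remove alone does not shrink the string, which is why the erase() call is part of the idiom.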

* refactor(MetaProSIP): replace QProcess/QFile/QDir with ExternalProcess and File utilities

Replace all Qt dependencies in MetaProSIP.cpp:
- Remove QtCore includes (QStringList, QFile, QDir, QFileInfo, QProcess)
- Add ExternalProcess, filesystem includes
- Add runRScript() helper to consolidate 5 near-identical QProcess blocks
- Replace QProcess R execution with ExternalProcess::run() calls
- Replace QFile::copy/remove with File::copy/remove
- Replace QDir with File::absolutePath and std::filesystem::create_directories
- Replace QFileInfo::baseName with FileHandler::stripExtension(File::basename())
- Change QString function parameters to const String&

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* build(topp): remove Qt6::Core linking — all non-GUI TOPP tools are Qt-free

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move ExecutePipeline, INIUpdater, ImageCreator to openms_gui

Move 3 TOPP tools that depend on OpenMS_GUI into the GUI library
directory, making src/topp/ 100% Qt-free. Update CMake registration
so the tools are built as part of the GUI executables, remain in the
TOPP_TOOLS list for CWL/INI generation, and are dependencies of the
TOPP collection target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* build: move Qt6 find_package inside WITH_GUI block

Qt is no longer needed by any non-GUI component. Guard KNIME
packaging against missing Qt targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove commented-out QIODevice includes from QC tools

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(QCEmbedder): use Base64::encodeStrings for raw binary data

Base64::encode is designed for floating-point arrays, not raw bytes.
Use encodeStrings instead which handles arbitrary binary data correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MzMLSplitter): explicit std::string conversion for filesystem::file_size

OpenMS::String does not implicitly convert to std::filesystem::path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation plan for File API improvements

Adds stemName(), extension(), listDirectories() to OpenMS::File
and replaces awkward patterns in TOPP tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(File): add stemName() — basename without known file extension

Adds File::stemName(path) as a convenience wrapper around
FileHandler::stripExtension(File::basename(path)), covering the pattern
that appears 32 times across the codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(File): add extension() — compound-aware file extension extraction

* feat(File): add listDirectories() — sorted list of subdirectories

Adds File::listDirectories() static method that returns a sorted StringList
of subdirectory paths (non-recursive) using std::filesystem, with no-throw
semantics via std::error_code. Includes unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(topp): use File::stemName(), listDirectories(), fileSize(), makeDir()

Replace FileHandler::stripExtension(File::basename(...)) with File::stemName(...)
across 17 call sites in 10 TOPP tools. Replace raw std::filesystem calls with
File API methods in AssayGeneratorMetaboSirius, MzMLSplitter, and MetaProSIP.
Remove unnecessary #include <filesystem> and #include <FileHandler.h> where
they are no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TOPP_TOOLS ordering and MetaProSIP error diagnostics

- Swap add_subdirectory order in src/CMakeLists.txt: topp before
  openms_gui so TOPP_TOOLS cache is initialized before GUI appends.
  Fixes CWL/CTD generation for moved GUI tools.
- Surface ExternalProcess error_msg in MetaProSIP's runRScript helper
  so startup failures (missing R, permissions) are visible in logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(File): listDirectories uses error_code overloads and returns absolute paths

Use is_directory(ec) and fs::absolute() to honor the documented
no-throw and absolute-path contracts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(IDRipper): use to_path() for UTF-8 safe filesystem path construction

std::filesystem::path(std::string) uses the current code page on Windows,
not UTF-8. Use OpenMS::to_path() from PathUtils.h instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: use File::stemName() in core/GUI, remove spec/plan docs

Replace all 15 remaining FileHandler::stripExtension(File::basename())
call sites in src/openms/ and src/openms_gui/ with File::stemName().
Remove FileHandler.h includes where no longer needed.
Remove superpowers spec and plan documents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add missing entry for OpenMS#9041 (remove Qt dependency from TOPP tools):
- Qt6::Core removed from all non-GUI TOPP tools in Dependencies section
- Qt6 find_package guarded by WITH_GUI in Build System section

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9052)

The recent addition of IMPeakType (split from IMFormat in commit 3f8956f)
was bound in bind_kernel.cpp alongside DriftTimeUnit and IMFormat. The
binding file routing table did not reflect this, leaving developers to
incorrectly infer from the 'Everything else → bind_misc.cpp' catchall
that new IONMOBILITY enums should go in bind_misc.cpp.

Add an explicit row documenting that IONMOBILITY enums (DriftTimeUnit,
IMFormat, IMPeakType) belong in bind_kernel.cpp, and clarify the
'Everything else' row with an example (IMTypes class stays in bind_misc.cpp).

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add option to keep cached files in OpenSwath workflow

* feat: add cache file auto cleanup to OpenSwathWorkflow

* feat: implement per-run cache cleaner for temporary directory management

* feat: enhance per-run cache directory creation with retry logic and error handling

* feat: add custom base directory support for TempDir and remove PerRunCacheCleaner
…penMS#9060)

The Qt refactoring in OpenMS#9041 removed all Qt dependencies from TOPP tools
by replacing QFile, QDir, QFileInfo, QByteArray etc. with std::filesystem,
OpenMS File utilities, and Base64. Qt6 is now only required for the GUI
library (openms_gui / WITH_GUI), not for the command-line TOPP tools.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion) (OpenMS#9063)

* Initial plan

* Remove duplicate test_msspectrum.py and merge content into test_MSSpectrum.py

Agent-Logs-Url: https://github.com/OpenMS/OpenMS/sessions/9f51b4f8-c1c1-4388-9ba5-934f33e708ed

Co-authored-by: cbielow <6008722+cbielow@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cbielow <6008722+cbielow@users.noreply.github.com>
…MS#9073)

The feat commit cb6c7d1 added Bruker .d directory support to
OpenSwathWorkflow (WITH_OPENTIMS builds) but did not update the
doxygen documentation block. Add a new 'Bruker .d' subsection and
update the overview list item to reflect that .d directories are
now a supported input format alongside mzML/mzXML.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9068)

* feat(MSstatsConverter): add -remove_shared_peptides flag to control shared peptide filtering

The MSstatsConverter silently drops peptides mapping to proteins in
different indistinguishable protein groups via isQuantifyable_().
This is hardcoded with no user control, causing proteins with only
shared peptides to vanish from MSstats output even when the upstream
ProteinQuantifier was configured with use_shared_peptides=true.

This commit adds a -remove_shared_peptides flag (default true for
backward compatibility) that allows users to retain shared peptides
in the MSstats output. It also adds OPENMS_LOG_WARN messages
reporting how many peptide hits were dropped, so the filtering is
no longer silent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MSstatsConverter): fix Doxygen warnings and backward-compatible default for remove_shared_peptides

Add full @param documentation for storeLFQ() to match storeISO() style,
fixing Doxygen warning test failures. Change remove_shared_peptides from
registerFlag_ (default false) to registerStringOption_ with default "true"
so shared peptides are removed by default, preserving backward-compatible
behavior and ensuring test CSV outputs remain unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(API): update pyOpenMS bindings and test signatures for remove_shared_peptides

- Add remove_shared_peptides parameter (default true) to storeLFQ() and
  storeISO() nanobind lambdas in bind_format.cpp
- Update MSstatsFile_test.cpp START_SECTION signatures to match the new
  C++ API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle unwritable $HOME in UpdateCheck gracefully

UpdateCheck::run() calls fs::create_directories() with the throwing
overload to create $HOME/.config/OpenMS. In container environments
where $HOME=/ (not writable), this crashes all TOPP tools with:

  terminate called after throwing an instance of
  'std::filesystem::filesystem_error': cannot create directories:
  Permission denied [//.config/OpenMS]

Use the std::error_code overload (consistent with File.cpp lines
241, 316) and skip the update check gracefully when the config
directory cannot be created.

Fixes OpenMS#9074

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: set HOME=/tmp in container and handle unwritable $HOME in UpdateCheck

Two complementary fixes for the crash when $HOME is not writable:

1. dockerfiles/Dockerfile: Set ENV HOME=/tmp in the tools-thirdparty
   stage so containers always have a writable home directory for
   config files (.config/OpenMS/*.ver).

2. src/openms/source/SYSTEM/UpdateCheck.cpp: Use the non-throwing
   std::error_code overload of create_directories (consistent with
   File.cpp) and skip the update check gracefully when the config
   directory cannot be created. This provides defense-in-depth for
   environments where $HOME is overridden to a non-writable path.

Fixes OpenMS#9074

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use /home/openms as writable HOME in container

Follow the pattern from quantms-rescoring: create a dedicated writable
directory and set HOME to it, rather than using /tmp. This ensures
.config/OpenMS and other home-relative paths work correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use HOME=/app consistent with quantms-rescoring convention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move HOME=/app to tools stage so it propagates to all child images

Julianus noted the HOME setting should be in the tools stage (not just
tools-thirdparty) since the tools image is also published separately.
Docker ENV directives propagate through FROM inheritance, so setting it
in tools covers tools, tools-thirdparty, and test stages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fixed the broken Hash in the config File

* Update src/openms/extern/tool_description_lib/tdl-config.cmake

* Update src/openms/extern/tool_description_lib/tdl-config.cmake

* CMakeLists and executables fixed.

* Revert "CMakeLists and executables fixed."

This reverts commit 865eff1.

---------

Co-authored-by: Chris Bielow <chris.bielow@fu-berlin.de>
If a PR adds a TOPP tool, this check will post a PR comment if that
tool isn't listed in the `ToolHandler.cpp` file.

Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
This change adds a script that can check if the authors of git commits
are in the AUTHORS file.  The script is automatically run during the
PR process and if an author is missing then a comment is added to
remind the PR author.

Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
* fix: make enzyme configurable in MRMFeatureFinderScoring (OpenMS#9072)

Replace hardcoded "Trypsin" enzyme in scorePeakgroups() with a
user-configurable parameter. The enzyme is used for missed cleavage
counting and defaults to "Trypsin" for backward compatibility. Valid
values are populated from ProteaseDB.

Since MRMFeatureFinderScoring parameters are already exposed in
OpenSwathWorkflow via the Scoring: subsection, no TOPP tool changes
are needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use ListUtils to convert enzyme names for setValidStrings

std::vector<OpenMS::String> cannot be implicitly converted to
std::vector<std::string> required by Param::setValidStrings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add two missing changelog entries for the OpenMS 3.6.0 under-development section:

- General: Fix UpdateCheck crash when $HOME is not writable in containers (OpenMS#9075)
- TOPP tools / MSstatsConverter: new -remove_shared_peptides parameter (OpenMS#9068)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
* github action: welcome new contributors

welcome new contributors and provide link to tally survey

* apply CodeRabbit suggestions
…st (OpenMS#9065) (OpenMS#9081)

* fix: remove arrow/csv dependency from Arrow_test (OpenMS#9065)

Arrow CSV support (ARROW_CSV) is OFF by default in upstream Apache Arrow.
OpenMS never uses arrow/csv in its library code — the CSV portions of
this test were copy-pasted from the Arrow tutorial and only tested
Arrow itself, not OpenMS functionality.

Remove CSV read/write from the test, keeping IPC and Parquet coverage
which are the formats OpenMS actually uses. This fixes compilation
for users whose Arrow installation lacks CSV support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use NEW_TMP_FILE for temp paths in Arrow_test

Address CodeRabbit review: replace hardcoded filenames with
NEW_TMP_FILE macro to avoid collisions in parallel test runs
and stale artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove Arrow IPC from test, keep only Parquet

Arrow IPC is not used anywhere in the OpenMS library — only Parquet is.
Simplify the test to a single Parquet roundtrip (write, read, write).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MRMFeatureFinderScoring: add enzyme parameter entry for OpenSwathWorkflow (OpenMS#9077)
- yaml-cpp 0.9.0 dependency bump fixing YAML hash parsing (OpenMS#9056)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rator (OpenMS#9078) (OpenMS#9084)

The neutral-loss loop for prefix ions (b/a) reset the running mass
accumulator but did not re-add peptide[0]'s residue mass, unlike the
plain-ion loop above it. This caused all b/a neutral-loss fragments
to be too low by exactly the first residue's internal mass.

Add the same preamble (residue 0 mass + loss formulas) to the
neutral-loss loop, mirroring the plain-ion loop structure.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add design spec for float IntensityType propagation (OpenMS#8872)

Design for propagating float IntensityType through ~30 algorithm and I/O
locations. Five phased PRs: template openswathalgo, cleanup casts,
algorithm internals, quantitation/metadata, I/O buffers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update float propagation spec with Codex review findings

Address critical issues from Codex review:
- Add computeAndAppendRank/computeRankVector to PR 0 template list
- Add DataValue::DOUBLE_LIST boundary for OpenSwath_Ind_Scores
- Add missing header files to PR 2 file list (MRMScoring.h, etc.)
- Add MRMFeature.cpp to PR 2 for float->DoubleList conversion
- Fix line number references (QcMLFile, LinearResampler, OpenSwathScoring)
- Add deferred mobilogram/XIM paths to out-of-scope

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address code review findings in float propagation spec

Critical fixes:
- Add IFeature::getIntensity(vector<float>&) overload to PR 0 scope
  to unblock PR 2's float intensity matrices in MRMScoring
- Audit OpenSwath_Ind_Scores fields: only 7 of 40 are genuine intensity
  values (change to float); rest are scores/coordinates (stay double)

Important fixes:
- Keep MSChromatogramParquetConsumer buffer as double (encodeNP float
  overload adds overhead via temporary double copy)
- Document GaussFilterAlgorithm::integrate_() double return requiring
  explicit narrowing cast
- Reframe SIMD claims: primary win is memory reduction, Eigen float
  paths are secondary benefit, push_back loops won't vectorize
- Document normalized_library_intensity stays double, scoring templates
  instantiated for both float and double parameter types
- Add DIAPrescoring.cpp change description to PR 2 table
- Add conversion map showing all float/double boundaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: apply third-pass review fixes to float propagation spec

- Move ind_im_log_intensity from float to double category (log-transformed
  score, not raw intensity) — 6 of 40 fields now change to float
- Add fillIntensityFromPrecursorFeature and initializeXCorrMatrix float
  overload to PR 2 scope (both missed in prior reviews)
- Expand cascading changes section with concrete dependency chains
  showing how DIAHelper -> DIAPrescoring -> normalize flows work
- Clarify conversion map notation (concrete classes, not templates)
- Add PR 0 scope rationale note (split option documented)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate 10-agent review findings into float propagation spec

Critical fixes:
- Narrow Ind_Scores float fields from 6 to 3 (only raw intensity:
  ind_area_intensity, ind_total_area_intensity, ind_apex_intensity).
  Ratios, log-transforms, and scores stay double.
- Remove QcMLFile.cpp from PR 1 (String(float) changes output format)
- Fix MRMFeatureFinderScoring.cpp line refs (457-458 are scores, not
  intensity — corrected to 390-398)
- Add IsobaricWorkflow.cpp to PR 3 (main extractSingleSpec caller)
- Remove CachedMzMLHandler from PR 4 (already handles conversions)
- Add MRMTransitionGroupPicker.h:811-831, PeakPickerChromatogram.h:139,
  MRMScoring.cpp:544 to PR 2 scope

Warning fixes:
- Add norm/manhattanDist float instantiation deps to PR 0
- Fix SpectralAngle Eigen note
- Add GaussFilterAlgorithm.h to PR 2 file list
- Add IonMobilityScoring mixed-type vector risk
- Add DIAPrescoring.cpp mixed-type iterator linker risk
- Add ITransitionGroup/LightTransition to boundaries table
- Add TraML/TSV precision and pyOpenMS behavioral risks
- Expand Out of Scope with 20+ deferred locations from
  comprehensive codebase scan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: resolve all open decisions in float propagation spec

Decisions resolved:
- MRMScoring API: template on scalar type (not overloaded)
- calcLibraryScore: experimental_intensity stays double (paired with
  library_intensity in same-type scoring calls)
- DIAPrescoring: intTheor also changes to vector<float> (avoids
  mixed-type iterator instantiation)
- IonMobilityScoring: ms1_int_values and all intensity vectors change
  to float (source data is MobilityPeak1D::IntensityType = float)
- IFeature vtable: acknowledged as acceptable (no stable ABI contract)
- IsobaricIsotopeCorrector: stays double, removed from PR 3 scope
- MRMScoring MI functions: all 5 initializeMI* explicitly covered
- Deprecated double[] Scoring overloads: stay double-only

Consistency fixes:
- MRMTransitionGroupPicker PR 1: corrected line refs (426,439)
- Removed resolved mixed-type risks from risk table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add PR 0 implementation plan for float templating

9-task plan covering:
- Tasks 1-4: Template 11 Scoring functions (normalize_sum, cross-corr,
  standardize_data, NormalizedManhattanDist, RMSD, SpectralAngle,
  rank helpers)
- Task 5: Template 3 StatsHelpers functions + float instantiations
  for norm/manhattanDist
- Tasks 6-7: Float test cases for Scoring and DiaHelpers
- Task 8: IFeature::getIntensity(vector<float>&) overload
- Task 9: Integration verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* build: disable TDL (CWL support) by default

TDL pulls in yaml-cpp as a transitive dependency and is only needed
for generating Common Workflow Language tool descriptions. Most users
don't need CWL export. Disable by default to reduce build dependencies.

Enable with -DENABLE_TDL=ON when CWL support is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: suppress unused parameter warnings when TDL is disabled

Cast parameters to void in the #else branch to avoid MSVC /we4100
errors (unused parameter treated as error) when ENABLE_TDL is off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
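
A minimal sketch of the void-cast pattern follows. The macro `SKETCH_ENABLE_TDL` and the function `writeToolDescription` are hypothetical stand-ins; the actual macro and function names in OpenMS are not given in the commit message.

```cpp
#include <string>

// Hypothetical function with a feature-gated body. When the feature macro is
// not defined, the parameters are unused and would trigger unused-parameter
// diagnostics (an error under MSVC /we4100).
bool writeToolDescription(const std::string& tool_name, const std::string& out_path)
{
#ifdef SKETCH_ENABLE_TDL
  // Placeholder for the real export; only checks the inputs in this sketch.
  return !tool_name.empty() && !out_path.empty();
#else
  (void)tool_name; // mark as intentionally unused
  (void)out_path;
  return false;    // feature disabled at build time
#endif
}
```

In C++17, marking the parameters `[[maybe_unused]]` is an alternative that avoids the separate #else statements.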

* chore: remove plan and spec documents from PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TheoreticalSpectrumGenerator: fix m/z for b/a neutral-loss ions (OpenMS#9078, OpenMS#9084)
- Build: ENABLE_TDL (CWL support) now defaults to OFF (OpenMS#9067)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Initial plan

* Fix locale-dependent sorting in AUTHORS check workflow

Set LC_ALL=C in the check-authors-file workflow to ensure the sort
command in ci-tools/scripts/authors.sh uses byte-order sorting
instead of locale-dependent collation. This prevents names with
periods (like "Michael R. Crusoe") or prefix-matching names
(like "Marc"/"Marcel") from being sorted incorrectly depending
on the CI environment's locale settings.

Also re-sorts the AUTHORS file to match LC_ALL=C sort -u ordering:
- "Johan Teleman" before "Johannes" (space < 'n' in byte order)
- "Julia Thueringer" before "Juliane" (space < 'n' in byte order)
- "Marc Sturm" before "Marcel Schilling" (space < 'e' in byte order)

Agent-Logs-Url: https://github.com/OpenMS/OpenMS/sessions/b027b023-82a4-4eae-b131-714c4c5b1722

Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
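
The byte-order rule behind this re-sort can be reproduced in C++: std::string comparison is a plain byte-wise comparison, independent of the process locale, so it behaves like `LC_ALL=C sort -u`. The function name `sortUniqueByteOrder` is hypothetical; the names are the ones quoted in the commit message.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sort names by byte value and drop exact duplicates,
// which mirrors `LC_ALL=C sort -u` for plain ASCII input.
std::vector<std::string> sortUniqueByteOrder(std::vector<std::string> names)
{
  std::sort(names.begin(), names.end()); // operator< compares bytes
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}
```

Because a space (0x20) is smaller than any letter, "Marc Sturm" sorts before "Marcel Schilling" under this rule, as the commit describes.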

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
…penMS#9088)

* feat: refactor FragmentIndex to SoA layout with SIMD-accelerated query

Replace the Array-of-Structs vector<Fragment> with Structure-of-Arrays
(separate float[] for m/z and uint32_t[] for peptide indices). This
enables SIMD vectorization of the query hot loop.

The query inner loop now uses SSE2 via SIMDe to compare 4 fragment m/z
values against the tolerance window per iteration, giving ~3-4x speedup
on the most performance-critical path in database search.

Changes:
- FragmentIndex.h: replace fi_fragments_ with fi_fragment_mzs_ and
  fi_fragment_peptide_idxs_ parallel vectors
- FragmentIndex.cpp: permutation-based sorting for SoA, SIMD query loop
  with scalar remainder, updated clear()/print_slice()
- FragmentIndex_test.cpp: update fragmentsSorted() for SoA access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
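
The Structure-of-Arrays layout and the permutation-based sort can be sketched like this. `FragmentColumns` and `sortByMz` are hypothetical names; only the idea of two parallel vectors (float m/z values and uint32_t peptide indices) is taken from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical SoA container: element i of both vectors describes fragment i.
struct FragmentColumns
{
  std::vector<float> mzs;
  std::vector<std::uint32_t> peptide_idxs;
};

// Sort both columns by m/z. A permutation is sorted first and then applied to
// each column, because the columns cannot be sorted independently.
void sortByMz(FragmentColumns& f)
{
  assert(f.mzs.size() == f.peptide_idxs.size());
  std::vector<std::size_t> perm(f.mzs.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::stable_sort(perm.begin(), perm.end(),
                   [&f](std::size_t a, std::size_t b) { return f.mzs[a] < f.mzs[b]; });
  FragmentColumns sorted;
  sorted.mzs.reserve(perm.size());
  sorted.peptide_idxs.reserve(perm.size());
  for (std::size_t i : perm)
  {
    sorted.mzs.push_back(f.mzs[i]);
    sorted.peptide_idxs.push_back(f.peptide_idxs[i]);
  }
  f = std::move(sorted);
}
```

This version trades a temporary copy of both columns for simplicity; an in-place permutation would save memory at the cost of more intricate code.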

* feat: add scalar query fallback and SIMD vs scalar benchmark

Add queryScalar() method that uses the same SoA layout but without
SIMD intrinsics, for direct performance comparison. Add benchmark
test section that runs both paths 1000 iterations and reports speedup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: treat peptide_idx_range.second as exclusive, fix benchmark params

getPeptidesInPrecursorRange() returns a half-open range [first, second)
via upper_bound, but query() treated second as inclusive. Fix both SIMD
and scalar paths to use >= for the exclusive upper bound check.

Also fix benchmark test parameter names and types to match actual
FragmentIndex defaults, and register FragmentIndex_test in executables.cmake.

Benchmark result: ~2.9x speedup (SIMD vs scalar) on small test case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
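
The half-open convention discussed in this commit can be shown with the standard binary-search functions. `indexRange` and `inRange` are hypothetical helpers over a sorted vector of masses; they are not the FragmentIndex API.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: half-open index range [first, second) of all entries
// with lo <= mass <= hi in a sorted vector.
std::pair<std::size_t, std::size_t> indexRange(const std::vector<double>& sorted_masses,
                                               double lo, double hi)
{
  const auto begin = sorted_masses.begin();
  const auto first = std::lower_bound(begin, sorted_masses.end(), lo);
  // upper_bound returns the first element greater than hi, i.e. one past the
  // last match, so `second` is an exclusive bound.
  const auto last = std::upper_bound(begin, sorted_masses.end(), hi);
  return {static_cast<std::size_t>(first - begin), static_cast<std::size_t>(last - begin)};
}

// Membership test that respects the exclusive upper bound.
bool inRange(std::size_t idx, const std::pair<std::size_t, std::size_t>& range)
{
  return idx >= range.first && idx < range.second;
}
```

Treating `second` as inclusive would admit one extra index past the window, which is the off-by-one that this commit fixes.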

* fix: SIMD loop double-hit bug and add comprehensive edge case tests

Fix: after the SIMD loop's pidx early-exit break, advance i past the
processed group (i += 4) so the scalar remainder doesn't re-scan the
same elements. This caused duplicate hits whenever the last element
in a SIMD group exceeded the peptide_idx range.

Add 9 new test sections verifying SIMD vs scalar equivalence across:
- Multiple peptides and charge states (1-3)
- Real FASTA sequences (from SSE/Comet test data)
- Empty precursor range [k, k)
- Very small index (< 4 fragments, scalar-only path)
- PPM tolerance mode
- Tolerance boundary (peaks just inside/outside tolerance)
- Multiple fragment charges
- Wide precursor window (many candidates)
- SoA ordering invariants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — reserve placement, hit content verification

- Move fragment reserve() after generatePeptides() so fi_peptides_.size()
  is nonzero and the preallocation actually works (CodeRabbit finding A)
- Add SoA size consistency guard in fragmentsSorted() (CodeRabbit B)
- Strengthen simdScalarMatch() to compare actual hit content (peptide_idx
  + fragment_mz) element-wise, not just hit count (CodeRabbit D)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove SIMD query, keep SoA layout and edge case tests

Benchmarking showed the SIMD query loop provides negligible speedup
on real workloads because the build phase (digestion, sorting, OMP
critical sections) dominates total runtime (~95%). Remove the SIMD
code complexity while keeping the beneficial SoA layout, exclusive
range fix, and comprehensive edge case tests.

Removed: queryScalar(), SIMDe include, SIMD vs scalar benchmark tests
Kept: SoA layout, permutation sort, exclusive range semantics, 6 edge
case tests (empty range, small bucket, ppm tolerance, tolerance
boundary, wide precursor window, SoA invariants)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: eliminate omp critical in fragment generation via per-thread vectors

Replace the serialized omp critical section (55M mutex acquisitions for
fragment push_back) with per-thread vector pairs that accumulate
fragments lock-free, then merge into the global SoA arrays after the
parallel region.

Benchmark on HeLa DDA TimsTOF + Human SwissProt (4 threads):
- Wall time: 3m40s → 3m14s (1.13x faster)
- CPU time: 5m02s → 3m48s (1.32x less contention)
- Results identical (same PSM counts at 1% FDR)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: eliminate omp critical in fragment generation via per-thread vectors

Revert SoA layout (was a net regression: 2x slower, 1.8x more memory
due to permutation sort overhead). Keep the original AoS Fragment
struct and apply per-thread vector accumulation to eliminate the omp
critical section that serialized ~55M push_back calls.

Each thread accumulates fragments into its own vector, then vectors
are merged sequentially after the parallel region. This avoids mutex
contention while keeping AoS in-place sort efficiency.

Benchmark on DDA TimsTOF + Human SwissProt (4 threads):
  develop (omp critical): 1m17s wall, 2m30s CPU, 3326 MB
  this commit:              57s wall, 1m22s CPU, 3326 MB
  → 1.35x wall speedup, 1.83x CPU reduction, same memory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
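
A minimal sketch of the per-thread accumulation pattern this commit describes. The `Fragment` fields, the function name `generateFragments`, and the two dummy fragments per peptide are placeholders, not the actual FragmentIndex code; the OpenMP guards let it also build without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Illustrative stand-in for the real Fragment struct.
struct Fragment
{
  float mz;
  unsigned peptide_idx;
};

// Each thread appends to its own vector (no lock needed); the per-thread
// vectors are concatenated sequentially after the parallel region.
std::vector<Fragment> generateFragments(std::size_t n_peptides)
{
#ifdef _OPENMP
  const int num_threads = omp_get_max_threads();
#else
  const int num_threads = 1;
#endif
  std::vector<std::vector<Fragment>> per_thread(static_cast<std::size_t>(num_threads));

#ifdef _OPENMP
#pragma omp parallel for num_threads(num_threads)
#endif
  for (long long p = 0; p < static_cast<long long>(n_peptides); ++p)
  {
#ifdef _OPENMP
    const int tid = omp_get_thread_num();
#else
    const int tid = 0;
#endif
    auto& local = per_thread[static_cast<std::size_t>(tid)];
    // Two dummy fragments per peptide stand in for real ion generation.
    local.push_back({100.0f + static_cast<float>(p), static_cast<unsigned>(p)});
    local.push_back({200.0f + static_cast<float>(p), static_cast<unsigned>(p)});
  }

  // Sequential merge: reserve once, then append each thread's fragments.
  std::size_t total = 0;
  for (const auto& v : per_thread) total += v.size();
  std::vector<Fragment> all;
  all.reserve(total);
  for (const auto& v : per_thread) all.insert(all.end(), v.begin(), v.end());
  return all;
}
```

The merged order depends on how iterations were distributed across threads, so a sort by m/z afterwards is still required.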

* perf: residue lookup table for lightweight fragment/precursor generation

Replace AASequence::fromString + TheoreticalSpectrumGenerator with a
static residue mass lookup table (char -> internal monoisotopic mass)
for direct fragment m/z computation from protein sequence strings.

Changes:
- Add residue_mass_table_[128] populated from ResidueDB at first use
- Add generateFragmentsLightweight_() computing b/y ions via cumulative
  sum over the lookup table (no AASequence parsing, no TSG overhead)
- Unmodified peptide path in generatePeptides() computes precursor m/z
  directly from lookup table (eliminates AASequence::fromString entirely)
- Modified path extracts per-residue mass deltas from AASequence after
  modification application, then uses lightweight generator
- Per-thread vectors in generatePeptides() (eliminates second omp critical)

Build phase timing (20 threads, Human SwissProt):
  generatePeptides: 5.3s -> 2.1s (2.5x faster)
  generateFragments: kept at 2.3s (already fast from per-thread vectors)

Overall benchmark (DDA TimsTOF + Human SwissProt, 20 threads):
  develop:    1m17s wall, 2m30s CPU
  this branch: 1m07s wall, 2m19s CPU (same results, same memory)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct fragment loop bounds and residue mass table for lightweight generation

Stop prefix (b/a/c) and suffix (y/x/z) ion loops one position early to match
TheoreticalSpectrumGenerator default behavior — avoids emitting full-length
ions (b_n/y_n) that are effectively precursor masses.

Populate residue mass table from all ResidueDB single-letter codes (A–Z)
instead of hardcoding the 20 canonical AAs, so selenocysteine (U) and
pyrrolysine (O) get correct masses. Also skip peptides with ambiguous
codes (B/Z/J) alongside X.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
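
The lookup-table approach and the corrected loop bounds can be sketched roughly as follows. This is an illustration under assumptions: the table holds only a few hardcoded monoisotopic residue masses instead of being populated from ResidueDB, only singly charged b/y ions are produced, and the names `makeResidueMassTable` and `byIonsSinglyCharged` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr double PROTON_MASS = 1.007276; // Da
constexpr double WATER_MASS = 18.010565; // Da, monoisotopic

// char -> monoisotopic residue mass; 0.0 marks "no mass known".
// Only a handful of residues are filled in for this sketch.
std::array<double, 128> makeResidueMassTable()
{
  std::array<double, 128> t{};
  t['G'] = 57.021464;
  t['A'] = 71.037114;
  t['S'] = 87.032028;
  t['P'] = 97.052764;
  t['V'] = 99.068414;
  t['L'] = 113.084064;
  t['K'] = 128.094963;
  return t;
}

// Returns interleaved b/y ion m/z values (charge 1) via a cumulative sum.
// The loop stops one position early, so b_n and y_n are not emitted
// and a peptide of length n yields 2*(n-1) fragments.
std::vector<double> byIonsSinglyCharged(const std::string& peptide)
{
  // Function-local static: initialised once, thread-safe since C++11.
  static const std::array<double, 128> table = makeResidueMassTable();
  const std::size_t n = peptide.size();
  if (n < 2) return {};

  double total = 0.0;
  for (char c : peptide)
  {
    const unsigned char idx = static_cast<unsigned char>(c);
    if (idx >= 128 || table[idx] == 0.0) return {}; // unknown residue: skip peptide
    total += table[idx];
  }

  std::vector<double> mzs;
  mzs.reserve(2 * (n - 1));
  double prefix = 0.0;
  for (std::size_t i = 0; i + 1 < n; ++i)
  {
    prefix += table[static_cast<unsigned char>(peptide[i])];
    mzs.push_back(prefix + PROTON_MASS);                      // b ion of length i+1
    mzs.push_back(total - prefix + WATER_MASS + PROTON_MASS); // y ion of length n-i-1
  }
  return mzs;
}
```

For example, `byIonsSinglyCharged("GAS")` yields four values, and a peptide containing a residue that is missing from the table yields none.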

* feat: default to all cores in search engines (threads=0 support)

PeptideDataBaseSearchFI and SimpleSearchEngine now use all available
cores when -threads is not explicitly set (matching Sage's default).
TOPPBase::setMaxNumberOfThreads treats 0 as all cores via
omp_get_num_procs(). Global default stays at 1 for other tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "feat: default to all cores in search engines (threads=0 support)"

This reverts commit 1163082.

* Reapply "feat: default to all cores in search engines (threads=0 support)"

This reverts commit f48a7a3.

* fix: update test reference files for threads description change

Update all INI/toppas reference files and TOPPBase_test.cpp to match
the new threads parameter description "(0 = all available cores)".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings for fragment index fast-build

- Guard #include <omp.h> with #ifdef _OPENMP and add fallbacks for
  non-OpenMP builds (num_threads=1, tid=0)
- Replace static bool init guard with std::call_once for thread-safe
  residue mass table initialization
- Remove search engine thread overrides that hijacked -threads 1
  to mean all cores (threads=0 support remains at TOPPBase level)
- Remove J from ambiguous AA filter (ResidueDB defines J/Xle with
  valid mass)
- Replace std::cout with OPENMS_LOG_INFO for progress messages
- Remove redundant initResidueMassTable_() call from build()
- Rename file-scope static computePrecursorMzFromChars_ to drop
  trailing underscore (not a member function)
- Add lightweight_fragment_count test validating 2*(n-1) fragments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: bitmask-based modification enumeration bypassing AASequence

Replace AASequence::fromString + ModifiedPeptideGenerator with direct
bitmask enumeration of variable modification combinations:

- Add per-AA modification lookup tables (fixed_mod_deltas_,
  variable_mod_table_) built once from the modification config
- Scan peptide sequences to find variable mod slots (position,
  delta_mass, mod_ptr) — deterministic left-to-right ordering
- Enumerate valid bitmask subsets with conflict detection for
  multi-mod-per-site (mutually exclusive bits for same position)
- Compute precursor m/z as base_mass + sum(selected deltas)
- Reconstruct per-residue delta arrays from bitmask in build()
  for fragment generation (no AASequence needed)
- Add reconstructModifiedSequence() for output-time AASequence
  reconstruction (only called for final ~1000 hits, not millions)
- Replace modification_idx_ (UInt32) with mod_bitmask_ (uint32_t)
  supporting up to 32 variable mod slots per peptide
- Simplify PeptideSearchEngineFIAlgorithm hit processing from 10
  lines of AASequence reconstruction to single method call
- Add edge case tests: multi-mod-per-site, fixed+variable mods,
  fragment count with modifications, AASequence reconstruction

Eliminates all AASequence construction from the build hot path.
Expected ~20-50x speedup for the modification branch of build().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
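
A brute-force sketch of the bitmask idea, for illustration only. It walks all 2^n masks and filters them, which is simpler than whatever pruning the real enumeration uses; `ModSlot` here carries only position and delta mass, and all function names are hypothetical. The sketch limits itself to fewer than 32 slots so that the shift stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot type: one candidate variable modification at one position.
struct ModSlot
{
  std::uint32_t position;
  double delta_mass;
};

// Portable bit count; std::popcount from <bit> does the same in C++20.
inline int popcount32(std::uint32_t x)
{
  int n = 0;
  while (x != 0) { x &= x - 1; ++n; }
  return n;
}

// True if the mask selects two slots that sit on the same position.
inline bool hasSiteConflict(const std::vector<ModSlot>& slots, std::uint32_t mask)
{
  for (std::size_t i = 0; i < slots.size(); ++i)
  {
    if (!(mask & (1u << i))) continue;
    for (std::size_t j = i + 1; j < slots.size(); ++j)
    {
      if ((mask & (1u << j)) && slots[i].position == slots[j].position) return true;
    }
  }
  return false;
}

// All masks with at most max_variable_mods bits set and no site conflict.
// Mask 0 (the unmodified peptide) is always included.
std::vector<std::uint32_t> enumerateModMasks(const std::vector<ModSlot>& slots, int max_variable_mods)
{
  assert(slots.size() < 32);
  std::vector<std::uint32_t> valid;
  const std::uint32_t n_masks = 1u << slots.size();
  for (std::uint32_t mask = 0; mask < n_masks; ++mask)
  {
    if (popcount32(mask) > max_variable_mods) continue;
    if (hasSiteConflict(slots, mask)) continue;
    valid.push_back(mask);
  }
  return valid;
}

// Precursor mass = base mass + sum of the selected deltas.
double modifiedMass(double base_mass, const std::vector<ModSlot>& slots, std::uint32_t mask)
{
  double mass = base_mass;
  for (std::size_t i = 0; i < slots.size(); ++i)
  {
    if (mask & (1u << i)) mass += slots[i].delta_mass;
  }
  return mass;
}
```

With two competing modifications on position 2 and one on position 5, and at most two variable mods, six masks survive: 0, 1, 2, 4, 5 and 6. Masks 3 and 7 are rejected because they select both modifications on position 2.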

* fix: replace __builtin_popcount with std::popcount for MSVC compat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cross-validate bitmask enumeration against ModifiedPeptideGenerator

Add comprehensive test comparing FragmentIndex bitmask-based modification
enumeration against ModifiedPeptideGenerator for correctness:
- Simple fixed + variable mods (Carbamidomethyl + Oxidation)
- Multiple modifiable sites with combinatorial enumeration
- N-terminal variable mod coexisting with residue mod
- Two different variable mods targeting the same amino acid
- No modifiable sites (fixed mods only, no matching residues)
- Mixed site types with max_variable_mods=3

Each case validates variant count, precursor masses, and reconstructed
AASequence strings match the original ModifiedPeptideGenerator output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: handle PROTEIN_N_TERM/C_TERM mods based on protein position

PROTEIN_N_TERM mods (e.g., Acetyl (Protein N-term)) should only apply
to peptides starting at protein position 0, not to every peptide.
Similarly, PROTEIN_C_TERM mods only apply at the protein C-terminus.

Pass is_protein_nterm/is_protein_cterm flags to buildModSlots_ based
on the peptide's position within the protein sequence. This correctly
handles these modifications where ModifiedPeptideGenerator silently
skipped them entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add FDR filtering tests for SSE and FI DDA benchmarks

Add TOPP_FalseDiscoveryRate_SimpleSearchEngineDDA and
TOPP_FalseDiscoveryRate_PeptideDataBaseSearchFIDDA tests
that apply 1% PSM FDR filtering after the DDA search,
matching the existing TOPP_FalseDiscoveryRate_SageDDA test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: doxygen documentation for new FragmentIndex types

Use @brief style for VarModEntry and ModSlot structs, add
documentation for static constexpr sentinel values, replace
unicode em-dash with ASCII in reconstructModifiedSequence comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: parallel fragment sort via Boost.Sort block_indirect_sort

Replace single-threaded std::sort with boost::sort::block_indirect_sort
for the global fragment m/z sort. This uses std::thread internally (no
TBB dependency) and provides ~4.6x speedup on the sort phase (45.6s
down to 10.0s on 200M fragments).

Also add default constructor to Fragment struct (required for parallel
sort temporary buffers).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: set Sage min_ion_index=0 in DDA benchmark for fair comparison

Sage's default min_ion_index=2 skips b1/b2/y1/y2 ions in preliminary
scoring while OpenMS engines match all ions. Set to 0 for an
apples-to-apples comparison in the DDA integration test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add fragment:min_ion_index parameter to FragmentIndex

Skip the first N ions from each series (b/y/a/c/x/z) during fragment
generation. Default 0 (include all ions). Setting to 2 skips b1/b2
and y1/y2, matching Sage's default behavior. This allows direct
comparison and can reduce noise from unreliable low-index ions.

Includes test validating fragment count with min_ion_index=2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…1/y2) (OpenMS#9094)

* feat: default fragment:min_ion_index=2 (skip b1/b2/y1/y2)

Match Sage's default behavior of excluding the four shortest fragment
ions (b1, b2, y1, y2) from the fragment index. These low-index ions
are often noisy and unreliable, and skipping them:
- Improves PSM count at 1% FDR (DDA HeLa: 4339 → 4374, +35)
- Reduces wall time (59s → 48s, -11s)
- Reduces memory usage (4283 MB → 3984 MB, -300 MB)

Also expose fragment:min_ion_index parameter through
PeptideSearchEngineFIAlgorithm so it can be set by TOPP tools.

Update tests that depend on full ion counts to set min_ion_index=0
explicitly (lightweight_fragment_count, multi_mod_per_site,
fixed_plus_variable_mods, querySpectrum, tolerance).

Update Sage DDA test to use -min_ion_index 2 (matching new FI default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify fragment:min_ion_index help text and inline comment

The previous wording "Minimum ion index to consider" and "Ions below
this index" was inconsistent with the actual implementation, which
skips ions with index ≤ min_ion_index_ (note: less-than-OR-EQUAL, not
strictly below). With min_ion_index=2, b1 and b2 are both skipped
(min ion considered is actually 3, not 2).

Update help string in both FragmentIndex.cpp and
PeptideSearchEngineFIAlgorithm.cpp, plus the inline comment that
incorrectly said "<" instead of "<=".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
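
The less-than-or-equal semantics can be checked with a small counting sketch. `countByFragments` is a hypothetical helper that only counts b/y ions; it assumes ion indices run from 1 to n-1 for a peptide of length n, in line with the 2*(n-1) fragment count tested earlier.

```cpp
#include <cassert>
#include <cstddef>

// Number of b/y fragments kept for a peptide of the given length when
// every ion with index <= min_ion_index is skipped.
std::size_t countByFragments(std::size_t peptide_length, std::size_t min_ion_index)
{
  std::size_t count = 0;
  for (std::size_t ion = 1; ion < peptide_length; ++ion)
  {
    if (ion <= min_ion_index) continue; // '<=': with 2, both index 1 and 2 are skipped
    count += 2;                         // one b ion and one y ion per index
  }
  return count;
}
```

For a 10-residue peptide this gives 18 fragments with `min_ion_index=0` and 14 with `min_ion_index=2`, since b1, b2, y1 and y2 are dropped.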

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: timosachsenberg <sachsenb@ibminode06.Cs.Uni-Tuebingen.De>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* doc: Repair the AUTHORS file

Somehow the file got goofed up since my last PR.

* ci/authors: Don't set LC_ALL

The `authors` script will take care of this now.
Comment thread src/openms/source/CHEMISTRY/Residue.cpp Outdated
{
amino_acid = one_letter_code_[0];
}
else {

Owner

Coding Convention

Comment thread src/openms/source/CHEMISTRY/Residue.cpp Outdated
const int scale_idx = static_cast<int>(scale);
if (scale_idx < 0 || scale_idx >= 7)
{
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", "");

Owner

The empty String argument in the exception is irritating. Replace it with the scale, since that is what caused the exception (cast it to a String if needed).

const double result = scales[scale_idx][amino_acid - 'A'];
if (result == 999)
{
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "No hydrophobicity value known for this residue", one_letter_code_);

Owner

See comment above. One letter Code caused the exception here
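
A standalone sketch of what the two comments above ask for: the exception carries the value that caused it. It is not the OpenMS code: `std::invalid_argument` stands in for `Exception::InvalidValue`, only the Kyte-Doolittle scale is included, and the names `kyteDoolittle` and `gravy` are hypothetical. The sentinel 999 mirrors the check quoted in the thread.

```cpp
#include <array>
#include <cassert>   // used by the checks in the usage note
#include <stdexcept>
#include <string>

constexpr double NO_VALUE = 999.0; // sentinel for letters without a hydrophobicity value

// Kyte-Doolittle hydropathy values indexed by (letter - 'A').
constexpr std::array<double, 26> KD = {
  1.8,      // A
  NO_VALUE, // B
  2.5,      // C
  -3.5,     // D
  -3.5,     // E
  2.8,      // F
  -0.4,     // G
  -3.2,     // H
  4.5,      // I
  NO_VALUE, // J
  -3.9,     // K
  3.8,      // L
  1.9,      // M
  -3.5,     // N
  NO_VALUE, // O
  -1.6,     // P
  -3.5,     // Q
  -4.5,     // R
  -0.8,     // S
  -0.7,     // T
  NO_VALUE, // U
  4.2,      // V
  -0.9,     // W
  NO_VALUE, // X
  -1.3,     // Y
  NO_VALUE  // Z
};

double kyteDoolittle(char aa)
{
  if (aa < 'A' || aa > 'Z')
  {
    throw std::invalid_argument(std::string("Not an uppercase one-letter code: ") + aa);
  }
  const double value = KD[static_cast<std::size_t>(aa - 'A')];
  if (value == NO_VALUE)
  {
    // Report the offending residue instead of an empty string.
    throw std::invalid_argument(std::string("No hydrophobicity value known for residue: ") + aa);
  }
  return value;
}

// GRAVY: mean hydropathy over all residues of the sequence.
double gravy(const std::string& seq)
{
  if (seq.empty()) throw std::invalid_argument("Empty sequence");
  double sum = 0.0;
  for (char aa : seq) sum += kyteDoolittle(aa);
  return sum / static_cast<double>(seq.size());
}
```

For the test sequence "ACDE" this gives (1.8 + 2.5 - 3.5 - 3.5) / 4 = -0.675, e.g. `assert(gravy("ACDE") < -0.674 && gravy("ACDE") > -0.676);`.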


START_SECTION(double computeGRAVY(const AASequence& seq))
{
AASequence seq("ACDE");

Owner

Why not SIXSEVEN?

Owner

SICKSEVEN

Owner

Hey Chris ;D

* fix(cmake): accept Arrow/Parquet 24.x and later

Ubuntu 24.04 GitHub-hosted runners shipped Arrow 24.0.0 as of 2026-04-21,
breaking all Linux CI jobs with:

  CMake Error: Could not find a configuration file for package "Arrow"
  that is compatible with requested version "23".
    ArrowConfig.cmake, version: 24.0.0

Root cause: Arrow's ConfigVersion file uses SameMajorVersion
compatibility, so `find_package(Arrow 23 CONFIG REQUIRED)` refuses 24.x
even when 24.x would work for our usage. Drop the version from
find_package and enforce the >= 23 minimum via an explicit
Arrow_VERSION / Parquet_VERSION VERSION_LESS check. This accepts any
major >= 23 without guessing which future majors will be API-compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pyOpenMS): exclude pyarrow 24.0.0 (cp310-only wheels on PyPI)

pyarrow 24.0.0 on PyPI only shipped cp310 wheels with no source
distribution, so `uv sync` fails on runners using cp311/cp312/cp313:

  error: Distribution `pyarrow==24.0.0` can't be installed because it
  doesn't have a source distribution or wheel for the current platform
  hint: You're using CPython 3.12 (`cp312`), but `pyarrow` (v24.0.0)
  only has wheels with the following Python ABI tag: `cp310`

Before this commit, pyarrow was listed unpinned in the arrow/all/test
optional dependency groups and in cibuildwheel test-requires, so uv
picked up 24.0.0 and refused to install, which in turn left the
bootstrapped venv without numpy and failed the subsequent
`find_package(Python ... NumPy REQUIRED)` call.

Use `pyarrow!=24.0.0` rather than `<24.0.0` so that a future 24.0.1 (or
later) release with proper wheel coverage is picked up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/openms/source/CHEMISTRY/Residue.cpp Outdated
const int scale_idx = static_cast<int>(scale);
if (scale_idx < 0 || scale_idx >= 7)
{
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", "");

Suggested change
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", "");
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", String(scale_idx));

Comment on lines +40 to +42
Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.

Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626.

Suggested change
Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.
Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626.
- Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.
- Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626.

Comment on lines +26 to +30
Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955.

Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.

T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981).

@cbielow cbielow Apr 21, 2026

Suggested change
Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955.
Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.
T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981).
- Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955.
- Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707.
- T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981).

Comment on lines +32 to +36
Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X)

Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744.

Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70.

@cbielow cbielow Apr 21, 2026

Suggested change
Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X)
Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744.
Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70.
- Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X)
- Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744.
- Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70.

/// @warning When the window size is larger than the sequence the window size will be clamped to the sequence length
std::vector<double> computeWindowedProfile(
const AASequence& seq,
Size window_size = 7,

Suggested change
Size window_size = 7,
const Size window_size = 7,

Comment on lines +441 to +447
/// @brief returns the hydrophobicity value of the residue

/// The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile

/// @param scale which scale to use for the hydrophobicity value
/// @return hydrophobicity value of the residue
/// @throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used

@cbielow cbielow Apr 21, 2026

Suggested change
/// @brief returns the hydrophobicity value of the residue
/// The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile
/// @param scale which scale to use for the hydrophobicity value
/// @return hydrophobicity value of the residue
/// @throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used
/**
@brief returns the hydrophobicity value of the residue
The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile
@param scale which scale to use for the hydrophobicity value
@return hydrophobicity value of the residue
@throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used
*/

Comment on lines +42 to +46
/// @brief Enum for different hydrophobicity scales

/// If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one.

/// Add the data for this scale here: @ref Residue::getHydrophobicity

Suggested change
/// @brief Enum for different hydrophobicity scales
/// If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one.
/// Add the data for this scale here: @ref Residue::getHydrophobicity
/**
@brief Enum for different hydrophobicity scales
If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one.
Add the data for this scale here: @ref Residue::getHydrophobicity
*/

std::vector<double> HydrophobicityProfile::computeWindowedProfile
(
const AASequence& seq,
Size window_size,

Suggested change
Size window_size,
const Size window_size,

Comment on lines +65 to +70
if (window_size > seq.size())
{
OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n";
}
std::vector<double> profile;
Size effective_window = std::min(window_size, seq.size()); // size of the window

Suggested change
if (window_size > seq.size())
{
OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n";
}
std::vector<double> profile;
Size effective_window = std::min(window_size, seq.size()); // size of the window
const Size effective_window = [&](){
if (window_size > seq.size())
{
OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n";
return seq.size();
}
return window_size;
}(); // invoke the lambda immediately so effective_window holds a Size, not a closure
std::vector<double> profile;
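
A standalone sketch of a windowed profile with the clamping discussed in this thread. It works on a plain vector of per-residue values rather than an `AASequence`, and it assumes one output value per full window (n - w + 1 values); how the actual `computeWindowedProfile` treats window edges is not shown here, so that part is an assumption. The name `windowedMean` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <vector>

// Mean of `values` over a sliding window. A window larger than the input
// is clamped to the input length, which yields a single value.
std::vector<double> windowedMean(const std::vector<double>& values, const std::size_t window_size)
{
  std::vector<double> profile;
  if (values.empty() || window_size == 0) return profile;

  const std::size_t effective_window = std::min(window_size, values.size());
  profile.reserve(values.size() - effective_window + 1);

  // Sum of the first window.
  double sum = 0.0;
  for (std::size_t i = 0; i < effective_window; ++i) sum += values[i];
  profile.push_back(sum / static_cast<double>(effective_window));

  // Slide: add the entering value, drop the leaving one.
  for (std::size_t i = effective_window; i < values.size(); ++i)
  {
    sum += values[i] - values[i - effective_window];
    profile.push_back(sum / static_cast<double>(effective_window));
  }
  return profile;
}
```

For instance, `assert(windowedMean({1.0, 2.0, 3.0}, 10).size() == 1);` holds because the window of 10 is clamped to 3.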

…OpenMS#9195)

Two related ProSE bugs around Percolator rescoring:

1. FDR:PSM was applied inside the algorithm on raw HyperScore q-values
   before Percolator ran, which also stripped decoys via the
   fdr_protein_==0 branch (protein FDR is already deferred). Percolator
   then saw a target-only input and aborted with "No decoys found".
   Fix: defer FDR:PSM alongside FDR:protein when -percolator_executable
   is set. Post-rescoring, apply PSM FDR on Percolator q-values
   directly (scores are already q-values via -score_type q-value).
   Files that fell back to HyperScores (Percolator skipped/failed)
   compute q-values via FalseDiscoveryRate first. Decoys are only
   removed when protein FDR is disabled, matching the algorithm's
   existing semantics.

2. -out_pin emitted a .pin header missing the mandatory SpecId / Label
   / ScanNr columns and the standard mass/charge/enzyme features
   (ExpMass, CalcMass, mass, peplen, charge{N}, enzN, enzC, enzInt,
   dm, absdm) that PercolatorInfile::preparePin_ already sets on every
   hit. The output was not consumable by the percolator CLI.
   Fix: factor the canonical column list into a new
   PercolatorInfile::getStandardFeatureSet() helper. Both
   PercolatorAdapter (previously hardcoded) and ProSE -out_pin now
   build their feature_set from this helper, so the two tools emit
   consistent, standards-compliant .pin files.

Verified: all 18 ProSE tests, PercolatorInfile_test, and
PercolatorAdapter INI/CTD tests pass. Smoke-tested ProSE -out_pin on
SimpleSearchEngine_1.mzML: header now begins with SpecId / Label /
ScanNr and ends with Peptide / Proteins.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +92 to +93
Size window_size,
double angle

Suggested change
Size window_size,
double angle
const Size window_size,
const double angle

Comment on lines +126 to +127
sum_sin = std::pow(sum_sin,2);
sum_cos = std::pow(sum_cos,2);

@cbielow cbielow Apr 21, 2026

Suggested change
sum_sin = std::pow(sum_sin,2);
sum_cos = std::pow(sum_cos,2);
sum_sin *= sum_sin; // square
sum_cos *= sum_cos; // square

}
sum_sin = std::pow(sum_sin,2);
sum_cos = std::pow(sum_cos,2);
profile.push_back(std::sqrt(sum_sin+sum_cos) / std::min(window_size, seq.size()));

Suggested change
profile.push_back(std::sqrt(sum_sin+sum_cos) / std::min(window_size, seq.size()));
profile.push_back(std::sqrt(sum_sin+sum_cos) / effective_window);
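
For reference, the quantity computed in this snippet is the hydrophobic moment of one window, sqrt((sum of H_i sin(i*angle))^2 + (sum of H_i cos(i*angle))^2) divided by the window length. A minimal standalone sketch for a single window follows; the function name `hydrophobicMoment`, the plain vector input, the zero-based residue index and the angle in degrees are assumptions for illustration.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cmath>
#include <cstddef>
#include <vector>

// Hydrophobic moment of one window of per-residue hydrophobicity values.
// angle_deg is the rotation per residue (commonly 100 degrees for an alpha helix).
double hydrophobicMoment(const std::vector<double>& h, const double angle_deg)
{
  if (h.empty()) return 0.0;
  const double angle_rad = angle_deg * std::acos(-1.0) / 180.0;

  double sum_sin = 0.0;
  double sum_cos = 0.0;
  for (std::size_t i = 0; i < h.size(); ++i)
  {
    sum_sin += h[i] * std::sin(static_cast<double>(i) * angle_rad);
    sum_cos += h[i] * std::cos(static_cast<double>(i) * angle_rad);
  }
  sum_sin *= sum_sin; // square
  sum_cos *= sum_cos; // square
  return std::sqrt(sum_sin + sum_cos) / static_cast<double>(h.size());
}
```

Two sanity checks: with a 180 degree angle, the values {1, -1} point in the same direction after rotation and give a moment of 1, while {1, 1} cancel and give a moment of (numerically) 0.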

Comment thread src/openms/source/CHEMISTRY/Residue.cpp Outdated
else {
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "One letter code for this residue is empty", "");
}
if (amino_acid < 65 || amino_acid > 90)

Suggested change
if (amino_acid < 65 || amino_acid > 90)
if (amino_acid < 'A' || amino_acid > 'Z')

github-actions Bot and others added 11 commits April 22, 2026 08:16
…enMS#9199)

Document the cmake fix that accepts Arrow/Parquet 24.x and later by
dropping the find_package version argument and using a VERSION_LESS
guard (dfa9d82). Also document the pyarrow 24.0.0 exclusion in
pyOpenMS dependencies. This entry was missing from the previous CHANGELOG
sync.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add two missing entries for changes merged after the last CHANGELOG sync (OpenMS#9193):

- OpenMS#9196: CMake now accepts Arrow/Parquet 24.x and later; Ubuntu 24.04 CI
  runners ship Arrow 24.0.0 which broke builds with the previous
  find_package(Arrow 23) version constraint. pyarrow 24.0.0 excluded from
  pyOpenMS dependencies (cp310-only wheels on PyPI).

- OpenMS#9174: Fix SpectraIDViewTab GUI compilation failure on macOS Apple Silicon
  (Tahoe); accession key lookup now converts to std::string after Qt string
  interop removal.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… lists (OpenMS#9203)

The previous sqrt(N) heuristic produces very coarse buckets on large
immunopeptidomics indices (e.g. bucket_size 22k for a 490M-fragment
non-specific human index). Each query peak then scans thousands of
fragments far outside its tolerance window.

Hard-coding bucket_size to 4096 approximates MSFragger's fixed ~0.02 Da
fragment-bin density in the dense 500-1500 Da region, so a bucket now
covers roughly one fragment-tolerance window instead of a wide sqrt(N)
span.

Benchmark on DN17_Liver_classI_techRep2 (SNES, 45M mothers,
unspecific cleavage, 7 ppm / 20 ppm, 8 threads):

- wall:  91 min -> 80.6 min  (-11%)
- CPU:  603 min -> 523 min   (-13%)
- PSMs: 28235 -> 28235       (identical)
- peak RSS unchanged
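
To make the bucket-size arithmetic above concrete, here is a minimal sketch comparing the two strategies. The helper names `sqrtBucketSize`, `kFixedBucketSize`, and `bucketCount` are hypothetical and are not taken from FragmentIndex; only the sqrt(N) heuristic and the constant 4096 come from the commit message.

```cpp
#include <cmath>
#include <cstddef>

// Previous heuristic as described above: bucket size grows with sqrt(N).
std::size_t sqrtBucketSize(const std::size_t n_fragments)
{
  return static_cast<std::size_t>(std::sqrt(static_cast<double>(n_fragments)));
}

// New behaviour as described above: a fixed bucket size.
constexpr std::size_t kFixedBucketSize = 4096;

// Number of buckets needed to hold n_fragments (rounded up).
std::size_t bucketCount(const std::size_t n_fragments, const std::size_t bucket_size)
{
  return (n_fragments + bucket_size - 1) / bucket_size;
}
```

For the 490M-fragment index mentioned above, `sqrtBucketSize` gives about 22k fragments per bucket, while the fixed size yields more than five times as many, correspondingly narrower buckets.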

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PPExecutable (OpenMS#9204)

ProSE invoked the Percolator rescoring step with the bare name
"PercolatorAdapter". runExternalProcess_ hands that to boost::process,
which only searches $PATH — in a dev build (or any install where the
OpenMS bin/ directory is not on PATH) the adapter is not found and the
rescoring silently falls back to HyperScore with just a WARN line:

    Standard error: Process 'PercolatorAdapter' failed to start.
    Does it exist? Is it executable?
    Percolator rescoring failed for <file>. Using original HyperScore results.

Resolve the sibling binary via File::findSiblingTOPPExecutable before
the per-file loop; on FileNotFound, skip rescoring once with a clear
warning instead of retrying (and failing silently) per file.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
 items 1a/1b) (OpenMS#9205)

* ProSE: scrub dangling protein refs after merged decoy/FDR cleanup

Fixes 'Invalid protein reference DECOY_...' crash when merged path runs
applyPickedProteinFDR (deletes decoy ProteinHits) and target+decoy PSMs
remain (kept by removeDecoyHits's exact-match check on target_decoy=='decoy').

Refs OpenMS#9197 item 1b.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ProSE: move -out_merged write after per-file outputs and wrap in try/catch

Fixes data loss on merged-write failure: previously a merged-write exception
aborted main_ and discarded per-file outputs even when per-file PSMs were
already complete in memory. Per-file writes now complete first and the merged
write is wrapped in try/catch so its failure becomes a logged error rather
than a terminal exception.
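
The ordering described here can be sketched in a few lines. This is an illustration under assumptions, not the ProSE code: `WriteReport`, `writeOutputs`, and the use of `std::function` callbacks as stand-ins for the real writers are all hypothetical.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical summary of what was written.
struct WriteReport
{
  std::size_t per_file_written = 0;
  bool merged_ok = false;
};

// Per-file outputs are written first; the merged write runs last and a
// failure there is logged instead of propagating to the caller.
WriteReport writeOutputs(const std::vector<std::function<void()>>& per_file_writers,
                         const std::function<void()>& merged_writer)
{
  WriteReport report;
  for (const auto& write : per_file_writers)
  {
    write();
    ++report.per_file_written;
  }
  try
  {
    merged_writer();
    report.merged_ok = true;
  }
  catch (const std::exception& e)
  {
    std::cerr << "Merged output could not be written: " << e.what()
              << ". Check the per-file errors above.\n";
  }
  return report;
}
```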

Refs OpenMS#9197 item 1a.

* ProSE: soften merged-write catch message wording

Don't claim per-file outputs were 'written successfully' — input_failed
may be true for one or more files. Direct user to check per-file errors
above.

* docs(CHANGELOG): ProSE issue OpenMS#9197 items 1a + 1b

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…to functions, improved readability of documentation
…penMS#9207)

std::filesystem::rename fails with EXDEV across filesystems, unlike
Qt's QFile::rename which silently copied and removed. PR OpenMS#8938 replaced
the Qt call without preserving that fallback, breaking TOPP adapters
(CometAdapter -pin_out, MSGFPlusAdapter -mzid_out) that move files from
a tmp dir to a bind-mounted output — a common container layout.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Reduced Memory Overhead by replacing map copy with map reference

* Add myself to Authors

* Updated CHANGELOG file

* reduced RAM usage with rvalue ref

* updated changelog & reorganized it so all TOPPTool changes are in one place

* added PR number to changelog

* applied recommended changes

* Applied Coderabbit suggestions

* applied suggested fixes

* removed backwards compatibility

---------

Co-authored-by: Tilman Aurich <tilman.aurich@fu-berlin.de>
Co-authored-by: Chris Bielow <chris.bielow@fu-berlin.de>
Restore ProSE/FragmentIndex/File entries dropped by the FeatureFinderCentroided
reorganization commit (OpenMS#9159) and add new entries for commits merged after the
last sync:

- OpenMS#9188: ProSE BREAKING: default isotope error range changed to [0,+2]
- OpenMS#9191: ProSE SNES mother-peptide indexing for non-specific searches
- OpenMS#9195: ProSE fix: defer PSM FDR when Percolator enabled; valid .pin output
- OpenMS#9203: FragmentIndex: fixed bucket_size 4096 for tighter candidate lists
- OpenMS#9204: ProSE: resolve PercolatorAdapter via findSiblingTOPPExecutable
- OpenMS#9205: ProSE: per-file output isolation + dangling decoy ref crash fix
- OpenMS#9207: File::rename: cross-device fallback (copy+remove on EXDEV)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AGENTS.md was updated in OpenMS#9194 to note that contrib is a git submodule
and requires 'git submodule update --init contrib' before building.
Sync the same note to CLAUDE.md so both AI-assistant guidelines files
carry this important setup step.

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>