Hydrophobicity scales and profiling #1
Add ProteomicsLFQ entry for PR OpenMS#9030:
- Bruker TimsTOF .d (BRUKER_TDF) input support with Biosaur2 seeding
- IM_PEAK data path with FWHM estimation and skip of incompatible steps
- New Seeding:algorithm parameter

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9039) (OpenMS#9042) BOOST_PROCESS_USE_STD_FS was defined unconditionally but only exists since Boost 1.78. On older versions (e.g. 1.74), boost::process still uses boost::filesystem internally, causing undefined reference errors because Boost::filesystem was never linked. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
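The version dependence described above can be sketched as follows. Boost encodes its version as one integer, `BOOST_VERSION = major * 100000 + minor * 100 + patch`, so 1.78.0 is `107800`. The helper names `encode_boost_version` and `boost_process_supports_std_fs` are hypothetical and exist only to illustrate the comparison; the commit message does not show how the actual fix is written, and it may simply raise the minimum Boost version.

```cpp
// Sketch only: hypothetical helpers, not part of OpenMS or Boost.
// Boost encodes its version as major * 100000 + minor * 100 + patch.
constexpr int encode_boost_version(int major, int minor, int patch)
{
  return major * 100000 + minor * 100 + patch;
}

// Per the commit message above, BOOST_PROCESS_USE_STD_FS only exists
// from Boost 1.78 on; older versions still use boost::filesystem.
constexpr bool boost_process_supports_std_fs(int boost_version)
{
  return boost_version >= encode_boost_version(1, 78, 0);
}

// In real code the same check would live in the preprocessor,
// before any Boost.Process header is included, for example:
//   #include <boost/version.hpp>
//   #if BOOST_VERSION >= 107800
//   #  define BOOST_PROCESS_USE_STD_FS
//   #endif
static_assert(!boost_process_supports_std_fs(encode_boost_version(1, 74, 0)),
              "1.74 predates the macro");
static_assert(boost_process_supports_std_fs(encode_boost_version(1, 78, 0)),
              "1.78 supports the macro");
```

On a Boost version below the threshold, the alternative to guarding the macro is to link `Boost::filesystem` explicitly, which is the missing link step the undefined-reference errors point to.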
The minimum Boost version was bumped from 1.74 to 1.78 in commit 742e5e1 (fix: bump minimum Boost to 1.78 to fix boost::filesystem link errors). BOOST_PROCESS_USE_STD_FS only exists since Boost 1.78; older versions cause undefined reference errors. Update the required dependencies line in AGENTS.md to reflect the new minimum version requirement. Co-authored-by: Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add missing Build System entry for minimum Boost version bump to 1.78 (OpenMS#9042) to fix boost::filesystem link errors. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: add design spec for removing Qt from TOPP tools Covers two-phase approach: replace Qt6::Core usage with std/OpenMS utilities in 11 tools, then move 3 GUI-entangled tools to openms_gui. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add implementation plan for removing Qt from TOPP tools 13 tasks covering Phase 1 (replace Qt6::Core in 11 tools) and Phase 2 (move 3 GUI tools to openms_gui, clean up CMake). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(Resampler): remove dead GUI include, ungate from WITH_GUI Remove unused #include <OpenMS/VISUAL/MultiGradient.h> from Resampler.cpp, move Resampler from TOPP_executables_with_GUIlib to TOPP_executables, remove it from the ToolHandler GUI_tools exclusion list, and drop the WITH_GUI guard around its TOPP test definitions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(MzMLSplitter): replace QFile::size() with std::filesystem::file_size() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(QCExtractor,QCShrinker): replace QFileInfo::baseName() with FileHandler::stripExtension() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(QCEmbedder): replace QFile/QByteArray with std::ifstream and Base64 Remove Qt includes (QByteArray, QFile, QString, QFileInfo) and replace with OpenMS FileHandler/Base64 and std::ifstream/ostringstream for file reading and base64 encoding. Use FileHandler::stripExtension + File::basename for compound-extension-aware basename extraction. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(IDRipper): replace QDir/QFileInfo with std::filesystem and File::absolutePath() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(AssayGeneratorMetaboSirius): replace QDirIterator with std::filesystem::directory_iterator Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(OpenSwath*): replace QDir with File::absolutePath() Replace QDir::absolutePath() and QFileInfo::baseName() usage in OpenSwathFileSplitter and OpenSwathWorkflow with OpenMS-native File::absolutePath() and FileHandler::stripExtension(File::basename()), removing the Qt dependency from both TOPP tools. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(CometAdapter): replace QRegularExpression with std::regex Remove Qt dependency on QRegularExpression, replacing it with std::regex and std::remove for space stripping as part of Qt removal from TOPP tools. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(MetaProSIP): replace QProcess/QFile/QDir with ExternalProcess and File utilities Replace all Qt dependencies in MetaProSIP.cpp: - Remove QtCore includes (QStringList, QFile, QDir, QFileInfo, QProcess) - Add ExternalProcess, filesystem includes - Add runRScript() helper to consolidate 5 near-identical QProcess blocks - Replace QProcess R execution with ExternalProcess::run() calls - Replace QFile::copy/remove with File::copy/remove - Replace QDir with File::absolutePath and std::filesystem::create_directories - Replace QFileInfo::baseName with FileHandler::stripExtension(File::basename()) - Change QString function parameters to const String& Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * build(topp): remove Qt6::Core linking — all non-GUI TOPP tools are Qt-free Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: move ExecutePipeline, INIUpdater, ImageCreator to openms_gui Move 3 TOPP tools that depend on 
OpenMS_GUI into the GUI library directory, making src/topp/ 100% Qt-free. Update CMake registration so the tools are built as part of the GUI executables, remain in the TOPP_TOOLS list for CWL/INI generation, and are dependencies of the TOPP collection target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * build: move Qt6 find_package inside WITH_GUI block Qt is no longer needed by any non-GUI component. Guard KNIME packaging against missing Qt targets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove commented-out QIODevice includes from QC tools Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(QCEmbedder): use Base64::encodeStrings for raw binary data Base64::encode is designed for floating-point arrays, not raw bytes. Use encodeStrings instead which handles arbitrary binary data correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(MzMLSplitter): explicit std::string conversion for filesystem::file_size OpenMS::String does not implicitly convert to std::filesystem::path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add implementation plan for File API improvements Adds stemName(), extension(), listDirectories() to OpenMS::File and replaces awkward patterns in TOPP tools. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(File): add stemName() — basename without known file extension Adds File::stemName(path) as a convenience wrapper around FileHandler::stripExtension(File::basename(path)), covering the pattern that appears 32 times across the codebase. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(File): add extension() — compound-aware file extension extraction * feat(File): add listDirectories() — sorted list of subdirectories Adds File::listDirectories() static method that returns a sorted StringList of subdirectory paths (non-recursive) using std::filesystem, with no-throw semantics via std::error_code. Includes unit tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(topp): use File::stemName(), listDirectories(), fileSize(), makeDir() Replace FileHandler::stripExtension(File::basename(...)) with File::stemName(...) across 17 call sites in 10 TOPP tools. Replace raw std::filesystem calls with File API methods in AssayGeneratorMetaboSirius, MzMLSplitter, and MetaProSIP. Remove unnecessary #include <filesystem> and #include <FileHandler.h> where they are no longer needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TOPP_TOOLS ordering and MetaProSIP error diagnostics - Swap add_subdirectory order in src/CMakeLists.txt: topp before openms_gui so TOPP_TOOLS cache is initialized before GUI appends. Fixes CWL/CTD generation for moved GUI tools. - Surface ExternalProcess error_msg in MetaProSIP's runRScript helper so startup failures (missing R, permissions) are visible in logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(File): listDirectories uses error_code overloads and returns absolute paths Use is_directory(ec) and fs::absolute() to honor the documented no-throw and absolute-path contracts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(IDRipper): use to_path() for UTF-8 safe filesystem path construction std::filesystem::path(std::string) uses the current code page on Windows, not UTF-8. Use OpenMS::to_path() from PathUtils.h instead. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: use File::stemName() in core/GUI, remove spec/plan docs Replace all 15 remaining FileHandler::stripExtension(File::basename()) call sites in src/openms/ and src/openms_gui/ with File::stemName(). Remove FileHandler.h includes where no longer needed. Remove superpowers spec and plan documents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add missing entry for OpenMS#9041 (remove Qt dependency from TOPP tools):
- Qt6::Core removed from all non-GUI TOPP tools in Dependencies section
- Qt6 find_package guarded by WITH_GUI in Build System section

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9052) The recent addition of IMPeakType (split from IMFormat in commit 3f8956f) was bound in bind_kernel.cpp alongside DriftTimeUnit and IMFormat. The binding file routing table did not reflect this, leaving developers to incorrectly infer from the 'Everything else → bind_misc.cpp' catchall that new IONMOBILITY enums should go in bind_misc.cpp. Add an explicit row documenting that IONMOBILITY enums (DriftTimeUnit, IMFormat, IMPeakType) belong in bind_kernel.cpp, and clarify the 'Everything else' row with an example (IMTypes class stays in bind_misc.cpp). Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat: add option to keep cached files in OpenSwath workflow
* feat: add cache file auto cleanup to OpenSwathWorkflow
* feat: implement per-run cache cleaner for temporary directory management
* feat: enhance per-run cache directory creation with retry logic and error handling
* feat: add custom base directory support for TempDir and remove PerRunCacheCleaner
…penMS#9060) The Qt refactoring in OpenMS#9041 removed all Qt dependencies from TOPP tools by replacing QFile, QDir, QFileInfo, QByteArray etc. with std::filesystem, OpenMS File utilities, and Base64. Qt6 is now only required for the GUI library (openms_gui / WITH_GUI), not for the command-line TOPP tools. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion) (OpenMS#9063)

* Initial plan

* Remove duplicate test_msspectrum.py and merge content into test_MSSpectrum.py

  Agent-Logs-Url: https://github.com/OpenMS/OpenMS/sessions/9f51b4f8-c1c1-4388-9ba5-934f33e708ed
  Co-authored-by: cbielow <6008722+cbielow@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cbielow <6008722+cbielow@users.noreply.github.com>
…MS#9073) The feat commit cb6c7d1 added Bruker .d directory support to OpenSwathWorkflow (WITH_OPENTIMS builds) but did not update the doxygen documentation block. Add a new 'Bruker .d' subsection and update the overview list item to reflect that .d directories are now a supported input format alongside mzML/mzXML. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…penMS#9068)

* feat(MSstatsConverter): add -remove_shared_peptides flag to control shared peptide filtering

  The MSstatsConverter silently drops peptides mapping to proteins in different indistinguishable protein groups via isQuantifyable_(). This is hardcoded with no user control, causing proteins with only shared peptides to vanish from MSstats output even when the upstream ProteinQuantifier was configured with use_shared_peptides=true.

  This commit adds a -remove_shared_peptides flag (default true for backward compatibility) that allows users to retain shared peptides in the MSstats output. It also adds OPENMS_LOG_WARN messages reporting how many peptide hits were dropped, so the filtering is no longer silent.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(MSstatsConverter): fix Doxygen warnings and backward-compatible default for remove_shared_peptides

  Add full @param documentation for storeLFQ() to match storeISO() style, fixing Doxygen warning test failures. Change remove_shared_peptides from registerFlag_ (default false) to registerStringOption_ with default "true" so shared peptides are removed by default, preserving backward-compatible behavior and ensuring test CSV outputs remain unchanged.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(API): update pyOpenMS bindings and test signatures for remove_shared_peptides

  - Add remove_shared_peptides parameter (default true) to storeLFQ() and storeISO() nanobind lambdas in bind_format.cpp
  - Update MSstatsFile_test.cpp START_SECTION signatures to match the new C++ API

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: handle unwritable $HOME in UpdateCheck gracefully

  UpdateCheck::run() calls fs::create_directories() with the throwing overload to create $HOME/.config/OpenMS. In container environments where $HOME=/ (not writable), this crashes all TOPP tools with:

    terminate called after throwing an instance of 'std::filesystem::filesystem_error': cannot create directories: Permission denied [//.config/OpenMS]

  Use the std::error_code overload (consistent with File.cpp lines 241, 316) and skip the update check gracefully when the config directory cannot be created.

  Fixes OpenMS#9074

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: set HOME=/tmp in container and handle unwritable $HOME in UpdateCheck

  Two complementary fixes for the crash when $HOME is not writable:

  1. dockerfiles/Dockerfile: Set ENV HOME=/tmp in the tools-thirdparty stage so containers always have a writable home directory for config files (.config/OpenMS/*.ver).
  2. src/openms/source/SYSTEM/UpdateCheck.cpp: Use the non-throwing std::error_code overload of create_directories (consistent with File.cpp) and skip the update check gracefully when the config directory cannot be created. This provides defense-in-depth for environments where $HOME is overridden to a non-writable path.

  Fixes OpenMS#9074

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use /home/openms as writable HOME in container

  Follow the pattern from quantms-rescoring: create a dedicated writable directory and set HOME to it, rather than using /tmp. This ensures .config/OpenMS and other home-relative paths work correctly.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use HOME=/app consistent with quantms-rescoring convention

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move HOME=/app to tools stage so it propagates to all child images

  Julianus noted the HOME setting should be in the tools stage (not just tools-thirdparty) since the tools image is also published separately. Docker ENV directives propagate through FROM inheritance, so setting it in tools covers tools, tools-thirdparty, and test stages.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fixed the broken Hash in the config File
* Update src/openms/extern/tool_description_lib/tdl-config.cmake
* Update src/openms/extern/tool_description_lib/tdl-config.cmake
* CMakeLists und executables fixed.
* Revert "CMakeLists und executables fixed."

  This reverts commit 865eff1.

---------

Co-authored-by: Chris Bielow <chris.bielow@fu-berlin.de>
If a PR adds a TOPP tool, this check will post a PR comment if that tool isn't listed in the `ToolHandler.cpp` file. Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
This change adds a script that can check if the authors of git commits are in the AUTHORS file. The script is automatically run during the PR process and if an author is missing then a comment is added to remind the PR author. Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
* fix: make enzyme configurable in MRMFeatureFinderScoring (OpenMS#9072)

  Replace hardcoded "Trypsin" enzyme in scorePeakgroups() with a user-configurable parameter. The enzyme is used for missed cleavage counting and defaults to "Trypsin" for backward compatibility. Valid values are populated from ProteaseDB.

  Since MRMFeatureFinderScoring parameters are already exposed in OpenSwathWorkflow via the Scoring: subsection, no TOPP tool changes are needed.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use ListUtils to convert enzyme names for setValidStrings

  std::vector<OpenMS::String> cannot be implicitly converted to std::vector<std::string> required by Param::setValidStrings.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
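The conversion problem named in the second commit is a general C++ one: a vector of a derived string type is unrelated to `std::vector<std::string>`, even though each element converts. The sketch below uses a stand-in type `DerivedString` and a hypothetical helper `to_std_strings`; it assumes `OpenMS::String` relates to `std::string` in the same way, and it does not reproduce the `ListUtils` call the commit actually uses.

```cpp
#include <string>
#include <vector>

// Stand-in for a string class that derives from std::string
// (assumption: this mirrors how OpenMS::String relates to std::string).
struct DerivedString : std::string
{
  using std::string::string; // inherit the std::string constructors
};

// std::vector<DerivedString> and std::vector<std::string> are distinct,
// unrelated types, so there is no implicit conversion between them.
// The iterator-range constructor copies element by element instead,
// using the derived-to-base conversion for each string.
std::vector<std::string> to_std_strings(const std::vector<DerivedString>& names)
{
  return std::vector<std::string>(names.begin(), names.end());
}
```

The resulting vector can then be handed to any interface that expects `std::vector<std::string>`, which is the shape the commit says `Param::setValidStrings` requires.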
Add two missing changelog entries for the OpenMS 3.6.0 under-development section:
- General: Fix UpdateCheck crash when $HOME is not writable in containers (OpenMS#9075)
- TOPP tools / MSstatsConverter: new -remove_shared_peptides parameter (OpenMS#9068)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
* github action: welcome new contributors

  Welcome new contributors and provide a link to the tally survey.

* apply CodeRabbit suggestions
…st (OpenMS#9065) (OpenMS#9081)

* fix: remove arrow/csv dependency from Arrow_test (OpenMS#9065)

  Arrow CSV support (ARROW_CSV) is OFF by default in upstream Apache Arrow. OpenMS never uses arrow/csv in its library code; the CSV portions of this test were copy-pasted from the Arrow tutorial and only tested Arrow itself, not OpenMS functionality.

  Remove CSV read/write from the test, keeping IPC and Parquet coverage which are the formats OpenMS actually uses. This fixes compilation for users whose Arrow installation lacks CSV support.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use NEW_TMP_FILE for temp paths in Arrow_test

  Address CodeRabbit review: replace hardcoded filenames with NEW_TMP_FILE macro to avoid collisions in parallel test runs and stale artifacts.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove Arrow IPC from test, keep only Parquet

  Arrow IPC is not used anywhere in the OpenMS library; only Parquet is. Simplify the test to a single Parquet roundtrip (write, read, write).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MRMFeatureFinderScoring: add enzyme parameter entry for OpenSwathWorkflow (OpenMS#9077)
- yaml-cpp 0.9.0 dependency bump fixing YAML hash parsing (OpenMS#9056)

Co-authored-by: GitHub Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rator (OpenMS#9078) (OpenMS#9084) The neutral-loss loop for prefix ions (b/a) reset the running mass accumulator but did not re-add peptide[0]'s residue mass, unlike the plain-ion loop above it. This caused all b/a neutral-loss fragments to be too low by exactly the first residue's internal mass. Add the same preamble (residue 0 mass + loss formulas) to the neutral-loss loop, mirroring the plain-ion loop structure. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
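The accumulator bug described above can be illustrated with a toy prefix-ion loop. This is a sketch under assumptions: the function name `prefix_loss_masses` is hypothetical, terminal and charge offsets are left out, and the real TheoreticalSpectrumGenerator loop is not shown in the commit message. The point is only the seeding step: the running sum must start at the first residue's mass before the loop begins at position 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: masses of prefix fragments b_1 .. b_(n-1) after
// subtracting one neutral loss. Terminal and charge terms are omitted.
std::vector<double> prefix_loss_masses(const std::vector<double>& residue_masses,
                                       double loss_mass)
{
  std::vector<double> out;
  if (residue_masses.empty()) return out;

  // Seed with residue 0. Resetting this to 0.0 instead reproduces the
  // reported bug: every fragment is too low by the first residue's mass.
  double running = residue_masses[0];
  for (std::size_t i = 1; i < residue_masses.size(); ++i)
  {
    out.push_back(running - loss_mass); // fragment covering residues 0 .. i-1
    running += residue_masses[i];
  }
  return out;
}
```

With toy residue masses 10, 20, 30 and a loss of 1, the two prefix fragments come out as 9 and 29; with the unseeded accumulator they would be -1 and 19.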
* docs: add design spec for float IntensityType propagation (OpenMS#8872) Design for propagating float IntensityType through ~30 algorithm and I/O locations. Five phased PRs: template openswathalgo, cleanup casts, algorithm internals, quantitation/metadata, I/O buffers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update float propagation spec with Codex review findings Address critical issues from Codex review: - Add computeAndAppendRank/computeRankVector to PR 0 template list - Add DataValue::DOUBLE_LIST boundary for OpenSwath_Ind_Scores - Add missing header files to PR 2 file list (MRMScoring.h, etc.) - Add MRMFeature.cpp to PR 2 for float->DoubleList conversion - Fix line number references (QcMLFile, LinearResampler, OpenSwathScoring) - Add deferred mobilogram/XIM paths to out-of-scope Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address code review findings in float propagation spec Critical fixes: - Add IFeature::getIntensity(vector<float>&) overload to PR 0 scope to unblock PR 2's float intensity matrices in MRMScoring - Audit OpenSwath_Ind_Scores fields: only 7 of 40 are genuine intensity values (change to float); rest are scores/coordinates (stay double) Important fixes: - Keep MSChromatogramParquetConsumer buffer as double (encodeNP float overload adds overhead via temporary double copy) - Document GaussFilterAlgorithm::integrate_() double return requiring explicit narrowing cast - Reframe SIMD claims: primary win is memory reduction, Eigen float paths are secondary benefit, push_back loops won't vectorize - Document normalized_library_intensity stays double, scoring templates instantiated for both float and double parameter types - Add DIAPrescoring.cpp change description to PR 2 table - Add conversion map showing all float/double boundaries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: apply third-pass review fixes to float propagation spec - Move 
ind_im_log_intensity from float to double category (log-transformed score, not raw intensity) — 6 of 40 fields now change to float - Add fillIntensityFromPrecursorFeature and initializeXCorrMatrix float overload to PR 2 scope (both missed in prior reviews) - Expand cascading changes section with concrete dependency chains showing how DIAHelper -> DIAPrescoring -> normalize flows work - Clarify conversion map notation (concrete classes, not templates) - Add PR 0 scope rationale note (split option documented) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate 10-agent review findings into float propagation spec Critical fixes: - Narrow Ind_Scores float fields from 6 to 3 (only raw intensity: ind_area_intensity, ind_total_area_intensity, ind_apex_intensity). Ratios, log-transforms, and scores stay double. - Remove QcMLFile.cpp from PR 1 (String(float) changes output format) - Fix MRMFeatureFinderScoring.cpp line refs (457-458 are scores, not intensity — corrected to 390-398) - Add IsobaricWorkflow.cpp to PR 3 (main extractSingleSpec caller) - Remove CachedMzMLHandler from PR 4 (already handles conversions) - Add MRMTransitionGroupPicker.h:811-831, PeakPickerChromatogram.h:139, MRMScoring.cpp:544 to PR 2 scope Warning fixes: - Add norm/manhattanDist float instantiation deps to PR 0 - Fix SpectralAngle Eigen note - Add GaussFilterAlgorithm.h to PR 2 file list - Add IonMobilityScoring mixed-type vector risk - Add DIAPrescoring.cpp mixed-type iterator linker risk - Add ITransitionGroup/LightTransition to boundaries table - Add TraML/TSV precision and pyOpenMS behavioral risks - Expand Out of Scope with 20+ deferred locations from comprehensive codebase scan Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: resolve all open decisions in float propagation spec Decisions resolved: - MRMScoring API: template on scalar type (not overloaded) - calcLibraryScore: experimental_intensity stays double (paired with 
library_intensity in same-type scoring calls) - DIAPrescoring: intTheor also changes to vector<float> (avoids mixed-type iterator instantiation) - IonMobilityScoring: ms1_int_values and all intensity vectors change to float (source data is MobilityPeak1D::IntensityType = float) - IFeature vtable: acknowledged as acceptable (no stable ABI contract) - IsobaricIsotopeCorrector: stays double, removed from PR 3 scope - MRMScoring MI functions: all 5 initializeMI* explicitly covered - Deprecated double[] Scoring overloads: stay double-only Consistency fixes: - MRMTransitionGroupPicker PR 1: corrected line refs (426,439) - Removed resolved mixed-type risks from risk table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add PR 0 implementation plan for float templating 9-task plan covering: - Tasks 1-4: Template 11 Scoring functions (normalize_sum, cross-corr, standardize_data, NormalizedManhattanDist, RMSD, SpectralAngle, rank helpers) - Task 5: Template 3 StatsHelpers functions + float instantiations for norm/manhattanDist - Tasks 6-7: Float test cases for Scoring and DiaHelpers - Task 8: IFeature::getIntensity(vector<float>&) overload - Task 9: Integration verification Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * build: disable TDL (CWL support) by default TDL pulls in yaml-cpp as a transitive dependency and is only needed for generating Common Workflow Language tool descriptions. Most users don't need CWL export. Disable by default to reduce build dependencies. Enable with -DENABLE_TDL=ON when CWL support is needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: suppress unused parameter warnings when TDL is disabled Cast parameters to void in the #else branch to avoid MSVC /we4100 errors (unused parameter treated as error) when ENABLE_TDL is off. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove plan and spec documents from PR Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
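The spec commits above settle on templating scoring helpers on the scalar type so that float intensities do not have to be widened to double. A minimal sketch of that idea follows; the name `normalize_sum_sketch` and its vector-based signature are assumptions made for illustration, and the real `normalize_sum` in openswathalgo may look different.

```cpp
#include <numeric>
#include <vector>

// Hypothetical sketch: divide every value by the total so the values sum
// to one. T is the scalar type, so float input stays float throughout.
template <typename T>
void normalize_sum_sketch(std::vector<T>& values)
{
  const T sum = std::accumulate(values.begin(), values.end(), T(0));
  if (sum == T(0)) return; // all-zero or empty input is left unchanged
  for (T& v : values)
  {
    v /= sum;
  }
}
```

The same template serves both `std::vector<float>` and `std::vector<double>`, which matches the decision recorded above to instantiate scoring templates for both parameter types.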
- TheoreticalSpectrumGenerator: fix m/z for b/a neutral-loss ions (OpenMS#9078, OpenMS#9084)
- Build: ENABLE_TDL (CWL support) now defaults to OFF (OpenMS#9067)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Initial plan

* Fix locale-dependent sorting in AUTHORS check workflow

  Set LC_ALL=C in the check-authors-file workflow to ensure the sort command in ci-tools/scripts/authors.sh uses byte-order sorting instead of locale-dependent collation. This prevents names with periods (like "Michael R. Crusoe") or prefix-matching names (like "Marc"/"Marcel") from being sorted incorrectly depending on the CI environment's locale settings.

  Also re-sorts the AUTHORS file to match LC_ALL=C sort -u ordering:
  - "Johan Teleman" before "Johannes" (space < 'n' in byte order)
  - "Julia Thueringer" before "Juliane" (space < 'n' in byte order)
  - "Marc Sturm" before "Marcel Schilling" (space < 'e' in byte order)

  Agent-Logs-Url: https://github.com/OpenMS/OpenMS/sessions/b027b023-82a4-4eae-b131-714c4c5b1722
  Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
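The byte-order rule behind `LC_ALL=C sort -u` can be reproduced in a few lines. `std::string` comparison is byte-wise and ignores the process locale, so it orders names the same way the commit describes. The function name `sort_unique_bytewise` is hypothetical, and the actual check is a shell script, so this is only an illustration of the ordering.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort names by raw byte value and drop exact duplicates, mirroring what
// `LC_ALL=C sort -u` does for lines of text. The space character (0x20)
// is lower than any letter, so "Marc Sturm" sorts before "Marcel Schilling".
std::vector<std::string> sort_unique_bytewise(std::vector<std::string> names)
{
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}
```

A locale-aware collation may skip spaces and punctuation on the first pass, which is why the same file could sort differently on different CI machines before the fix.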
…penMS#9088) * feat: refactor FragmentIndex to SoA layout with SIMD-accelerated query Replace the Array-of-Structs vector<Fragment> with Structure-of-Arrays (separate float[] for m/z and uint32_t[] for peptide indices). This enables SIMD vectorization of the query hot loop. The query inner loop now uses SSE2 via SIMDe to compare 4 fragment m/z values against the tolerance window per iteration, giving ~3-4x speedup on the most performance-critical path in database search. Changes: - FragmentIndex.h: replace fi_fragments_ with fi_fragment_mzs_ and fi_fragment_peptide_idxs_ parallel vectors - FragmentIndex.cpp: permutation-based sorting for SoA, SIMD query loop with scalar remainder, updated clear()/print_slice() - FragmentIndex_test.cpp: update fragmentsSorted() for SoA access Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scalar query fallback and SIMD vs scalar benchmark Add queryScalar() method that uses the same SoA layout but without SIMD intrinsics, for direct performance comparison. Add benchmark test section that runs both paths 1000 iterations and reports speedup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: treat peptide_idx_range.second as exclusive, fix benchmark params getPeptidesInPrecursorRange() returns a half-open range [first, second) via upper_bound, but query() treated second as inclusive. Fix both SIMD and scalar paths to use >= for the exclusive upper bound check. Also fix benchmark test parameter names and types to match actual FragmentIndex defaults, and register FragmentIndex_test in executables.cmake. Benchmark result: ~2.9x speedup (SIMD vs scalar) on small test case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: SIMD loop double-hit bug and add comprehensive edge case tests Fix: after the SIMD loop's pidx early-exit break, advance i past the processed group (i += 4) so the scalar remainder doesn't re-scan the same elements. 
This caused duplicate hits whenever the last element in a SIMD group exceeded the peptide_idx range. Add 9 new test sections verifying SIMD vs scalar equivalence across: - Multiple peptides and charge states (1-3) - Real FASTA sequences (from SSE/Comet test data) - Empty precursor range [k, k) - Very small index (< 4 fragments, scalar-only path) - PPM tolerance mode - Tolerance boundary (peaks just inside/outside tolerance) - Multiple fragment charges - Wide precursor window (many candidates) - SoA ordering invariants Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings — reserve placement, hit content verification - Move fragment reserve() after generatePeptides() so fi_peptides_.size() is nonzero and the preallocation actually works (CodeRabbit finding A) - Add SoA size consistency guard in fragmentsSorted() (CodeRabbit B) - Strengthen simdScalarMatch() to compare actual hit content (peptide_idx + fragment_mz) element-wise, not just hit count (CodeRabbit D) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove SIMD query, keep SoA layout and edge case tests Benchmarking showed the SIMD query loop provides negligible speedup on real workloads because the build phase (digestion, sorting, OMP critical sections) dominates total runtime (~95%). Remove the SIMD code complexity while keeping the beneficial SoA layout, exclusive range fix, and comprehensive edge case tests. 
Removed: queryScalar(), SIMDe include, SIMD vs scalar benchmark tests Kept: SoA layout, permutation sort, exclusive range semantics, 6 edge case tests (empty range, small bucket, ppm tolerance, tolerance boundary, wide precursor window, SoA invariants) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: eliminate omp critical in fragment generation via per-thread vectors Replace the serialized omp critical section (55M mutex acquisitions for fragment push_back) with per-thread vector pairs that accumulate fragments lock-free, then merge into the global SoA arrays after the parallel region. Benchmark on HeLa DDA TimsTOF + Human SwissProt (4 threads): - Wall time: 3m40s → 3m14s (1.13x faster) - CPU time: 5m02s → 3m48s (1.32x less contention) - Results identical (same PSM counts at 1% FDR) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: eliminate omp critical in fragment generation via per-thread vectors Revert SoA layout (was a net regression: 2x slower, 1.8x more memory due to permutation sort overhead). Keep the original AoS Fragment struct and apply per-thread vector accumulation to eliminate the omp critical section that serialized ~55M push_back calls. Each thread accumulates fragments into its own vector, then vectors are merged sequentially after the parallel region. This avoids mutex contention while keeping AoS in-place sort efficiency. Benchmark on DDA TimsTOF + Human SwissProt (4 threads): develop (omp critical): 1m17s wall, 2m30s CPU, 3326 MB this commit: 57s wall, 1m22s CPU, 3326 MB → 1.35x wall speedup, 1.83x CPU reduction, same memory Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: residue lookup table for lightweight fragment/precursor generation Replace AASequence::fromString + TheoreticalSpectrumGenerator with a static residue mass lookup table (char -> internal monoisotopic mass) for direct fragment m/z computation from protein sequence strings. 
Changes: - Add residue_mass_table_[128] populated from ResidueDB at first use - Add generateFragmentsLightweight_() computing b/y ions via cumulative sum over the lookup table (no AASequence parsing, no TSG overhead) - Unmodified peptide path in generatePeptides() computes precursor m/z directly from lookup table (eliminates AASequence::fromString entirely) - Modified path extracts per-residue mass deltas from AASequence after modification application, then uses lightweight generator - Per-thread vectors in generatePeptides() (eliminates second omp critical) Build phase timing (20 threads, Human SwissProt): generatePeptides: 5.3s -> 2.1s (2.5x faster) generateFragments: kept at 2.3s (already fast from per-thread vectors) Overall benchmark (DDA TimsTOF + Human SwissProt, 20 threads): develop: 1m17s wall, 2m30s CPU this branch: 1m07s wall, 2m19s CPU (same results, same memory) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct fragment loop bounds and residue mass table for lightweight generation Stop prefix (b/a/c) and suffix (y/x/z) ion loops one position early to match TheoreticalSpectrumGenerator default behavior — avoids emitting full-length ions (b_n/y_n) that are effectively precursor masses. Populate residue mass table from all ResidueDB single-letter codes (A–Z) instead of hardcoding the 20 canonical AAs, so selenocysteine (U) and pyrrolysine (O) get correct masses. Also skip peptides with ambiguous codes (B/Z/J) alongside X. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: default to all cores in search engines (threads=0 support) PeptideDataBaseSearchFI and SimpleSearchEngine now use all available cores when -threads is not explicitly set (matching Sage's default). TOPPBase::setMaxNumberOfThreads treats 0 as all cores via omp_get_num_procs(). Global default stays at 1 for other tools. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Revert "feat: default to all cores in search engines (threads=0 support)" This reverts commit 1163082. * Reapply "feat: default to all cores in search engines (threads=0 support)" This reverts commit f48a7a3. * fix: update test reference files for threads description change Update all INI/toppas reference files and TOPPBase_test.cpp to match the new threads parameter description "(0 = all available cores)". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review findings for fragment index fast-build - Guard #include <omp.h> with #ifdef _OPENMP and add fallbacks for non-OpenMP builds (num_threads=1, tid=0) - Replace static bool init guard with std::call_once for thread-safe residue mass table initialization - Remove search engine thread overrides that hijacked -threads 1 to mean all cores (threads=0 support remains at TOPPBase level) - Remove J from ambiguous AA filter (ResidueDB defines J/Xle with valid mass) - Replace std::cout with OPENMS_LOG_INFO for progress messages - Remove redundant initResidueMassTable_() call from build() - Rename file-scope static computePrecursorMzFromChars_ to drop trailing underscore (not a member function) - Add lightweight_fragment_count test validating 2*(n-1) fragments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: bitmask-based modification enumeration bypassing AASequence Replace AASequence::fromString + ModifiedPeptideGenerator with direct bitmask enumeration of variable modification combinations: - Add per-AA modification lookup tables (fixed_mod_deltas_, variable_mod_table_) built once from the modification config - Scan peptide sequences to find variable mod slots (position, delta_mass, mod_ptr) — deterministic left-to-right ordering - Enumerate valid bitmask subsets with conflict detection for multi-mod-per-site (mutually exclusive bits for same position) - Compute precursor m/z as 
base_mass + sum(selected deltas) - Reconstruct per-residue delta arrays from bitmask in build() for fragment generation (no AASequence needed) - Add reconstructModifiedSequence() for output-time AASequence reconstruction (only called for final ~1000 hits, not millions) - Replace modification_idx_ (UInt32) with mod_bitmask_ (uint32_t) supporting up to 32 variable mod slots per peptide - Simplify PeptideSearchEngineFIAlgorithm hit processing from 10 lines of AASequence reconstruction to single method call - Add edge case tests: multi-mod-per-site, fixed+variable mods, fragment count with modifications, AASequence reconstruction Eliminates all AASequence construction from the build hot path. Expected ~20-50x speedup for the modification branch of build(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace __builtin_popcount with std::popcount for MSVC compat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cross-validate bitmask enumeration against ModifiedPeptideGenerator Add comprehensive test comparing FragmentIndex bitmask-based modification enumeration against ModifiedPeptideGenerator for correctness: - Simple fixed + variable mods (Carbamidomethyl + Oxidation) - Multiple modifiable sites with combinatorial enumeration - N-terminal variable mod coexisting with residue mod - Two different variable mods targeting the same amino acid - No modifiable sites (fixed mods only, no matching residues) - Mixed site types with max_variable_mods=3 Each case validates variant count, precursor masses, and reconstructed AASequence strings match the original ModifiedPeptideGenerator output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: handle PROTEIN_N_TERM/C_TERM mods based on protein position PROTEIN_N_TERM mods (e.g., Acetyl (Protein N-term)) should only apply to peptides starting at protein position 0, not to every peptide. 
Similarly, PROTEIN_C_TERM mods only apply at the protein C-terminus. Pass is_protein_nterm/is_protein_cterm flags to buildModSlots_ based on the peptide's position within the protein sequence. This correctly handles these modifications where ModifiedPeptideGenerator silently skipped them entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add FDR filtering tests for SSE and FI DDA benchmarks Add TOPP_FalseDiscoveryRate_SimpleSearchEngineDDA and TOPP_FalseDiscoveryRate_PeptideDataBaseSearchFIDDA tests that apply 1% PSM FDR filtering after the DDA search, matching the existing TOPP_FalseDiscoveryRate_SageDDA test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: doxygen documentation for new FragmentIndex types Use @brief style for VarModEntry and ModSlot structs, add documentation for static constexpr sentinel values, replace unicode em-dash with ASCII in reconstructModifiedSequence comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: parallel fragment sort via Boost.Sort block_indirect_sort Replace single-threaded std::sort with boost::sort::block_indirect_sort for the global fragment m/z sort. This uses std::thread internally (no TBB dependency) and provides ~4.6x speedup on the sort phase (45.6s down to 10.0s on 200M fragments). Also add default constructor to Fragment struct (required for parallel sort temporary buffers). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: set Sage min_ion_index=0 in DDA benchmark for fair comparison Sage's default min_ion_index=2 skips b1/b2/y1/y2 ions in preliminary scoring while OpenMS engines match all ions. Set to 0 for an apples-to-apples comparison in the DDA integration test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add fragment:min_ion_index parameter to FragmentIndex Skip the first N ions from each series (b/y/a/c/x/z) during fragment generation. 
Default 0 (include all ions). Setting to 2 skips b1/b2 and y1/y2, matching Sage's default behavior. This allows direct comparison and can reduce noise from unreliable low-index ions. Includes test validating fragment count with min_ion_index=2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…1/y2) (OpenMS#9094) * feat: default fragment:min_ion_index=2 (skip b1/b2/y1/y2) Match Sage's default behavior of excluding the four shortest fragment ions (b1, b2, y1, y2) from the fragment index. These low-index ions are often noisy and unreliable, and skipping them: - Improves PSM count at 1% FDR (DDA HeLa: 4339 → 4374, +35) - Reduces wall time (59s → 48s, -11s) - Reduces memory usage (4283 MB → 3984 MB, -300 MB) Also expose fragment:min_ion_index parameter through PeptideSearchEngineFIAlgorithm so it can be set by TOPP tools. Update tests that depend on full ion counts to set min_ion_index=0 explicitly (lightweight_fragment_count, multi_mod_per_site, fixed_plus_variable_mods, querySpectrum, tolerance). Update Sage DDA test to use -min_ion_index 2 (matching new FI default). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify fragment:min_ion_index help text and inline comment The previous wording "Minimum ion index to consider" and "Ions below this index" was inconsistent with the actual implementation, which skips ions with index ≤ min_ion_index_ (note: less-than-OR-EQUAL, not strictly below). With min_ion_index=2, b1 and b2 are both skipped (min ion considered is actually 3, not 2). Update help string in both FragmentIndex.cpp and PeptideSearchEngineFIAlgorithm.cpp, plus the inline comment that incorrectly said "<" instead of "<=". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
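The skip rule described in the help-text fix above can be sketched as follows. This is a minimal illustration, not the FragmentIndex implementation: `keptIonIndices` and `fragmentCountBY` are hypothetical names, and the sketch only assumes what the commit messages state (full-length ions are never emitted, and ions with index less than or equal to `min_ion_index` are skipped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-based ion indices kept for ONE series (e.g. b or y)
// of a peptide with `peptide_length` residues.
// - The full-length ion (index == peptide_length) is never emitted.
// - Ions with index <= min_ion_index are skipped (less-than-or-equal).
std::vector<std::size_t> keptIonIndices(std::size_t peptide_length, std::size_t min_ion_index)
{
  assert(peptide_length > 0);
  std::vector<std::size_t> kept;
  for (std::size_t i = 1; i < peptide_length; ++i)
  {
    if (i <= min_ion_index) continue; // e.g. min_ion_index = 2 drops ions 1 and 2
    kept.push_back(i);
  }
  return kept;
}

// Hypothetical helper: number of b + y fragments for one unmodified peptide.
std::size_t fragmentCountBY(std::size_t peptide_length, std::size_t min_ion_index)
{
  return 2 * keptIonIndices(peptide_length, min_ion_index).size();
}
```

With `min_ion_index = 0` a peptide of length n yields 2*(n-1) b/y fragments, which is the count the `lightweight_fragment_count` test is described as validating. With `min_ion_index = 2` the first ion kept in each series has index 3.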
Co-authored-by: timosachsenberg <sachsenb@ibminode06.Cs.Uni-Tuebingen.De> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* doc: Repair the AUTHORS file Somehow the file got goofed up since my last PR. * ci/authors: Don't set LC_ALL The `authors` script will take care of this now.
| { | ||
| amino_acid = one_letter_code_[0]; | ||
| } | ||
| else { |
| const int scale_idx = static_cast<int>(scale); | ||
| if (scale_idx < 0 || scale_idx >= 7) | ||
| { | ||
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", ""); |
The empty String argument in the exception is unhelpful. Pass the scale instead, since it caused the exception (cast it to String if needed).
| const double result = scales[scale_idx][amino_acid - 'A']; | ||
| if (result == 999) | ||
| { | ||
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "No hydrophobicity value known for this residue", one_letter_code_); |
See the comment above. Here the one-letter code caused the exception.
|
|
||
| START_SECTION(double computeGRAVY(const AASequence& seq)) | ||
| { | ||
| AASequence seq("ACDE"); |
* fix(cmake): accept Arrow/Parquet 24.x and later
Ubuntu 24.04 GitHub-hosted runners shipped Arrow 24.0.0 as of 2026-04-21,
breaking all Linux CI jobs with:
CMake Error: Could not find a configuration file for package "Arrow"
that is compatible with requested version "23".
ArrowConfig.cmake, version: 24.0.0
Root cause: Arrow's ConfigVersion file uses SameMajorVersion
compatibility, so `find_package(Arrow 23 CONFIG REQUIRED)` refuses 24.x
even when 24.x would work for our usage. Drop the version from
find_package and enforce the >= 23 minimum via an explicit
Arrow_VERSION / Parquet_VERSION VERSION_LESS check. This accepts any
major >= 23 without guessing which future majors will be API-compatible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(pyOpenMS): exclude pyarrow 24.0.0 (cp310-only wheels on PyPI)
pyarrow 24.0.0 on PyPI only shipped cp310 wheels with no source
distribution, so `uv sync` fails on runners using cp311/cp312/cp313:
error: Distribution `pyarrow==24.0.0` can't be installed because it
doesn't have a source distribution or wheel for the current platform
hint: You're using CPython 3.12 (`cp312`), but `pyarrow` (v24.0.0)
only has wheels with the following Python ABI tag: `cp310`
Before this commit, pyarrow was listed unpinned in the arrow/all/test
optional dependency groups and in cibuildwheel test-requires, so uv
picked up 24.0.0 and refused to install, which in turn left the
bootstrapped venv without numpy and failed the subsequent
`find_package(Python ... NumPy REQUIRED)` call.
Use `pyarrow!=24.0.0` rather than `<24.0.0` so that a future 24.0.1 (or
later) release with proper wheel coverage is picked up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| const int scale_idx = static_cast<int>(scale); | ||
| if (scale_idx < 0 || scale_idx >= 7) | ||
| { | ||
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", ""); |
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", ""); | |
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "Unknown hydrophobicity scale", String(scale_idx)); |
| Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | ||
|
|
||
| Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626. |
| Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | |
| Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626. | |
| - Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | |
| - Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci U S A. 1984 Jan;81(1):140-4. doi: 10.1073/pnas.81.1.140. PMID: 6582470; PMCID: PMC344626. |
| Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955. | ||
|
|
||
| Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | ||
|
|
||
| T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981). |
| Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955. | |
| Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | |
| T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981). | |
| - Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982 May 5;157(1):105-32. doi: 10.1016/0022-2836(82)90515-0. PMID: 7108955. | |
| - Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984 Oct 15;179(1):125-42. doi: 10.1016/0022-2836(84)90309-7. PMID: 6502707. | |
| - T.P. Hopp, & K.R. Woods, Prediction of protein antigenic determinants from amino acid sequences., Proc. Natl. Acad. Sci. U.S.A. 78 (6) 3824-3828, https://doi.org/10.1073/pnas.78.6.3824 (1981). |
| Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X) | ||
|
|
||
| Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744. | ||
|
|
||
| Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70. |
| Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X) | |
| Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744. | |
| Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70. | |
| - Henry B. Bull, Keith Breese, Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues, Archives of Biochemistry and Biophysics, Volume 161, Issue 2, 1974, Pages 665-670, ISSN 0003-9861, https://doi.org/10.1016/0003-9861(74)90352-X. (https://www.sciencedirect.com/science/article/pii/000398617490352X) | |
| - Black SD, Mould DR. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem. 1991 Feb 15;193(1):72-82. doi: 10.1016/0003-2697(91)90045-u. PMID: 2042744. | |
| - Guy, H. R. (1985). Amino acid side-chain partition energies and distribution of residues in soluble proteins. Biophysical journal, 47(1), 61-70. |
| /// @warning When the window size is larger than the sequence the window size will be clamped to the sequence length | ||
| std::vector<double> computeWindowedProfile( | ||
| const AASequence& seq, | ||
| Size window_size = 7, |
| Size window_size = 7, | |
| const Size window_size = 7, |
| /// @brief returns the hydrophobicity value of the residue | ||
|
|
||
| /// The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile | ||
|
|
||
| /// @param scale which scale to use for the hydrophobicity value | ||
| /// @return hydrophobicity value of the residue | ||
| /// @throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used |
| /// @brief returns the hydrophobicity value of the residue | |
| /// The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile | |
| /// @param scale which scale to use for the hydrophobicity value | |
| /// @return hydrophobicity value of the residue | |
| /// @throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used | |
| /** | |
| @brief returns the hydrophobicity value of the residue | |
| The sources for the hydrophobicity scales are here: @ref HydrophobicityProfile | |
| @param scale which scale to use for the hydrophobicity value | |
| @return hydrophobicity value of the residue | |
| @throw Exception::InvalidValue Throws an exception if the residue is not one of the 20 common amino acids or when an unknown scale is used | |
| */ |
| /// @brief Enum for different hydrophobicity scales | ||
|
|
||
| /// If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one. | ||
|
|
||
| /// Add the data for this scale here: @ref Residue::getHydrophobicity |
| /// @brief Enum for different hydrophobicity scales | |
| /// If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one. | |
| /// Add the data for this scale here: @ref Residue::getHydrophobicity | |
| /** | |
| @brief Enum for different hydrophobicity scales | |
| If a new scale is introduced, append it to the list below and assign it an enum value equal to the current maximum enum value plus one. | |
| Add the data for this scale here: @ref Residue::getHydrophobicity | |
| */ |
| std::vector<double> HydrophobicityProfile::computeWindowedProfile | ||
| ( | ||
| const AASequence& seq, | ||
| Size window_size, |
| Size window_size, | |
| const Size window_size, |
| if (window_size > seq.size()) | ||
| { | ||
| OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n"; | ||
| } | ||
| std::vector<double> profile; | ||
| Size effective_window = std::min(window_size, seq.size()); // size of the window |
| if (window_size > seq.size()) | |
| { | |
| OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n"; | |
| } | |
| std::vector<double> profile; | |
| Size effective_window = std::min(window_size, seq.size()); // size of the window | |
| const Size effective_window = [&](){ | |
| if (window_size > seq.size()) | |
| { | |
| OPENMS_LOG_WARN << "Warning: window size (" << window_size << ") is larger than sequence length. Window size clamped to sequence length: " << seq.size() << "\n"; | |
| return seq.size(); | |
| } | |
| return window_size; | |
| }(); | |
| std::vector<double> profile; |
…OpenMS#9195) Two related ProSE bugs around Percolator rescoring: 1. FDR:PSM was applied inside the algorithm on raw HyperScore q-values before Percolator ran, which also stripped decoys via the fdr_protein_==0 branch (protein FDR is already deferred). Percolator then saw a target-only input and aborted with "No decoys found". Fix: defer FDR:PSM alongside FDR:protein when -percolator_executable is set. Post-rescoring, apply PSM FDR on Percolator q-values directly (scores are already q-values via -score_type q-value). Files that fell back to HyperScores (Percolator skipped/failed) compute q-values via FalseDiscoveryRate first. Decoys are only removed when protein FDR is disabled, matching the algorithm's existing semantics. 2. -out_pin emitted a .pin header missing the mandatory SpecId / Label / ScanNr columns and the standard mass/charge/enzyme features (ExpMass, CalcMass, mass, peplen, charge{N}, enzN, enzC, enzInt, dm, absdm) that PercolatorInfile::preparePin_ already sets on every hit. The output was not consumable by the percolator CLI. Fix: factor the canonical column list into a new PercolatorInfile::getStandardFeatureSet() helper. Both PercolatorAdapter (previously hardcoded) and ProSE -out_pin now build their feature_set from this helper, so the two tools emit consistent, standards-compliant .pin files. Verified: all 18 ProSE tests, PercolatorInfile_test, and PercolatorAdapter INI/CTD tests pass. Smoke-tested ProSE -out_pin on SimpleSearchEngine_1.mzML: header now begins with SpecId / Label / ScanNr and ends with Peptide / Proteins. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Size window_size, | ||
| double angle |
| Size window_size, | |
| double angle | |
| const Size window_size, | |
| const double angle |
| sum_sin = std::pow(sum_sin,2); | ||
| sum_cos = std::pow(sum_cos,2); |
| sum_sin = std::pow(sum_sin,2); | |
| sum_cos = std::pow(sum_cos,2); | |
| sum_sin *= sum_sin; // square | |
| sum_cos *= sum_cos; // square |
| } | ||
| sum_sin = std::pow(sum_sin,2); | ||
| sum_cos = std::pow(sum_cos,2); | ||
| profile.push_back(std::sqrt(sum_sin+sum_cos) / std::min(window_size, seq.size())); |
| profile.push_back(std::sqrt(sum_sin+sum_cos) / std::min(window_size, seq.size())); | |
| profile.push_back(std::sqrt(sum_sin+sum_cos) / effective_window); |
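For reference, the per-window computation discussed in this thread can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `hydrophobicMoment` is a hypothetical free function that takes the per-residue hydrophobicity values of one window directly, the default angle of 100 degrees per residue is an assumption (the usual choice for an alpha helix), and the division by the window length mirrors the reviewed line.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: hydrophobic moment of ONE window, normalized by window length.
// `h` holds the hydrophobicity value of each residue in the window;
// `angle_deg` is the rotation per residue in degrees (assumed default: 100).
double hydrophobicMoment(const std::vector<double>& h, double angle_deg = 100.0)
{
  assert(!h.empty());
  const double pi = std::acos(-1.0);
  const double delta = angle_deg * pi / 180.0; // degrees -> radians

  double sum_sin = 0.0;
  double sum_cos = 0.0;
  for (std::size_t n = 0; n < h.size(); ++n)
  {
    sum_sin += h[n] * std::sin(static_cast<double>(n) * delta);
    sum_cos += h[n] * std::cos(static_cast<double>(n) * delta);
  }
  // Square by multiplication (as suggested in the review) instead of std::pow(x, 2).
  return std::sqrt(sum_sin * sum_sin + sum_cos * sum_cos) / static_cast<double>(h.size());
}
```

Two quick sanity checks: at 180 degrees per residue, the window {1, 1} cancels to a moment of about 0, while {1, -1} adds up to a moment of about 1.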
| else { | ||
| throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "One letter code for this residue is empty", ""); | ||
| } | ||
| if (amino_acid < 65 || amino_acid > 90) |
| if (amino_acid < 65 || amino_acid > 90) | |
| if (amino_acid < 'A' || amino_acid > 'Z') |
…enMS#9199) Document the cmake fix that accepts Arrow/Parquet 24.x and later by dropping the find_package version argument and using a VERSION_LESS guard (dfa9d82). Also document the pyarrow 24.0.0 exclusion in pyOpenMS dependencies. This entry was missing from the previous CHANGELOG sync. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add two missing entries for changes merged after the last CHANGELOG sync (OpenMS#9193): - OpenMS#9196: CMake now accepts Arrow/Parquet 24.x and later; Ubuntu 24.04 CI runners ship Arrow 24.0.0 which broke builds with the previous find_package(Arrow 23) version constraint. pyarrow 24.0.0 excluded from pyOpenMS dependencies (cp310-only wheels on PyPI). - OpenMS#9174: Fix SpectraIDViewTab GUI compilation failure on macOS Apple Silicon (Tahoe); accession key lookup now converts to std::string after Qt string interop removal. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… lists (OpenMS#9203) The previous sqrt(N) heuristic produces very coarse buckets on large immunopeptidomics indices (e.g. bucket_size 22k for a 490M-fragment non-specific human index). Each query peak then scans thousands of fragments far outside its tolerance window. Hard-coding bucket_size to 4096 approximates MSFragger's fixed ~0.02 Da fragment-bin density in the dense 500-1500 Da region, so a bucket now covers roughly one fragment-tolerance window instead of a wide sqrt(N) span. Benchmark on DN17_Liver_classI_techRep2 (SNES, 45M mothers, unspecific cleavage, 7 ppm / 20 ppm, 8 threads): - wall: 91 min -> 80.6 min (-11%) - CPU: 603 min -> 523 min (-13%) - PSMs: 28235 -> 28235 (identical) - peak RSS unchanged Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PPExecutable (OpenMS#9204) ProSE invoked the Percolator rescoring step with the bare name "PercolatorAdapter". runExternalProcess_ hands that to boost::process, which only searches $PATH — in a dev build (or any install where the OpenMS bin/ directory is not on PATH) the adapter is not found and the rescoring silently falls back to HyperScore with just a WARN line: Standard error: Process 'PercolatorAdapter' failed to start. Does it exist? Is it executable? Percolator rescoring failed for <file>. Using original HyperScore results. Resolve the sibling binary via File::findSiblingTOPPExecutable before the per-file loop; on FileNotFound, skip rescoring once with a clear warning instead of retrying (and failing silently) per file. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
items 1a/1b) (OpenMS#9205) * ProSE: scrub dangling protein refs after merged decoy/FDR cleanup Fixes 'Invalid protein reference DECOY_...' crash when merged path runs applyPickedProteinFDR (deletes decoy ProteinHits) and target+decoy PSMs remain (kept by removeDecoyHits's exact-match check on target_decoy=='decoy'). Refs OpenMS#9197 item 1b. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ProSE: move -out_merged write after per-file outputs and wrap in try/catch Fixes data loss on merged-write failure: previously a merged-write exception aborted main_ and discarded per-file outputs even when per-file PSMs were already complete in memory. Per-file writes now complete first and the merged write is wrapped in try/catch so its failure becomes a logged error rather than a terminal exception. Refs OpenMS#9197 item 1a. * ProSE: soften merged-write catch message wording Don't claim per-file outputs were 'written successfully' — input_failed may be true for one or more files. Direct user to check per-file errors above. * docs(CHANGELOG): ProSE issue OpenMS#9197 items 1a + 1b --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…to functions, improved readability of documentation
…penMS#9207) std::filesystem::rename fails with EXDEV across filesystems, unlike Qt's QFile::rename which silently copied and removed. PR OpenMS#8938 replaced the Qt call without preserving that fallback, breaking TOPP adapters (CometAdapter -pin_out, MSGFPlusAdapter -mzid_out) that move files from a tmp dir to a bind-mounted output — a common container layout. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
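The cross-device fallback described in this commit message can be sketched with the standard library alone. This is an illustration only: `moveFile` is a hypothetical name and is not the `File::rename` signature. Overwriting an existing target during the copy is also an assumption of this sketch.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: try an atomic rename first; if that fails because source
// and target live on different filesystems (EXDEV), fall back to copy + remove.
// Returns true on success, false on any other failure.
bool moveFile(const fs::path& from, const fs::path& to)
{
  assert(!from.empty() && !to.empty());

  std::error_code ec;
  fs::rename(from, to, ec);
  if (!ec) return true; // same filesystem, rename succeeded

  if (ec != std::errc::cross_device_link) return false; // unrelated error, no fallback

  // Cross-device move: copy the file, then delete the original.
  fs::copy_file(from, to, fs::copy_options::overwrite_existing, ec);
  if (ec) return false;

  fs::remove(from, ec);
  return !ec;
}
```

Restricting the fallback to the cross-device error keeps other failures (missing source, permission problems) visible to the caller instead of masking them with a copy attempt.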
* Reduced Memory Overhead by replacing map copy with map reference * Add myself to Authors * Updated CHANGELOG file * reduced RAM usage with rvalue ref * updated changelog & reorganized it so all TOPPTool changes are in one place * added PR number to changelog * applied recommended changes * Applied Coderabbit suggestions * applied suggested fixes * removed backwards compatibility --------- Co-authored-by: Tilman Aurich <tilman.aurich@fu-berlin.de> Co-authored-by: Chris Bielow <chris.bielow@fu-berlin.de>
Restore ProSE/FragmentIndex/File entries dropped by the FeatureFinderCentroided reorganization commit (OpenMS#9159) and add new entries for commits merged after the last sync: - OpenMS#9188: ProSE BREAKING: default isotope error range changed to [0,+2] - OpenMS#9191: ProSE SNES mother-peptide indexing for non-specific searches - OpenMS#9195: ProSE fix: defer PSM FDR when Percolator enabled; valid .pin output - OpenMS#9203: FragmentIndex: fixed bucket_size 4096 for tighter candidate lists - OpenMS#9204: ProSE: resolve PercolatorAdapter via findSiblingTOPPExecutable - OpenMS#9205: ProSE: per-file output isolation + dangling decoy ref crash fix - OpenMS#9207: File::rename: cross-device fallback (copy+remove on EXDEV) Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AGENTS.md was updated in OpenMS#9194 to note that contrib is a git submodule and requires 'git submodule update --init contrib' before building. Sync the same note to CLAUDE.md so both AI-assistant guidelines files carry this important setup step. Co-authored-by: GitHub Copilot <copilot@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Added hydrophobicity scale data via a new function, getHydrophobicity(), on Residue, which returns the hydrophobicity value of an amino acid on a given scale.
Added the new class HydrophobicityProfile, which calculates hydrophobicity profiles for peptides.
It provides functions for calculating the GRAVY score, hydrophobicity profiles, windowed hydrophobicity profiles and hydrophobic moments of a peptide.
Addresses issue OpenMS#9005.
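To make the quantities concrete, here is a minimal standalone sketch of a GRAVY score and a windowed profile on the Kyte-Doolittle scale. It does not use the OpenMS classes: `kyteDoolittle`, `gravy` and `windowedProfile` are hypothetical names operating on plain strings, and the sketch emits one value per full window position, which may differ from how the PR handles sequence ends. Only the clamping of the window to the sequence length is taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Kyte-Doolittle hydropathy value for a one-letter code (20 standard amino acids only).
double kyteDoolittle(char aa)
{
  switch (aa)
  {
    case 'A': return 1.8;  case 'R': return -4.5; case 'N': return -3.5; case 'D': return -3.5;
    case 'C': return 2.5;  case 'Q': return -3.5; case 'E': return -3.5; case 'G': return -0.4;
    case 'H': return -3.2; case 'I': return 4.5;  case 'L': return 3.8;  case 'K': return -3.9;
    case 'M': return 1.9;  case 'F': return 2.8;  case 'P': return -1.6; case 'S': return -0.8;
    case 'T': return -0.7; case 'W': return -0.9; case 'Y': return -1.3; case 'V': return 4.2;
    default: throw std::invalid_argument(std::string("No hydrophobicity value known for residue: ") + aa);
  }
}

// GRAVY: arithmetic mean of the per-residue hydropathy values.
double gravy(const std::string& seq)
{
  assert(!seq.empty());
  double sum = 0.0;
  for (char aa : seq) sum += kyteDoolittle(aa);
  return sum / static_cast<double>(seq.size());
}

// Sliding-window mean. The window is clamped to the sequence length.
// Returns seq.size() - window + 1 values (one per full window position).
std::vector<double> windowedProfile(const std::string& seq, std::size_t window_size = 7)
{
  assert(window_size > 0);
  std::vector<double> profile;
  if (seq.empty()) return profile;

  const std::size_t w = std::min(window_size, seq.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < w; ++i) sum += kyteDoolittle(seq[i]);
  profile.push_back(sum / static_cast<double>(w));

  // Slide by one residue: add the entering value, subtract the leaving one.
  for (std::size_t i = w; i < seq.size(); ++i)
  {
    sum += kyteDoolittle(seq[i]) - kyteDoolittle(seq[i - w]);
    profile.push_back(sum / static_cast<double>(w));
  }
  return profile;
}
```

For the sequence "ACDE" used in the PR's test section, this sketch gives a GRAVY of (1.8 + 2.5 - 3.5 - 3.5) / 4 = -0.675.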