Harden CBS segmentation argument passing; remove the defunct fused-lasso method#1101
Merged
Conversation
The CBS and fused-lasso segmentation backends interpolated the probes file path and the sample ID directly into the R script source. A sample ID containing a single quote (e.g. O'Brien_001) or a path containing a backslash produced syntactically invalid R and crashed the Rscript subprocess. Read these values via commandArgs(trailingOnly=TRUE) instead, passing them as positional arguments to Rscript. core.call_quiet runs the subprocess without a shell, so the values require no escaping. This eliminates the string interpolation entirely (significance threshold and smooth-CBS flag included), along with the latent failure mode for any literal '%' appearing in the script. Robustness only: no change to file formats or numerical output. Adds regression tests covering sample IDs and probe paths that contain quotes, spaces, and backslashes. Surfaced by #1062.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1101 +/- ##
==========================================
+ Coverage 70.31% 70.35% +0.04%
==========================================
Files 74 73 -1
Lines 7897 7891 -6
Branches 1396 1395 -1
==========================================
- Hits 5553 5552 -1
+ Misses 1895 1891 -4
+ Partials 449 448 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cghFLasso was archived from CRAN on 2018-06-17 and has been uninstallable ever since, so the 'flasso' segmentation method could not run. Remove it entirely: the flasso.py R template, the 'flasso' entry in SEGMENT_METHODS (which also drops it from the segment/batch --method choices), the method-specific dispatch and post-processing, and the stale references in the docstring, CLI help, and pipeline documentation. The parallelization guard previously special-cased flasso alongside the HMM methods; it now keys on the HMM methods alone, with a comment describing the actual reason (HMM fits a single model across all bins). Removing 'flasso' from the --method choices is a user-facing change, but the option could only ever raise "there is no package called 'cghFLasso'".
A CopyNumArray with no sample_id in its metadata produced the literal string 'None' as the segment ID with no indication anything was wrong. The CLI never reaches this -- read() derives sample_id from the input filename for both 'segment' and 'batch' -- but the in-memory API can construct an array without one. Emit a single warning from do_segmentation (before the per-arm fan-out, so it fires once per sample) and keep 'None' as the fallback ID, since no better label is available at that layer.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related changes to the R-backed segmentation path.
1. Pass CBS parameters as command-line arguments
The CBS backend interpolated the probes file path and the sample ID directly into the R script source via
rscript % script_strings. A sample ID containing a single quote (e.g.O'Brien_001) or a path containing a backslash produced syntactically invalid R, crashing theRscriptsubprocess.This is a robustness bug, not a security vulnerability:
probes_fnameis a CNVkit-generated tempfile path andsample_idcomes from the user's own local input, so the interpolation was never an exploitable injection vector — only a parser-breakage one.The R template now reads its inputs from
commandArgs(trailingOnly=TRUE), and_do_segmentationpasses them as positional arguments toRscript.core.call_quietruns the subprocess without a shell, so the values traverseexecveargv untouched and require no escaping. This eliminates the string interpolation entirely (significance threshold and smooth-CBS flag included), removing the latent failure mode for any literal%appearing in the script as well.No change to file formats or numerical output.
2. Remove the defunct fused-lasso (cghFLasso) method
cghFLassowas archived from CRAN on 2018-06-17 and has been uninstallable since, so theflassosegmentation method could not run. It is removed entirely: theflasso.pyR template, theSEGMENT_METHODSentry (which also drops it from thesegment/batch--methodchoices), the method-specific dispatch and post-processing, and stale references in the docstring, CLI help, and pipeline documentation.Removing
flassofrom the--methodchoices is a user-facing change, but the option could only ever raisethere is no package called 'cghFLasso'.Tests
test_cbs_sample_id_with_quote—O'Brien_001exercised end-to-end throughdo_segmentation.test_cbs_probes_path_with_awkward_chars— probes path containing a quote, space, and backslash.Verified:
test_r.py,test_segmentation.py,test_commands.pyall pass; ruff, format, and mypy clean.Surfaced by #1062.