Harden CBS segmentation argument passing; remove the defunct fused-lasso method #1101

Merged
etal merged 3 commits into master from fix-rscript-segmentation-arg-quoting
May 29, 2026
Conversation

@etal etal commented May 29, 2026

Owner

Two related changes to the R-backed segmentation path.

1. Pass CBS parameters as command-line arguments

The CBS backend interpolated the probes file path and the sample ID directly into the R script source via rscript % script_strings. A sample ID containing a single quote (e.g. O'Brien_001) or a path containing a backslash produced syntactically invalid R, crashing the Rscript subprocess.
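The failure mode can be reproduced without R. The sketch below uses a hypothetical one-line template (not CNVkit's actual R script) and a deliberately crude quote-counting check, just to show how an apostrophe in the sample ID leaves the generated R string literal unterminated.

```python
# Hypothetical, simplified template: NOT CNVkit's actual R script.
OLD_STYLE_TEMPLATE = "sample_id <- '%(sample_id)s'"


def render_old_style(sample_id: str) -> str:
    """Interpolate the value straight into R source text."""
    return OLD_STYLE_TEMPLATE % {"sample_id": sample_id}


def single_quotes_balanced(r_source: str) -> bool:
    """Crude check: an odd number of single quotes leaves a string literal open."""
    return r_source.count("'") % 2 == 0


ok = render_old_style("Sample_001")
broken = render_old_style("O'Brien_001")
print(ok)      # sample_id <- 'Sample_001'
print(broken)  # sample_id <- 'O'Brien_001'
```

The second rendering contains three single quotes, so the R parser sees the string end after `O` and then hits unparseable text.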

This is a robustness bug, not a security vulnerability: probes_fname is a CNVkit-generated tempfile path and sample_id comes from the user's own local input, so the interpolation was never an exploitable injection vector — only a parser-breakage one.

The R template now reads its inputs from commandArgs(trailingOnly=TRUE), and _do_segmentation passes them as positional arguments to Rscript. core.call_quiet runs the subprocess without a shell, so the values traverse execve argv untouched and require no escaping. This eliminates the string interpolation entirely (significance threshold and smooth-CBS flag included), removing the latent failure mode for any literal % appearing in the script as well.
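A minimal sketch of why positional arguments need no escaping, under stated assumptions: `subprocess.run` stands in for `core.call_quiet` (whose internals are not shown here), and a small Python child script stands in for the R template, which would read the same values with `commandArgs(trailingOnly=TRUE)`. The helper name `run_with_args` and the example values are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in child script: echoes its positional arguments back as JSON.
CHILD_SOURCE = "import json, sys\nprint(json.dumps(sys.argv[1:]))\n"


def run_with_args(*values: str) -> list:
    """Run the child with positional arguments and return what it received."""
    with tempfile.TemporaryDirectory() as tmpdir:
        script = Path(tmpdir) / "echo_args.py"
        script.write_text(CHILD_SOURCE)
        # An argv list with the default shell=False: no shell ever parses
        # the values, so quotes, spaces, and backslashes pass through as-is.
        result = subprocess.run(
            [sys.executable, str(script), *values],
            check=True,
            capture_output=True,
            text=True,
        )
    return json.loads(result.stdout)


sent = ["/tmp/it's a\\probe file.tsv", "O'Brien_001", "0.0001", "FALSE"]
received = run_with_args(*sent)
```

The child receives each value byte-for-byte, which is the property the new invocation relies on. All arguments arrive as strings, so the receiving script is responsible for converting numeric and logical values.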

No change to file formats or numerical output.

2. Remove the defunct fused-lasso (cghFLasso) method

cghFLasso was archived from CRAN on 2018-06-17 and has been uninstallable since, so the flasso segmentation method could not run. It is removed entirely: the flasso.py R template, the SEGMENT_METHODS entry (which also drops it from the segment/batch --method choices), the method-specific dispatch and post-processing, and stale references in the docstring, CLI help, and pipeline documentation.

Removing flasso from the --method choices is a user-facing change, but the option could only ever raise "there is no package called 'cghFLasso'".

Tests

  • Updated the existing regression test for #868 (Runtime Error Using CNVKit Batch) to the new invocation.
  • Added test_cbs_sample_id_with_quote: O'Brien_001 exercised end-to-end through do_segmentation.
  • Added test_cbs_probes_path_with_awkward_chars: probes path containing a quote, space, and backslash.

Verified: test_r.py, test_segmentation.py, test_commands.py all pass; ruff, format, and mypy clean.

Surfaced by #1062.

The CBS and fused-lasso segmentation backends interpolated the probes
file path and the sample ID directly into the R script source. A sample
ID containing a single quote (e.g. O'Brien_001) or a path containing a
backslash produced syntactically invalid R and crashed the Rscript
subprocess.

Read these values via commandArgs(trailingOnly=TRUE) instead, passing
them as positional arguments to Rscript. core.call_quiet runs the
subprocess without a shell, so the values require no escaping. This
eliminates the string interpolation entirely (significance threshold and
smooth-CBS flag included), along with the latent failure mode for any
literal '%' appearing in the script.

Robustness only: no change to file formats or numerical output. Adds
regression tests covering sample IDs and probe paths that contain
quotes, spaces, and backslashes.

Surfaced by #1062.
codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.35%. Comparing base (ab40f4d) to head (1fa3be0).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1101      +/-   ##
==========================================
+ Coverage   70.31%   70.35%   +0.04%     
==========================================
  Files          74       73       -1     
  Lines        7897     7891       -6     
  Branches     1396     1395       -1     
==========================================
- Hits         5553     5552       -1     
+ Misses       1895     1891       -4     
+ Partials      449      448       -1     
Flag Coverage Δ
unittests 70.35% <100.00%> (+0.04%) ⬆️

cghFLasso was archived from CRAN on 2018-06-17 and has been uninstallable
ever since, so the 'flasso' segmentation method could not run. Remove it
entirely: the flasso.py R template, the 'flasso' entry in SEGMENT_METHODS
(which also drops it from the segment/batch --method choices), the
method-specific dispatch and post-processing, and the stale references in
the docstring, CLI help, and pipeline documentation.

The parallelization guard previously special-cased flasso alongside the
HMM methods; it now keys on the HMM methods alone, with a comment
describing the actual reason (HMM fits a single model across all bins).

Removing 'flasso' from the --method choices is a user-facing change, but
the option could only ever raise "there is no package called 'cghFLasso'".
@etal etal changed the title from "Pass R segmentation parameters as command-line arguments" to "Harden CBS segmentation argument passing; remove the defunct fused-lasso method" May 29, 2026
A CopyNumArray with no sample_id in its metadata produced the literal
string 'None' as the segment ID with no indication anything was wrong.
The CLI never reaches this -- read() derives sample_id from the input
filename for both 'segment' and 'batch' -- but the in-memory API can
construct an array without one.

Emit a single warning from do_segmentation (before the per-arm fan-out,
so it fires once per sample) and keep 'None' as the fallback ID, since no
better label is available at that layer.
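The behavior described above can be sketched as follows. This is a hypothetical helper, not the code in do_segmentation: it assumes the metadata is a plain dict and that the warning goes through the standard `logging` module, neither of which is confirmed by this commit message.

```python
import logging

logger = logging.getLogger("segmentation_sketch")  # hypothetical logger name


def resolve_segment_id(meta: dict) -> str:
    """Return the segment label, warning once if no sample_id is present."""
    sample_id = meta.get("sample_id")
    if sample_id is None:
        # Called once per sample, before any per-arm fan-out, so the
        # warning is not repeated for every chromosome arm.
        logger.warning("No sample_id in metadata; segments will be labeled 'None'")
    # str(None) yields the literal fallback label 'None'.
    return str(sample_id)


label = resolve_segment_id({})
named = resolve_segment_id({"sample_id": "Sample_001"})
```

Resolving the label once at the top level and passing it down to the per-arm workers keeps the warning to a single line per sample.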
@etal etal merged commit 9337611 into master May 29, 2026
13 checks passed
@etal etal deleted the fix-rscript-segmentation-arg-quoting branch May 29, 2026 22:05