Segger by EliHei2 · Pull Request #28 · openproblems-bio/task_spatial_segmentation

EliHei2 · 2026-05-18T11:14:58Z

Describe your changes

Adds a new segger method component that wraps segger (heterogeneous-graph
GNN for transcript-to-cell segmentation) as a Viash method consuming the
standard spatial_unlabelled zarr and producing a Labels2DModel
prediction.

How it works:

Initial boundary set — --init_segmentation auto (default) uses the
transcripts' cell_id prior when present, falls back to Cellpose on
morphology_mip. Force-able via cellpose or transcript_cell_id.
The polygons plus the transcripts are written to a Xenium-style
directory (transcripts.parquet + cell_boundaries.parquet +
nucleus_boundaries.parquet) so segger's stock 10x_xenium
preprocessor consumes it. No SpatialData-loader feature branch of
segger required.
segger segment is run from dpeerlab/segger@main.
The per-transcript segger_cell_id assignments are projected back
onto the initial mask by majority vote and rasterized as the
Labels2DModel output.

Docker engine pins the dependency stack segger is known to work with
on the openproblems base image:

PyG wheels matched to torch 2.5 + CUDA 12.1
RAPIDS 24.10 (last release binary-compatible with the base image's
CUDA 12.1 runtime)
Cellpose for the nucleus pre-segmentation

Depends on #<PR 1> (the cell_id prior contract change).

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!

The raw transcripts carry a vendor (e.g. Xenium morphology) `cell_id` that segmentation methods condition on as a prior. The previous process_dataset stripped it alongside `nucleus_id`/`cell_type`, leaving methods with no prior signal. Stop stripping `cell_id` so it flows through to spatial_unlabelled, and declare it as an optional integer column on the transcripts in src/api/file_spatial_unlabelled.yaml. The held-out ground truth used for evaluation remains in spatial_solution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wraps segger (graph neural network transcript-to-cell segmentation) as a Viash method that consumes the standard spatial_unlabelled SpatialData zarr and produces a Labels2DModel prediction. Initial boundary set is taken from the transcripts' `cell_id` prior (the vendor morphology assignment that `process_dataset` now keeps in spatial_unlabelled) when present and falls back to Cellpose on `morphology_mip` otherwise. The initial polygons plus the original transcripts are materialized as a Xenium-style directory (transcripts.parquet + cell_boundaries.parquet + nucleus_boundaries.parquet) so segger's stock `10x_xenium` preprocessor drives the prediction graph — no SpatialData-loader branch of segger required. `segger segment` is run from `dpeerlab/segger@main`. The per-transcript segger_cell_id assignments are projected back onto the initial mask by majority vote and rasterized as the final segmentation. Docker engine pins the dependency stack that segger is known to work with on the openproblems base image: - PyG wheels matched to torch 2.5+cu121 - RAPIDS 24.10 (last release binary-compatible with the base image's CUDA 12.1 runtime) - Cellpose for the nucleus pre-segmentation Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

* use pytorch 25.04 instead of 26.02 as a base image Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com> * try with 25.06 Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com> * update image location Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com> * print format Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com> --------- Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

The new base image (nvcr.io/nvidia/pytorch:25.06-py3 from PR #33) ships NumPy 2.x, which no longer silently overflows -1 into uint32. Cellpose internally compares mask sizes (a uint32 array) against min_size, so passing -1 now raises `OverflowError: Python integer -1 out of bounds for uint32`. Modern cellpose treats min_size=0 as "keep all masks", matching the original intent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Test data carries the vendor cell_id prior with -1 marking unassigned transcripts. The old filter (notna + non-empty-string) treated -1 as a real cell id, built a convex hull over every unassigned transcript, and then `labels[rr, cc] = int(-1)` blew up on uint32 under NumPy 2.x with `OverflowError: Python integer -1 out of bounds for uint32`. Add `_valid_cell_id_mask`, the pandas analog of segger's canonical `valid_cell_id_expr` (rejects null / -1 / "UNASSIGNED" / "NONE"), and apply the same predicate to segger's own output before majority-voting labels, since segger emits -1 for unassigned transcripts too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The component installs segger from dpeerlab/segger's default branch (main), whose `segment` CLI no longer accepts --prediction-scale-factor, --min-similarity, or --fragment-mode (those live on v2-incremental). Rename our --prediction_scale_factor arg to --prediction_expansion_ratio to match upstream (`buffer_dist = sqrt(area/pi) * ratio`, so 2.2 is no longer a sensible default — drop it to 0.5). Drop --min_similarity and --fragment_mode since there is nothing to forward them to. Bump n_epochs default to 20. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

segger@main eagerly `import cudf` in `data/tiling.py`, and cudf's `validate_setup()` raises cudaErrorInsufficientDriver on CPU-only CI hosts before any of segger's documented CPU fallbacks get a chance to run. Forward RAPIDS_NO_INITIALIZE / CUDF_NO_INITIALIZE / RMM_NO_INITIALIZE to the `segger segment` subprocess so the import succeeds; real GPU operations still error if hit, but small test fixtures stay on CPU paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

segger needs a real CUDA GPU end-to-end (cudf, cuspatial, kernels). The viash test matrix lands on a CPU-only GitHub runner, so we can't make the algorithm actually run there. Two complementary changes: * `script.py`: after reading the input, short-circuit on `torch.cuda.is_available() == False` by writing a valid all-zeros SpatialData stub (segmentation labels matching the input image shape + AnnData table with dataset_id / method_id) and exiting 0. The test passes; the resulting prediction obviously scores poorly, which the log calls out loudly. * `config.vsh.yaml`: re-add the `cuspatial-cu12` install that PR #33 dropped. NGC's pytorch:25.06-py3 bundles cudf but not cuspatial, and segger eagerly imports it from `segger.geometry.conversion`. No version pin — pip resolves against the bundled cudf. GPU hosts now go through the real pipeline; CPU CI never reaches the import path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promote the no-GPU heads-up to a `warnings.warn(UserWarning, ...)` emitted right after device detection — fires before the input is read so logs surface the situation as early as possible, on top of the existing in-flow print where the stub is written. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reverts the CPU stub-and-exit path: segger now raises RuntimeError immediately after device detection if no CUDA GPU is visible. This makes CPU CI fail loudly (which is what we actually want for a GPU- only method) instead of silently emitting an all-zeros stub that could be mistaken for a real result. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

Copilot

Pull request overview

Adds a new GPU-only segger segmentation method (as a Viash component) and wires it into the benchmark workflow so it can be executed alongside existing methods.

Changes:

Register segger in the run_benchmark Nextflow workflow and Viash dependency list.
Add a new methods/segger component that synthesizes a Xenium-style input layout, runs segger segment, and writes a Labels2DModel prediction.
Introduce /src/api/method_gpu.yaml for GPU-only methods with reduced test execution.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/workflows/run_benchmark/main.nf	Adds `segger` to the default benchmark method list.
src/workflows/run_benchmark/config.vsh.yaml	Adds `methods/segger` as a workflow dependency.
src/methods/segger/script.py	Implements the segger wrapper pipeline (init boundaries → run segger → rasterize output).
src/methods/segger/config.vsh.yaml	Declares the segger method interface and GPU docker environment.
src/api/method_gpu.yaml	Adds a GPU method API merge target with config-only test resources.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+arguments:
+  - name: --init_segmentation
+    type: string
+    choices: [auto, cellpose, transcript_cell_id]
+    description: |
+      Source of the initial boundary node set that segger trains against.
+      'auto' uses the transcripts' `cell_id` prior column when it is
+      present and runs Cellpose on `morphology_mip` otherwise.
+      'cellpose' always runs Cellpose. 'transcript_cell_id' requires a
+      `cell_id` column on the transcripts and builds one convex-hull
+      polygon per cell id.
+    info:
+      test_default: auto
+  - name: --n_epochs


+# Use the vendor `cell_id` prior on the transcripts when present; otherwise
+# run Cellpose on the morphology image.
+has_prior = "cell_id" in tx_pd.columns and tx_pd["cell_id"].notna().any()
+mode = par["init_segmentation"]
+if mode == "auto":
+    mode = "transcript_cell_id" if has_prior else "cellpose"
+print(f"Init segmentation mode: {mode}", flush=True)
+
+initial_labels: np.ndarray
+shapes_gdf: gpd.GeoDataFrame
+if mode == "transcript_cell_id":
+    shapes_gdf = _polygons_from_cell_ids(tx_pd)
+    initial_labels = _rasterize_polygons(shapes_gdf, image_el, label_col="cell_id")
+elif mode == "cellpose":
+    initial_labels, shapes_gdf = _polygons_from_cellpose(image_el, par["cellpose_diameter"])
+else:
+    raise ValueError(f"Unknown init_segmentation mode: {mode}")


Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

EliHei2 and others added 2 commits May 18, 2026 13:01

rcannood reviewed May 18, 2026

View reviewed changes

Comment thread src/methods/segger/config.vsh.yaml Outdated

rcannood added 2 commits May 18, 2026 13:57

Apply suggestions from code review

c2a3268

Co-authored-by: Robrecht Cannoodt <rcannood@gmail.com>

Merge remote-tracking branch 'origin/main' into segger

b9c4803

EliHei2 closed this May 18, 2026

EliHei2 reopened this May 18, 2026

rcannood requested changes May 18, 2026

View reviewed changes

Comment thread src/methods/segger/script.py

rcannood and others added 11 commits May 19, 2026 09:15

Merge remote-tracking branch 'origin/main' into segger

6e0ccd7

bypass run test for segger

bbb5f30

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

add segger to wf

c520074

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

rcannood requested a review from Copilot May 19, 2026 09:36

Copilot started reviewing on behalf of rcannood May 19, 2026 09:36 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

use different label

8c8560a

Signed-off-by: Robrecht Cannoodt <rcannood@gmail.com>

rcannood merged commit 5fab6db into main May 19, 2026
2 checks passed

rcannood deleted the segger branch May 19, 2026 09:48

EliHei2 mentioned this pull request May 19, 2026

Segger #46

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segger#28

Segger#28
rcannood merged 16 commits into
mainfrom
segger

EliHei2 commented May 18, 2026

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

EliHei2 commented May 18, 2026

Describe your changes

Checklist before requesting a review

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants