15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,21 @@ All notable changes to wsi-tools will be documented here. The format is loosely

## [Unreleased]

### Fixed

- **Corrupt edge frames in `convert --to dicom --factor` / `downsample` / `crop`
to DICOM (the retile-engine DICOM path).** Partial right/bottom edge frames —
and whole levels smaller than one frame — were encoded at their truncated
content size, but DICOM TILED_FULL requires every frame to be exactly
`Rows`×`Columns`; OpenSlide's DICOM reader (and other strict consumers) rejected
them (`Dimensional mismatch reading JPEG, expected 256x256, got …`). This is the
same class of bug as the v0.24.1 TIFF edge-tile fix, which only covered the TIFF/IFE
encoder; `dicomFrameEncoder` now edge-replicates partial frames up to the full
frame size as well. The verbatim DICOM-source frame-copy path was never
affected. Added a cross-tool manual QA harness under `scripts/qa/` (matrix
generator + OpenSlide/Bio-Formats auto-validators + viewer checklist) that
surfaced this.

## [0.24.1] - 2026-06-28

### Fixed
17 changes: 15 additions & 2 deletions cmd/wsitools/dicom_engine.go
@@ -28,6 +28,9 @@ func runDICOMEngine(ctx context.Context, slide *opentile.Slide, srcRegion openti
	if err != nil {
		return err
	}
	// Pad partial edge frames up to the full frame size (DICOM TILED_FULL frames
	// are uniform Rows×Columns); all levels share one square tile size.
	enc.tileW, enc.tileH = levels[0].TileW, levels[0].TileH
	defer enc.Close()

spoolDir, err := os.MkdirTemp("", "wsitools-dcm-spool-*")
@@ -162,8 +165,9 @@ func (l *spoolLevel) DecodedTile(x, y int) (*decoder.Image, error) {
// EncodeStandalone; J2K-family codecs (jpeg2000/htj2k) already return a complete
// codestream from EncodeTile.
type dicomFrameEncoder struct {
	jpeg         *jpegcodec.Encoder // non-nil for jpeg
	codec        codec.Encoder      // non-nil for j2k-family
	tileW, tileH int                // full frame size; partial edge tiles are padded up to it
}

// newDicomFrameEncoder builds the frame encoder + reports the source.Compression
@@ -202,6 +206,15 @@ func newJ2KFrameEncoder(codecName string, quality int) (*dicomFrameEncoder, sour
}

func (e *dicomFrameEncoder) EncodeTile(rgb []byte, w, h int) ([]byte, error) {
	// DICOM TILED_FULL requires every frame to be exactly Rows×Columns; the engine
	// hands partial right/bottom edge frames (and whole levels smaller than one
	// frame) at truncated content size. Pad up to the full frame (edge-replicated)
	// so strict readers (OpenSlide's DICOM reader, pydicom consumers) don't hit a
	// frame/dimension mismatch — mirrors codecTileEncoder for the TIFF family.
	if e.tileW > 0 && e.tileH > 0 && (w < e.tileW || h < e.tileH) {
		rgb = padRGBTileReplicate(rgb, w, h, e.tileW, e.tileH)
		w, h = e.tileW, e.tileH
	}
	if e.jpeg != nil {
		return e.jpeg.EncodeStandalone(rgb, w, h)
	}
130 changes: 130 additions & 0 deletions scripts/qa/MANUAL-TEST-PLAN.md
@@ -0,0 +1,130 @@
# wsitools manual QA plan

A checklist for **manually** exercising wsitools across formats/transforms and
confirming the outputs in real viewers. The helper scripts here generate a broad
matrix of outputs and auto-validate the ones that can be machine-checked; the
rest you open by eye in the viewers you have.

This is deliberately *not* a programmatic test suite (those live in `go test`).
It's the "did we actually break anything a real viewer cares about" pass to run
before a release.

## 0. Workflow

```sh
# 1. Generate the output matrix (builds wsitools from the repo).
scripts/qa/run-matrix.sh # add --big to also drive NDPI + IFE/Iris sources
# -> /tmp/wsitools-qa/cases/* + manifest.tsv (override with OUT=/path)

# 2. Auto-validate the machine-checkable outputs.
scripts/qa/check-openslide.sh # OpenSlide = Aperio-ecosystem gold oracle
scripts/qa/check-bioformats.sh # Bio-Formats = what QuPath uses underneath
scripts/qa/check-bioformats.sh --pixels # also decode a 256x256 crop per output

# 3. Eyeball the rendered PNGs from the OpenSlide pass.
open /tmp/wsitools-qa/openslide/*.png

# 4. Hand-open the GUI-only artifacts (QuPath / ImageScope / Hamamatsu / browser)
# per the tables below.
```

`manifest.tsv` columns: `id category description source output status`.
Every row's `output` is under `OUT/cases/`.
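
To triage a large matrix quickly, the manifest can be filtered by its `status` column. The helper below is a minimal sketch rather than part of the harness: `rows_with_status` is a hypothetical name, the manifest path in the example assumes it sits directly under `OUT`, and the status value `fail` is a guess at what `run-matrix.sh` writes (check the file for the actual values).

```sh
# rows_with_status MANIFEST STATUS
# Print "id<TAB>output" for every manifest row whose status column equals STATUS.
# Column order follows the documented layout:
#   id category description source output status
# Assumption: if the first row's id field is literally "id", it is a header
# and is skipped.
rows_with_status() {
  awk -F'\t' -v want="$2" '
    NR == 1 && $1 == "id" { next }
    $6 == want { printf "%s\t%s\n", $1, $5 }
  ' "$1"
}

# Example (status value and manifest location are assumptions):
# rows_with_status "${OUT:-/tmp/wsitools-qa}/manifest.tsv" fail
```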

## 1. What to look for (the rubric)

These are the failure modes wsitools has actually hit. Check each opened slide
against them:

| # | Symptom | What it means |
|---|---------|---------------|
| R1 | **Colours wrong** (blue/orange swapped, oversaturated) | Photometric/Subsampling tag vs JPEG framing mismatch |
| R2 | **Black blocks / garbled stripes at right or bottom edges** | Edge tiles not padded to full tile size (TIFF/DICOM) |
| R3 | **A pyramid/zoom level missing** (big jump in zoom detail) | SVS thumbnail not at IFD 1 → ImageScope ate a level |
| R4 | **Wrong physical scale** (scale bar / µm wrong) | MPP / magnification metadata dropped or mis-scaled |
| R5 | **Label / macro / overview missing or wrong** | Associated-image copy/edit defect |
| R6 | **Won't open at all** | Structural/container conformance defect |
| R7 | **Truncated tissue / wrong dimensions** | Level dims or crop rect handling defect |

`info` + `validate` (run automatically in matrix section A) cover R4/R5/R6 at the
metadata level; the viewers confirm the pixels.
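
R2 can only appear on a level whose width or height is not an exact multiple of the tile size, so a little arithmetic tells you which levels deserve a close look at the edges. The two helpers below are an illustrative sketch (the function names are made up for this example); level dimensions can be read from the `openslide.level[N].width` / `.height` properties, and the tile size is whatever you passed to wsitools.

```sh
# frames_across SIZE TILE
# Number of frames needed to cover SIZE pixels (ceiling division).
frames_across() { echo $(( ($1 + $2 - 1) / $2 )); }

# edge_content SIZE TILE
# Pixels of real image content in the last frame along this axis.
# Equals TILE when SIZE is an exact multiple, i.e. no padding is needed.
edge_content() {
  r=$(( $1 % $2 ))
  if [ "$r" -eq 0 ]; then r=$2; fi
  echo "$r"
}

frames_across 1000 256   # 4 frames across
edge_content 1000 256    # 232 content pixels, so 24 columns are padding
edge_content 100 256     # 100: a whole level smaller than one frame
```

A level with `edge_content` smaller than the tile size on either axis is one where an unpadded encoder would emit a truncated frame.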

## 2. Auto-validated (no GUI needed)

| Tool | Reads | Catches | Run |
|------|-------|---------|-----|
| **OpenSlide** | svs, generic tiled tiff, cog-wsi, dicom, ndpi, bif, mrxs, scn, philips | R2 (dimensional mismatch on render), R3 (level count), R5 (associated list), R6 | `check-openslide.sh` |
| **Bio-Formats** | the above + ome-tiff (+ proxy for QuPath) | R6 (parse/IFD/OME-XML errors), R7 (series dims), pixel decode with `--pixels` | `check-bioformats.sh` |

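Each result is `OK`, `N/A`, or a failure label (`OPENFAIL`/`READFAIL` from
OpenSlide, `PARSEFAIL`/`PIXFAIL` from Bio-Formats). `N/A` means the tool can't
read that container/codec (expected; see §4); only the failure labels need
attention, and either script exits non-zero if it recorded one. The OpenSlide
pass also writes a deepest-level PNG per slide to `OUT/openslide/`; flip through
them for R1/R2/R7.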
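
When the report is long, it helps to cut it down to the rows that need attention. The filter below is a small sketch, not part of the harness: `fail_rows` is a hypothetical helper that keeps only rows whose RESULT column ends in `FAIL`, which covers the labels both scripts print (`OPENFAIL`, `READFAIL`, `PARSEFAIL`, `PIXFAIL`).

```sh
# fail_rows: read a check-*.sh report on stdin and print only the rows whose
# RESULT column (second field) ends in FAIL.
# Assumption: output names contain no whitespace, so RESULT is field 2.
fail_rows() { awk '$2 ~ /FAIL$/'; }

# Example (the report path is arbitrary):
# scripts/qa/check-openslide.sh | tee /tmp/openslide-report.txt | fail_rows
```

The header, separator, and summary lines fall through the filter because their second field is never a `*FAIL` label.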

## 3. Manual viewers (open by hand)

### ImageScope (Windows — strict Aperio reader; the toughest critic)
Open these from `OUT/cases/` and check R1/R2/R3/R4/R5:

| Artifact | Why it matters |
|----------|----------------|
| `b_svs.svs` | baseline SVS round-trip |
| `e_bif2svs.svs`, `e_ome2svs.svs`, `e_cog2svs.svs`, `e_dicom2svs.svs` | cross-format → SVS (R1/R2/R3 regression set) |
| `e_ife2svs.svs` (needs `--big`) | IFE/Iris → SVS (the 4×-pyramid case) |
| `d_factor2.svs`, `d_rect.svs`, `d_tile512.svs`, `d_crop.svs`, `d_crop_lossless.svs` | transforms — check edges (R2) + scale (R4) |
| `g_label_replaced.svs`, `g_label_removed.svs`, `g_overview_removed.svs`, `g_macro_replaced.svs` | associated edits (R5); open Image → "View Label / View Thumbnail" |

In the ImageScope **Image Information** panel verify: all pyramid levels present
with sensible ratios (R3), MPP + AppMag correct (R4), Label/Thumbnail tabs
populated (R5).

### QuPath (cross-platform — Bio-Formats + OpenSlide)
Open the same SVS set plus `b_ome.ome.tiff`, `b_tiff.tiff`, `b_cog.tiff`,
`b_dicom/` (point at a `.dcm`). Check R1/R2/R4/R7 at multiple zooms; QuPath's
status bar shows µm/px (R4). OME-TIFF is QuPath's strong suit — confirm
`b_ome.ome.tiff` and `e_ome2svs.svs` look right.

### Hamamatsu viewer (NDPI)
Hamamatsu's viewer is for native NDPI. Use it on the **source** `ndpi/*.ndpi`
fixtures to confirm the source reads (sanity), and on any NDPI you produce. (wsitools
does not write NDPI, so this is mostly source-side / read-side confirmation.)

### Browser / OpenSeadragon (DZI, SZI)
OpenSlide/Bio-Formats can't read DZI/SZI. Validate them as tiled web pyramids:
- `b_dzi.dzi` + `b_dzi_files/` — load in any DZI viewer (OpenSeadragon demo page,
or VIPS `vipsdisp`). Check R2 (tile seams/edges), R7 (full extent), and that
deep zoom levels all load.
- `b_szi.szi` — the zipped DZI; unzip and inspect, or use an SZI-aware viewer.

### Iris validator (IFE / `.iris`)
The official gold gate for IFE. In a venv: `pip install Iris-Codec`, then
`make ife-validate` (or the snippet in the Makefile). Validate `b_ife.iris` and,
with `--big`, the round-trip of `425248_JPEG.iris`. CI also runs this.

## 4. Expected `N/A` / known gaps (NOT failures)

- **Novel codecs in TIFF** (`c_avif.tiff`, `c_htj2k.tiff`, `c_jpegxl.tiff`,
`c_webp.tiff`): no standard TIFF compression tag → OpenSlide & Bio-Formats
can't read them. They are **wsitools/opentile-only**; validate with
`wsitools info <f>` / `wsitools validate <f>` / `wsitools region`. (JPEG and
JPEG-2000 in TIFF/SVS *are* standard and read everywhere.)
- **DZI / SZI / IFE**: not readable by OpenSlide or Bio-Formats (see §3 for their
validators).
- **`b_bif.bif` in OpenSlide**: OpenSlide's Ventana reader rejects our synthesized
`TileJointInfo Direction="LEFT"/"UP"` ("Bad direction attribute"). Our BIF is
read correctly by **Bio-Formats / QuPath / opentile** (and round-trips
pixel-identical); `--to bif` is experimental. Known interop gap with OpenSlide's
BIF reader specifically — not a general defect.

## 5. Pixel-equivalence spot checks (optional, exact)

For conversions that should be pixel-faithful, compare against the source with
wsitools' own pixel hash (geometry-independent within a level):

```sh
# Lossless / tile-copy conversions should match the source pixel hash:
wsitools hash --mode pixel sample_files/svs/CMU-1.svs
wsitools hash --mode pixel /tmp/wsitools-qa/cases/d_crop_lossless.svs # same region
# Render the same region from source and output and diff visually:
wsitools region --level 0 --rect 2000,2000,1024,1024 -o /tmp/src.png sample_files/svs/CMU-1.svs
wsitools region --level 0 --rect 1000,1000,1024,1024 -o /tmp/out.png /tmp/wsitools-qa/cases/d_crop.svs
```
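
The hash comparison can be scripted so it yields a pass/fail exit status. This is a sketch under a stated assumption: it treats the first whitespace-separated field of the first output line of `wsitools hash --mode pixel` as the hash, which may not match the real output format, so adjust the `awk` step if needed. `pixel_hash` and `same_pixels` are hypothetical helper names, and the comparison is only meaningful when both files cover the same pixels.

```sh
# pixel_hash FILE
# Print the pixel hash of FILE.
# Assumption: the hash is the first field on the first output line.
pixel_hash() { wsitools hash --mode pixel "$1" | awk 'NR == 1 { print $1 }'; }

# same_pixels A B
# Succeed (exit 0) only when both files report the same non-empty pixel hash.
same_pixels() {
  a="$(pixel_hash "$1")"
  b="$(pixel_hash "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
# same_pixels source.svs converted.svs && echo "pixel-identical"
```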
71 changes: 71 additions & 0 deletions scripts/qa/check-bioformats.sh
@@ -0,0 +1,71 @@
#!/usr/bin/env bash
#
# check-bioformats.sh — auto-validate run-matrix.sh outputs with Bio-Formats
# (showinf). Bio-Formats is what QuPath uses under the hood, so a clean parse
# here is a good predictor of QuPath behaviour. For each readable output it:
# - parses metadata only (showinf -nopix) — catches structural/IFD/OME-XML errors,
# - reports series count + dimensions,
# - optionally decodes a small crop (bfconvert) to confirm pixels read.
# Bio-Formats can't read DZI/SZI/IFE — those are reported N/A.
#
# Usage: scripts/qa/check-bioformats.sh [--pixels] (OUT defaults to /tmp/wsitools-qa)
# --pixels also bfconvert a 256x256 crop of series 0 (slower; confirms decode)
#
set -uo pipefail
OUT="${OUT:-/tmp/wsitools-qa}"
CASES="$OUT/cases"
DEST="$OUT/bioformats"
PIX=0; [[ "${1:-}" == "--pixels" ]] && PIX=1
mkdir -p "$DEST"

command -v showinf >/dev/null 2>&1 || { echo "Bio-Formats 'showinf' not found (install bftools)"; exit 1; }
[[ -d "$CASES" ]] || { echo "no cases dir at $CASES — run run-matrix.sh first"; exit 1; }

pass=0; fail=0; na=0
printf "%-28s %-8s %s\n" "OUTPUT" "RESULT" "DETAIL"
printf -- "---------------------------------------------------------------------------\n"

for path in "$CASES"/*; do
  name="$(basename "$path")"
  case "$name" in
    *.dzi|*.szi|*.iris) printf "%-28s %-8s %s\n" "$name" "N/A" "Bio-Formats can't read this container"; na=$((na+1)); continue ;;
  esac
  # DICOM output is a directory of .dcm — point Bio-Formats at one instance.
  target="$path"
  if [[ -d "$path" ]]; then
    target="$(find "$path" -maxdepth 1 -name '*.dcm' | head -1)"
    [[ -z "$target" ]] && { printf "%-28s %-8s %s\n" "$name" "N/A" "no .dcm in dir"; na=$((na+1)); continue; }
  fi

  log="$DEST/$name.showinf.log"
  showinf -nopix -no-upgrade "$target" >"$log" 2>&1
  rc=$?
  # Novel codecs (AVIF/HTJ2K/JPEG-XL/WebP in TIFF) have no Bio-Formats codec —
  # that's an expected reader limitation (same as OpenSlide N/A), not a defect.
  if grep -qiE "Unable to find TiffCompres|unsupported compression" "$log"; then
    printf "%-28s %-8s %s\n" "$name" "N/A" "Bio-Formats has no codec for this tile compression"
    na=$((na+1)); continue
  fi
  if [[ $rc -ne 0 ]] || grep -qiE "exception|cannot read|error reading|unsupported" "$log"; then
    printf "%-28s %-8s %s\n" "$name" "PARSEFAIL" "$(grep -iE 'exception|cannot|error|unsupported' "$log" | head -1)"
    fail=$((fail+1)); continue
  fi
  series="$(sed -n 's/^Series count = \([0-9]*\)/\1/p' "$log" | head -1)"
  dims="$(grep -m1 'Width = ' "$log" | sed 's/.*Width = //')x$(grep -m1 'Height = ' "$log" | sed 's/.*Height = //')"

  detail="series=${series:-?} dim0=${dims}"
  result="OK"
  if [[ "$PIX" == 1 ]]; then
    if bfconvert -overwrite -series 0 -crop 0,0,256,256 "$target" "$DEST/$name.crop.png" >"$DEST/$name.bfconvert.log" 2>&1; then
      detail="$detail pixels=OK"
    else
      detail="$detail pixels=FAIL"; result="PIXFAIL"; fail=$((fail+1))
    fi
  fi
  [[ "$result" == "OK" ]] && pass=$((pass+1))
  printf "%-28s %-8s %s\n" "$name" "$result" "$detail"
done

echo
echo "Bio-Formats: $pass OK, $fail FAIL, $na N/A. showinf logs in $DEST"
[[ "$fail" -gt 0 ]] && exit 1 || exit 0
78 changes: 78 additions & 0 deletions scripts/qa/check-openslide.sh
@@ -0,0 +1,78 @@
#!/usr/bin/env bash
#
# check-openslide.sh — auto-validate the run-matrix.sh outputs with the OpenSlide
# CLI (the Aperio-ecosystem gold oracle). For every output OpenSlide can open it:
# - prints level count + per-level downsamples (catches dropped/duplicated levels),
# - renders the deepest pyramid level to a PNG (catches the "Dimensional mismatch
# reading JPEG" edge-tile bug and other read failures),
# - reports associated images (label/macro/thumbnail).
#
# OpenSlide can't read some containers/codecs (DZI/SZI/IFE/OME-TIFF, and novel
# codecs like JPEG-XL/AVIF/WebP, or JPEG2000 in a *generic* TIFF). Those are
# reported N/A — use Bio-Formats / the Iris validator / `wsitools validate` for
# them. A real FAIL is a container OpenSlide *should* read (e.g. a JPEG SVS) that
# errors on open or render.
#
# Usage: scripts/qa/check-openslide.sh (OUT defaults to /tmp/wsitools-qa)
#
set -uo pipefail
OUT="${OUT:-/tmp/wsitools-qa}"
CASES="$OUT/cases"
DEST="$OUT/openslide"
mkdir -p "$DEST"

command -v openslide-show-properties >/dev/null 2>&1 || { echo "openslide CLI not found (brew install openslide)"; exit 1; }
[[ -d "$CASES" ]] || { echo "no cases dir at $CASES — run run-matrix.sh first"; exit 1; }

# prop FILE KEY — extract a property value (KEY may contain [] brackets).
prop() { openslide-show-properties "$1" 2>/dev/null | grep -F "$2: '" | head -1 | sed "s/.*: '\(.*\)'\$/\1/"; }
# is the open-error an expected OpenSlide limitation (not a wsitools defect)?
expected_na() { grep -qiE "unsupported tiff compression|compression support is not configured|not a file that openslide can recognize|unsupported|cannot read" "$1"; }

pass=0; fail=0; na=0
printf "%-28s %-9s %s\n" "OUTPUT" "RESULT" "DETAIL"
printf -- "---------------------------------------------------------------------------\n"

for path in "$CASES"/*; do
  name="$(basename "$path")"
  case "$name" in
    *.dzi|*.szi|*.iris|*.ome.tiff) printf "%-28s %-9s %s\n" "$name" "N/A" "OpenSlide can't read this container"; na=$((na+1)); continue ;;
  esac

  # DICOM output is a directory of .dcm — OpenSlide opens it via one instance.
  target="$path"
  [[ -d "$path" ]] && target="$(find "$path" -maxdepth 1 -name '*.dcm' | sort | head -1)"
  [[ -z "$target" ]] && { printf "%-28s %-9s %s\n" "$name" "N/A" "no .dcm in dir"; na=$((na+1)); continue; }

  if ! openslide-show-properties "$target" >/dev/null 2>"$DEST/$name.openerr"; then
    if expected_na "$DEST/$name.openerr"; then
      printf "%-28s %-9s %s\n" "$name" "N/A" "$(head -1 "$DEST/$name.openerr" | sed 's#.*: ##')"
      na=$((na+1))
    else
      printf "%-28s %-9s %s\n" "$name" "OPENFAIL" "$(head -1 "$DEST/$name.openerr")"
      fail=$((fail+1))
    fi
    continue
  fi

  lc="$(prop "$target" openslide.level-count)"; lc="${lc:-1}"
  last=$((lc-1))
  lw="$(prop "$target" "openslide.level[$last].width")"
  lh="$(prop "$target" "openslide.level[$last].height")"
  downs="$(openslide-show-properties "$target" 2>/dev/null | sed -n "s/^openslide.level\[[0-9]*\].downsample: '\(.*\)'\$/\1/p" | awk '{printf "%.0f ",$1}')"
  assoc="$(openslide-show-properties "$target" 2>/dev/null | sed -n "s/^openslide.associated.\([a-z]*\)\..*/\1/p" | sort -u | tr '\n' ',' | sed 's/,$//')"

  rerr="$DEST/$name.readerr"
  if [[ -n "$lw" && -n "$lh" ]] && openslide-write-png "$target" 0 0 "$last" "$lw" "$lh" "$DEST/$name.png" 2>"$rerr"; then
    printf "%-28s %-9s L=%s ds=[%s] assoc=[%s]\n" "$name" "OK" "$lc" "${downs% }" "${assoc:-none}"
    pass=$((pass+1))
  else
    printf "%-28s %-9s %s\n" "$name" "READFAIL" "$(head -1 "$rerr" 2>/dev/null) (level $last ${lw}x${lh})"
    fail=$((fail+1))
  fi
done

echo
echo "OpenSlide: $pass OK, $fail FAIL, $na N/A. Rendered PNGs + error logs in $DEST"
echo "Eyeball $DEST/*.png: colours correct, no black/garbled edges, full tissue extent."
[[ "$fail" -gt 0 ]] && exit 1 || exit 0