Cross-fit clinicalWinRatio; characterize & document small-sample behavior by blind-contours · Pull Request #22 · blind-contours/concrete

blind-contours · 2026-06-09T01:48:25Z

What

Two related improvements to the experimental clinicalWinRatio():

Cross-fitting (n.folds, default 5). The transition and censoring hazards
are now fit out-of-fold, so each subject's influence-function contribution uses
nuisance estimates trained without that subject. This removes the empirical-process (over-fitting)
term and gives honest inference when SL.library contains flexible learners
(random forests, HAL, penalized regression). n.folds = 1 keeps the faster
in-sample fits for simple parametric learners. Verified that the estimate
remains accurate at large n (WR 1.590 vs brute-force truth 1.585).
Small-sample behavior: characterized and documented honestly. The win
ratio is a ratio, so at small n it is mildly biased and its confidence
intervals are anti-conservative. This is a well-known finite-sample property
of the win ratio (it affects the unadjusted Pocock win ratio too), not a defect
of this estimator or its nuisances. A null simulation (both arms identical, true WR = 1) gives:

| n / arm | mean WR | 95% coverage | type-I |
|---------|---------|--------------|--------|
| 400 | 0.99 | 0.93–0.94 | 0.06–0.07 |
| 800 | 1.00 | 0.97 | 0.03 |
| 1600 | 1.00 | 0.97 | 0.03 |

Bias is about 1% (downward) at 400/arm and gone by 800/arm; coverage reaches the nominal 95% by about 800/arm.
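For readers who want to see how the three table columns relate, here is a minimal sketch of summarizing a null simulation from per-replicate output. It is written in Python for illustration only (the package itself is R) and is not the code in scripts/make-clinical-wr-smalln.R. The function name `summarize_null_sim` and its inputs are hypothetical, and it assumes Wald intervals built on the log scale, which the PR description does not state.

```python
import math
from statistics import fmean

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def summarize_null_sim(wr_hat, se_log_wr, true_wr=1.0):
    """Summarize Monte Carlo replicates of a win-ratio estimator.

    wr_hat    : win-ratio point estimates, one per replicate
    se_log_wr : standard errors of log(WR), one per replicate
                (ASSUMPTION: intervals are Wald intervals on the log scale)
    true_wr   : the true win ratio (1.0 under the null)
    """
    log_true = math.log(true_wr)
    covered = 0
    for wr, se in zip(wr_hat, se_log_wr):
        lower = math.log(wr) - Z_975 * se
        upper = math.log(wr) + Z_975 * se
        if lower <= log_true <= upper:
            covered += 1
    coverage = covered / len(wr_hat)
    return {
        "mean_wr": fmean(wr_hat),  # compare with true_wr to read off bias
        "coverage": coverage,      # share of 95% intervals containing true_wr
        # Under the null, a two-sided 5% test rejects exactly when the
        # interval excludes 1, so the type-I error is 1 - coverage.
        "type1": 1.0 - coverage,
    }
```

Under this convention the type-I column is one minus the coverage column, which matches the table above (for example 0.97 and 0.03).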

Crucially, cross-fitting does not change this. This was confirmed empirically:
cross-fitted and in-sample fits at n=400 gave identical coverage. It is a property
of the win-ratio functional, not the empirical-process term, so the methodological
reflex of "under-coverage → cross-fit" does not apply here.

Docs

Function help: new @section Small-sample behavior; corrected @param n.folds
(it no longer claims to fix the small-n issue).
"Win ratios for trialists" vignette: new "A note on small trials" subsection plus a
coverage-vs-n figure.
NEWS.md updated.
scripts/make-clinical-wr-smalln.R reproduces the sweep.

🤖 Generated with Claude Code

Add V-fold cross-fitting (n.folds, default 5) to clinicalWinRatio(): the transition and censoring hazards are fit out-of-fold so each subject's influence-function contribution uses learners trained without them. This gives honest inference when SL.library contains flexible learners that could over-fit in sample; n.folds = 1 keeps the faster in-sample fits for simple learners.

Characterize and document the win ratio's small-sample behavior. The win ratio is a ratio, so it is mildly biased/anti-conservative at small n -- a well-known finite-sample property of the win ratio (the unadjusted Pocock win ratio too), not a defect of this estimator or its nuisances. A null simulation shows downward bias ~1% at ~400/arm with coverage ~0.93-0.94 / type-I ~0.06-0.07, becoming nominal (0.95-0.97) by ~800/arm. Cross-fitting does NOT change this (confirmed empirically), since it is a property of the win-ratio functional, not the empirical-process/over-fitting term.

Document in the function help (new @section Small-sample behavior), the "Win ratios for trialists" vignette (new coverage-vs-n figure), and NEWS. Add scripts/make-clinical-wr-smalln.R to reproduce the sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

blind-contours and others added 2 commits June 8, 2026 18:48

Link win-ratio pkgdown article from README win-ratio section

472793c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

blind-contours wants to merge 2 commits into main from wr-crossfit-smalln
