Cross-fit clinicalWinRatio; characterize & document small-sample behavior #22
Open
blind-contours wants to merge 2 commits into
Add V-fold cross-fitting (n.folds, default 5) to clinicalWinRatio(): the transition and censoring hazards are fit out-of-fold so each subject's influence-function contribution uses learners trained without them. This gives honest inference when SL.library contains flexible learners that could over-fit in sample; n.folds = 1 keeps the faster in-sample fits for simple learners.

Characterize and document the win ratio's small-sample behavior. The win ratio is a ratio, so it is mildly biased/anti-conservative at small n -- a well-known finite-sample property of the win ratio (the unadjusted Pocock win ratio too), not a defect of this estimator or its nuisances. A null simulation shows downward bias ~1% at ~400/arm with coverage ~0.93-0.94 / type-I ~0.06-0.07, becoming nominal (0.95-0.97) by ~800/arm. Cross-fitting does NOT change this (confirmed empirically), since it is a property of the win-ratio functional, not the empirical-process/over-fitting term.

Document in the function help (new @section Small-sample behavior), the "Win ratios for trialists" vignette (new coverage-vs-n figure), and NEWS. Add scripts/make-clinical-wr-smalln.R to reproduce the sweep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Two related improvements to the experimental clinicalWinRatio():

Cross-fitting (n.folds, default 5). The transition and censoring hazards are now fit out-of-fold, so each subject's influence-function contribution uses nuisances trained without them. This removes the empirical-process (over-fitting) term and gives honest inference when SL.library contains flexible learners (random forests, HAL, penalized regression). n.folds = 1 keeps the faster in-sample fits for simple parametric learners. Verified the estimate is unchanged at large n (WR 1.590 vs brute-force truth 1.585).
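The fold logic described above can be sketched roughly as follows. This is an illustration in Python, not the package's R implementation: `cross_fit_predictions`, `fit_ols` and `predict_ols` are hypothetical names, and an ordinary least-squares fit stands in for the SuperLearner hazard fits. The only point it makes is that with more than one fold each subject's nuisance value comes from a model trained without that subject, while a single fold falls back to one in-sample fit.

```python
import numpy as np

def cross_fit_predictions(X, y, fit, predict, n_folds=5, seed=0):
    """Return one nuisance prediction per subject.

    n_folds > 1: each subject is predicted by a model fit on the other folds.
    n_folds == 1: a single in-sample fit (faster, fine for simple learners).
    """
    n = len(y)
    if n_folds == 1:
        return predict(fit(X, y), X)
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(n) % n_folds  # balanced random fold labels
    preds = np.empty(n, dtype=float)
    for k in range(n_folds):
        held_out = fold_id == k
        model = fit(X[~held_out], y[~held_out])  # trained without fold k
        preds[held_out] = predict(model, X[held_out])
    return preds

# Toy stand-in learner (assumption): least squares with an intercept.
def fit_ols(X, y):
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

out_of_fold = cross_fit_predictions(X, y, fit_ols, predict_ols, n_folds=5)
in_sample = cross_fit_predictions(X, y, fit_ols, predict_ols, n_folds=1)
```

For a simple parametric learner like this one the two prediction vectors differ only slightly, which is consistent with keeping n.folds = 1 as the cheaper option in that setting.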
Small-sample behavior: characterized and documented honestly. The win ratio is a ratio, so it is mildly biased and anti-conservative at small n. This is a well-known finite-sample property of the win ratio (it affects the unadjusted Pocock win ratio too), not a defect of this estimator or its nuisances. A null simulation (both arms identical, true WR = 1) gives a bias of ~1% at 400/arm that is gone by 800/arm, with coverage nominal by ~800/arm.

Crucially, cross-fitting does not change this, which was confirmed empirically (cross-fitted vs in-sample at n=400 gave identical coverage). It is a property of the win-ratio functional, not the empirical-process term, so the methodological reflex of "under-coverage → cross-fit" does not apply here.
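To make the idea of a null sweep concrete, here is a toy version in Python. It is a sketch under strong simplifying assumptions and does not reproduce the PR's estimator, its numbers, or scripts/make-clinical-wr-smalln.R: it uses a single uncensored "larger is better" outcome drawn from a standard normal in both arms, the plain pairwise wins/losses ratio, and no confidence intervals. `unadjusted_win_ratio` and `null_sweep` are hypothetical names.

```python
import numpy as np

def unadjusted_win_ratio(treated, control):
    """Pairwise wins / losses for one 'larger is better' outcome (ties ignored)."""
    t = np.asarray(treated, dtype=float)[:, None]
    c = np.asarray(control, dtype=float)[None, :]
    diff = t - c  # every treated subject compared with every control subject
    wins = np.sum(diff > 0)
    losses = np.sum(diff < 0)
    return wins / losses

def null_sweep(n_per_arm, n_sims=500, seed=0):
    """Monte Carlo summary of the win ratio when both arms are identical (true WR = 1)."""
    rng = np.random.default_rng(seed)
    wr = np.array([
        unadjusted_win_ratio(rng.normal(size=n_per_arm), rng.normal(size=n_per_arm))
        for _ in range(n_sims)
    ])
    log_wr = np.log(wr)
    return wr.mean(), log_wr.mean(), log_wr.std(ddof=1)

for n in (25, 100, 400):
    mean_wr, mean_log_wr, sd_log_wr = null_sweep(n)
    print(f"n/arm={n:4d}  mean WR={mean_wr:.3f}  "
          f"mean log WR={mean_log_wr:+.3f}  sd log WR={sd_log_wr:.3f}")
```

No nuisance models appear anywhere in this toy, so whatever finite-sample behavior it shows comes from the ratio itself, which is the distinction the paragraph above draws.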
Docs
Function help: new @section Small-sample behavior; corrected @param n.folds (it no longer claims to fix the small-n issue).
"Win ratios for trialists" vignette: new coverage-vs-n figure.
NEWS.md updated.
scripts/make-clinical-wr-smalln.R reproduces the sweep.

🤖 Generated with Claude Code