Skip to content

covertcast/pycmprsk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pycmprsk

A Python port of the R package cmprsk. Estimation, testing and regression modeling of subdistribution functions in competing risks:

  • cuminc - non-parametric cumulative incidence functions, with Gray's k-sample test across groups and stratification.
  • crr - Fine-Gray subdistribution-hazards regression, with time-fixed covariates (cov1), time-varying covariates (cov2 + tf), per-group censoring weights (cengroup), and the full Huber/White sandwich variance including the q(u) correction for the estimated censoring distribution.
  • predict_crr, summary_crr, timepoints, plot_cuminc, plot_predict - the same downstream API as R.

This package's functionality is numerically validated against R's cmprsk. See Parity testing below.

Install

pip install pycmprsk

The package is pure Python; the hot loops are JIT-compiled with numba.

Quick example

The arrays below were dumped from R using RNGversion("1.6.2"); set.seed(2) and the same data setup as cmprsk/tests/test.R. The resulting plots visually match R's plot(cuminc(...)) and plot(predict(crr(...))).

import matplotlib.pyplot as plt
import numpy as np

from pycmprsk import crr, cuminc, plot_cuminc, plot_predict, predict_crr


def tf_quad(uft):
    """Match R's ``function(uft) cbind(uft, uft^2)``."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])
Data arrays (Click to expand - Data sourced from R's test suite)
ftime = np.array([
    0.686305, 0.149818, 1.611875, 1.077275, 1.553027, 0.286783, 0.234919, 0.255626, 0.536215, 1.420936,
    1.979941, 0.816767, 0.970783, 3.376077, 1.407218, 0.229477, 2.821243, 1.598966, 0.661166, 0.291716,
    2.421805, 0.264711, 0.419970, 0.994872, 5.248650, 0.493777, 0.036222, 0.039556, 2.225511, 1.896816,
    1.562481, 2.080967, 0.062462, 0.308574, 0.854363, 1.086975, 0.183905, 0.877297, 0.166353, 1.346992,
    3.303843, 0.723761, 0.043173, 1.635107, 1.022373, 1.565542, 0.734400, 1.705071, 1.527256, 1.921497,
    1.854679, 0.310276, 2.424571, 0.515172, 1.251790, 1.054940, 0.010267, 1.079949, 0.136024, 0.466943,
    1.348637, 0.113960, 2.535242, 0.762922, 0.432438, 0.666299, 0.862624, 0.479771, 0.397440, 1.493170,
    0.661091, 0.540539, 1.355944, 0.773167, 3.902563, 0.117417, 1.786273, 0.072698, 0.259388, 2.092709,
    0.229584, 0.490496, 0.425987, 0.335195, 0.697602, 0.097860, 0.917998, 0.174528, 0.680717, 1.835194,
    2.997399, 1.937913, 0.520418, 1.653625, 2.238665, 0.149357, 0.720766, 0.096726, 0.831950, 1.003850,
])

fstatus = np.array([
    1, 2, 0, 2, 2, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 1, 0,
    2, 1, 1, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0, 1,
    2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2,
    0, 2, 2, 1, 2, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0,
    0, 0, 1, 1, 2, 1, 2, 2, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 1, 1,
])

group_code = np.array([
    3, 1, 3, 1, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2, 3, 1, 1, 2, 1,
    1, 3, 2, 3, 1, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 3, 2, 1, 3,
    3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2,
    1, 3, 2, 1, 2, 3, 1, 1, 3, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 3,
    2, 3, 3, 2, 2, 2, 2, 3, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,
])
group = np.array(["a", "b", "c"])[group_code - 1]

strata = np.array([
    1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1,
    2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
    2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1,
    1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,
    1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
])

cov1 = np.array([
    np.nan, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
    1.0, 1.0, 1.0, np.nan, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, np.nan, 1.0, 1.0,
    1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0,
]).reshape(100, 3)

cov2 = np.column_stack([cov1[:, 0], cov1[:, 0]])
cengroup = cov1[:, 2]
ci = cuminc(ftime, fstatus, group=group, strata=strata)
print("cuminc curve keys:", list(ci.curves.keys()))
print("Gray's k-sample tests (stat, pv, df):\n", ci.tests)

fit = crr(ftime, fstatus, cov1=cov1, cov2=cov2, tf=tf_quad, cengroup=cengroup)
print("crr coefs:", fit.coef)
print("crr converged:", fit.converged)

pred = predict_crr(
    fit,
    cov1=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
    cov2=np.array([[1.0, 1.0], [0.0, 0.0]]),
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_cuminc(ci, ax=ax1)
ax1.set_title("cuminc(ss, cc, gg, strt)")
plot_predict(pred, ax=ax2)
ax2.set_title("predict(crr(ss, cc, cv, cov2, tf, cengroup=cv[,3]))")
plt.tight_layout()
plt.show()

What's different from R's cmprsk

pycmprsk is a port of the functionality, not a verbatim translation of the API:

  • Returns dataclasses (CRRResult, CumincResult, SummaryCRR) rather than R's named lists. Field names use Python snake_case (n_missing vs. R's n.missing, loglik_null vs. loglik.null).
  • tf (the time-varying covariate function) takes and returns NumPy arrays; Python's contract is that it returns shape (ndf, p2) (R's cmprsk wraps 1D output via as.matrix; do the equivalent with .reshape(-1, 1)).
  • na.action is fixed to "omit rows with any NA," matching R's default.

Behavioral parity is the explicit design goal - see src/tests/test_parity.py.

Parity testing

The test suite is 1:1 with R's cmprsk/tests/test.R: every scenario in that file has a corresponding .npz fixture (data + R reference outputs) under src/tests/fixtures/, and one matching Python test per scenario.

To regenerate the fixtures (requires R with the cmprsk and reticulate packages installed):

Rscript src/tests/r_fixtures.R

To run the parity tests:

pytest src/tests

License

pycmprsk is distributed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

This package is a derivative work of R's cmprsk (Bob Gray), which is licensed under GPL (>= 2). The Fortran sources from cmprsk/src/*.f have been re-implemented in Python while preserving the original algorithms. As a derivative work, pycmprsk must remain GPL-compatible.

About

A Python port of the R package cmprsk. Estimation, testing and regression modeling of subdistribution functions in competing risks.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors