pycmprsk

A Python port of the R package cmprsk. Estimation, testing and regression modeling of subdistribution functions in competing risks:

cuminc - non-parametric cumulative incidence functions, with Gray's k-sample test across groups and stratification.
crr - Fine-Gray subdistribution-hazards regression, with time-fixed covariates (cov1), time-varying covariates (cov2 + tf), per-group censoring weights (cengroup), and the full Huber/White sandwich variance including the q(u) correction for the estimated censoring distribution.
predict_crr, summary_crr, timepoints, plot_cuminc, plot_predict - the same downstream functionality as in R.
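
For readers new to the method, here is a rough sketch of the textbook non-parametric cumulative incidence estimate that cuminc is built around: at each distinct event time, the all-cause Kaplan-Meier survival just before that time is multiplied by the fraction of at-risk subjects failing from the cause of interest. The function name cif_sketch and its arguments are hypothetical. This is not pycmprsk's implementation, and it omits variances, groups and strata.

```python
import numpy as np


def cif_sketch(ftime, fstatus, cause, cencode=0):
    """Illustrative cumulative incidence for one cause (no variance, no groups)."""
    ftime = np.asarray(ftime, dtype=np.float64)
    fstatus = np.asarray(fstatus)
    # Distinct times at which any (non-censoring) event occurs.
    event_times = np.unique(ftime[fstatus != cencode])
    surv = 1.0  # all-cause survival just before the current event time
    cif = 0.0
    values = []
    for t in event_times:
        at_risk = np.sum(ftime >= t)
        d_all = np.sum((ftime == t) & (fstatus != cencode))
        d_cause = np.sum((ftime == t) & (fstatus == cause))
        cif += surv * d_cause / at_risk   # increment for this cause
        surv *= 1.0 - d_all / at_risk     # update all-cause survival
        values.append(cif)
    return event_times, np.array(values)


# Tiny made-up data set: 0 = censored, 1 and 2 = competing causes.
times, cif1 = cif_sketch([1, 2, 3, 4], [1, 2, 0, 1], cause=1)
print(times.tolist())                 # [1.0, 2.0, 4.0]
print(np.round(cif1, 6).tolist())     # [0.25, 0.25, 0.75]
```

In this toy example the estimates for the two causes sum to 1 at the last event time, because the last observation is an event rather than a censoring.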

This package's functionality is numerically validated against R's cmprsk. See Parity testing below.

Install

pip install pycmprsk

The package is pure Python; the hot loops are JIT-compiled with numba.

Quick example

The arrays below were dumped from R using RNGversion("1.6.2"); set.seed(2) and the same data setup as cmprsk/tests/test.R. The resulting plots visually match R's plot(cuminc(...)) and plot(predict(crr(...))).

import matplotlib.pyplot as plt
import numpy as np

from pycmprsk import crr, cuminc, plot_cuminc, plot_predict, predict_crr


def tf_quad(uft):
    """Match R's ``function(uft) cbind(uft, uft^2)``."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])

Data arrays (sourced from R's test suite)

ftime = np.array([
    0.686305, 0.149818, 1.611875, 1.077275, 1.553027, 0.286783, 0.234919, 0.255626, 0.536215, 1.420936,
    1.979941, 0.816767, 0.970783, 3.376077, 1.407218, 0.229477, 2.821243, 1.598966, 0.661166, 0.291716,
    2.421805, 0.264711, 0.419970, 0.994872, 5.248650, 0.493777, 0.036222, 0.039556, 2.225511, 1.896816,
    1.562481, 2.080967, 0.062462, 0.308574, 0.854363, 1.086975, 0.183905, 0.877297, 0.166353, 1.346992,
    3.303843, 0.723761, 0.043173, 1.635107, 1.022373, 1.565542, 0.734400, 1.705071, 1.527256, 1.921497,
    1.854679, 0.310276, 2.424571, 0.515172, 1.251790, 1.054940, 0.010267, 1.079949, 0.136024, 0.466943,
    1.348637, 0.113960, 2.535242, 0.762922, 0.432438, 0.666299, 0.862624, 0.479771, 0.397440, 1.493170,
    0.661091, 0.540539, 1.355944, 0.773167, 3.902563, 0.117417, 1.786273, 0.072698, 0.259388, 2.092709,
    0.229584, 0.490496, 0.425987, 0.335195, 0.697602, 0.097860, 0.917998, 0.174528, 0.680717, 1.835194,
    2.997399, 1.937913, 0.520418, 1.653625, 2.238665, 0.149357, 0.720766, 0.096726, 0.831950, 1.003850,
])

fstatus = np.array([
    1, 2, 0, 2, 2, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 1, 0,
    2, 1, 1, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0, 1,
    2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2,
    0, 2, 2, 1, 2, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0,
    0, 0, 1, 1, 2, 1, 2, 2, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 1, 1,
])

group_code = np.array([
    3, 1, 3, 1, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2, 3, 1, 1, 2, 1,
    1, 3, 2, 3, 1, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 3, 2, 1, 3,
    3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2,
    1, 3, 2, 1, 2, 3, 1, 1, 3, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 3,
    2, 3, 3, 2, 2, 2, 2, 3, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,
])
group = np.array(["a", "b", "c"])[group_code - 1]

strata = np.array([
    1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1,
    2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
    2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1,
    1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,
    1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
])

cov1 = np.array([
    np.nan, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
    1.0, 1.0, 1.0, np.nan, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, np.nan, 1.0, 1.0,
    1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0,
]).reshape(100, 3)

cov2 = np.column_stack([cov1[:, 0], cov1[:, 0]])
cengroup = cov1[:, 2]

ci = cuminc(ftime, fstatus, group=group, strata=strata)
print("cuminc curve keys:", list(ci.curves.keys()))
print("Gray's k-sample tests (stat, pv, df):\n", ci.tests)

fit = crr(ftime, fstatus, cov1=cov1, cov2=cov2, tf=tf_quad, cengroup=cengroup)
print("crr coefs:", fit.coef)
print("crr converged:", fit.converged)

pred = predict_crr(
    fit,
    cov1=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
    cov2=np.array([[1.0, 1.0], [0.0, 0.0]]),
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_cuminc(ci, ax=ax1)
ax1.set_title("cuminc(ss, cc, gg, strt)")
plot_predict(pred, ax=ax2)
ax2.set_title("predict(crr(ss, cc, cv, cov2, tf, cengroup=cv[,3]))")
plt.tight_layout()
plt.show()
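
The cov1 array above contains a few np.nan entries. Because na.action is fixed to omitting rows with any NA, such rows are presumably dropped before fitting. The following is a rough NumPy sketch of that kind of complete-case filtering on small made-up arrays. It is an illustration only, not pycmprsk's internal code, and the variable names (keep, n_dropped) are hypothetical.

```python
import numpy as np

# Small made-up inputs, for illustration only.
ftime = np.array([0.7, 0.1, 1.6, 1.1, 1.5])
fstatus = np.array([1, 2, 0, 2, 1])
cov1 = np.array([
    [np.nan, 1.0],
    [1.0, 0.0],
    [0.0, np.nan],
    [0.0, 1.0],
    [1.0, 1.0],
])

# Stack every input column side by side and keep rows with no NaN anywhere.
stacked = np.column_stack([ftime, fstatus, cov1])
keep = ~np.isnan(stacked).any(axis=1)

ftime_c, fstatus_c, cov1_c = ftime[keep], fstatus[keep], cov1[keep]
n_dropped = int((~keep).sum())

print(keep.tolist())  # [False, True, False, True, True]
print(n_dropped)      # 2
```

Filtering all inputs with the same mask keeps the rows aligned, which is the point of doing the omission once rather than per array.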

What's different from R's `cmprsk`

pycmprsk is a port of the functionality, not a verbatim translation of the API:

Returns dataclasses (CRRResult, CumincResult, SummaryCRR) rather than R's named lists. Field names use Python snake_case (n_missing vs. R's n.missing, loglik_null vs. loglik.null).
tf (the time-varying covariate function) takes and returns NumPy arrays. It must return a 2D array of shape (ndf, p2): one row per input time and one column per column of cov2. R's cmprsk wraps 1D output via as.matrix; in Python, do the equivalent yourself with .reshape(-1, 1).
na.action is fixed to "omit rows with any NA," matching R's default.
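
As a minimal sketch of the tf shape requirement described above, the following shows a hypothetical single-term time function (tf_log is an illustrative name, not part of pycmprsk) that reshapes its 1D result into one column, next to the two-column tf_quad from the quick example. Only NumPy is used, so the snippet runs without pycmprsk installed.

```python
import numpy as np


def tf_log(uft):
    """Hypothetical one-column time function: log(1 + t).

    Returns shape (len(uft), 1), suitable for a one-column cov2.
    """
    uft = np.asarray(uft, dtype=np.float64)
    return np.log1p(uft).reshape(-1, 1)


def tf_quad(uft):
    """Two-column time function from the quick example: (t, t^2)."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])


times = np.array([0.5, 1.0, 2.0])
print(tf_log(times).shape)   # (3, 1)
print(tf_quad(times).shape)  # (3, 2)
```

Without the reshape, np.log1p(uft) would have shape (3,), which does not satisfy the 2D requirement.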

Behavioral parity is the explicit design goal - see src/tests/test_parity.py.

Parity testing

The test suite is 1:1 with R's cmprsk/tests/test.R: every scenario in that file has a corresponding .npz fixture (data + R reference outputs) under src/tests/fixtures/, and one matching Python test per scenario.
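
To give a feel for what such a parity check might look like, here is a self-contained sketch that writes a stand-in .npz fixture to a temporary directory, reloads it, and compares a Python result with the stored reference. The file name, the key names (ftime, r_coef), the helper check_against_reference and the tolerance are all hypothetical. The real fixtures are produced by R and the real tests live in src/tests.

```python
import os
import tempfile

import numpy as np

# Build a stand-in fixture so the sketch runs on its own.
# Key names and values are made up for illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "scenario_example.npz")
np.savez(path, ftime=np.array([0.7, 0.1, 1.6]), r_coef=np.array([0.25, -0.5]))


def check_against_reference(path, py_coef, rtol=1e-8):
    """Compare a Python result with the R reference stored in a fixture."""
    with np.load(path) as fx:
        r_coef = fx["r_coef"]
    # Raises AssertionError if the arrays differ beyond the tolerance.
    np.testing.assert_allclose(py_coef, r_coef, rtol=rtol)
    return True


# Pretend this came from the Python implementation.
py_coef = np.array([0.25, -0.5]) + 1e-12
print(check_against_reference(path, py_coef))  # True
```

Storing the inputs and the R outputs in the same file means a test needs no R installation at run time; R is only needed to regenerate the fixtures.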

To regenerate the fixtures (requires R with the cmprsk and reticulate packages installed):

Rscript src/tests/r_fixtures.R

To run the parity tests:

pytest src/tests

License

pycmprsk is distributed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

This package is a derivative work of R's cmprsk (Bob Gray), which is licensed under GPL (>= 2). The Fortran sources from cmprsk/src/*.f have been re-implemented in Python while preserving the original algorithms. As a derivative work, pycmprsk must remain GPL-compatible.
