PySofra turns datasets, fitted models, and summary statistics into publication-ready tables — across HTML · Markdown · LaTeX · DOCX · PPTX · XLSX · PNG — from a single immutable
SofraTableobject. It brings the practical workflows of R'stableone,gtsummary, andflextableinto a single coherent Pythonic API.
Adjusted ORs + inline forest plot tbl_regression(fit).with_forest_plot()
|
KM table + inline survival curve tbl_survival(...).with_km_plot()
|
- One immutable object, seven output formats — build a
SofraTableonce, render to HTML / Markdown / LaTeX / DOCX / PPTX / XLSX / PNG, all byte-deterministic across processes - Auto-dispatched statistical tests — Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², Rao–Scott, design-adjusted t — picked by variable kind, overridable per-row
- Inline forest plots and KM curves — embed matplotlib figures directly into the table; the same
SofraTablerenders them across every backend - Statistically correct — every numeric output validated against
scipy/statsmodels/lifelinesreference implementations at machine precision - Method-chainable and immutable — every modifier returns a new table; no in-place mutation, no global state, fully reproducible
Showcase notebook — 47 cells, every section a side-by-side numeric proof. Start here if you have 60 seconds.
End-to-end tutorial — 126 cells walking every public feature on a synthetic two-arm trial.
import numpy as np
import pandas as pd
import pysofra as ps
# Toy two-arm trial; replace with your own DataFrame in real use.
rng = np.random.default_rng(0)
df = pd.DataFrame({
"arm": rng.choice(["Placebo", "Treatment"], 200),
"age": rng.normal(60, 10, 200).round(),
"bmi": rng.normal(28, 5, 200).round(1),
"event": rng.binomial(1, 0.3, 200),
})
# Table 1 — baseline characteristics by treatment arm
tbl = (
ps.tbl_one(df, by="arm")
.add_p()
.add_smd()
.add_overall()
.theme("clinical")
)
tbl # renders in Jupyter / Colab / VS Code
tbl.to_docx("table1.docx") # publication-quality Word
tbl.to_html() # standalone HTML fragment
tbl.to_markdown() # GitHub-flavored MarkdownThe same workflow handles regression tables:
import statsmodels.api as sm
X = sm.add_constant(df[["age", "bmi"]])
fit = sm.Logit(df["event"], X).fit(disp=False)
(
ps.tbl_regression(fit, exponentiate=True)
.bold_p()
.theme("jama")
.to_docx("table2.docx")
)The end-to-end worked example — baseline table by treatment arm, regression table with forest plot, and Kaplan-Meier survival summary — is in the showcase notebook.
| Feature | Function / object | Status |
|---|---|---|
| Baseline Table 1 | ps.tbl_one |
MVP |
| Descriptive summary | ps.tbl_summary |
MVP |
| Regression results | ps.tbl_regression |
MVP |
| Side-by-side merge | ps.tbl_merge |
MVP |
| Vertical stack | ps.tbl_stack |
MVP |
| HTML / Markdown | .to_html / .to_markdown |
MVP |
| DOCX export | .to_docx |
MVP |
| LaTeX export | .to_latex |
MVP |
| PPTX export | .to_pptx |
MVP (extras) |
| Excel export | .to_xlsx |
MVP |
| Inline forest plots | tbl_regression(...).with_forest_plot() |
MVP |
| Inline KM curves | tbl_survival(...).with_km_plot() |
MVP |
| Cross-backend plot embedding | DOCX/PPTX/LaTeX include the plot too | MVP |
| Rao–Scott chi-square | weighted Table 1 auto-route | MVP |
SurveyDesign (strata + cluster + FPC) |
Taylor-linearised variance | MVP |
| Themes | clinical, jama, nejm, compact, minimal |
MVP |
| Auto test selection | t-test / ANOVA / Wilcoxon / Kruskal / χ² / Fisher | MVP |
| Per-variable test overrides | tests={'age': 'wilcoxon', ...} |
MVP |
| Multiplicity adjustment | .add_q() — BH, BY, Bonferroni, Holm, Hommel, Šidák |
MVP |
| Multi-model regression | tbl_regression([m1, m2], model_labels=[...]) |
MVP |
| lifelines (Cox / AFT) | tbl_regression(cph) |
MVP |
| sklearn (linear models) | tbl_regression(clf) — point estimates only |
MVP |
| Kaplan–Meier summary | tbl_survival(df, time=, event=, by=, times=[...]) |
MVP |
| Survey-weighted Table 1 | tbl_one(..., weights='w') |
MVP |
| polars input | tbl_one(pl.DataFrame(...)) |
MVP |
| Conditional formatting | .bold_if, .highlight_if, .style_if |
MVP |
| Sticky-header notebook tables | .to_html(sticky_header=True) |
MVP |
| Standardised mean differences | continuous + categorical (Yang–Dalton) | MVP |
| Notebook rendering | _repr_html_ / _repr_markdown_ / _repr_latex_ |
MVP |
- Backend-agnostic tables. A
SofraTableis the single source of truth; every renderer (HTML, Markdown, DOCX, …) reads the same object. - Immutable method chaining. Every modifier returns a new
SofraTable. No surprises, no global state. - Strong defaults, explicit overrides. Sensible journal-style output out of the box; per-variable type, label, and test overrides when you need them.
- Deterministic. The same input always produces the same output — critical for reproducible research.
- No magic. No nonstandard evaluation, no metaprogramming, no network calls, no telemetry.
pip install pysofraPySofra requires Python ≥ 3.11. The core install only pulls numpy,
pandas, scipy, statsmodels, and python-docx. Domain extras unlock
the features that depend on heavier optional libraries:
pip install "pysofra[survival]" # tbl_survival + KM curves (lifelines, matplotlib)
pip install "pysofra[plot]" # forest plots, table-as-image (matplotlib)
pip install "pysofra[pptx]" # PowerPoint export (python-pptx)
pip install "pysofra[xlsx]" # Excel export (xlsxwriter)
pip install "pysofra[polars]" # accept polars DataFrames as input
pip install "pysofra[sklearn]" # tbl_regression on scikit-learn models
pip install "pysofra[all]" # everything above
pip install "pysofra[dev]" # testing + linting (pytest, ruff, mypy, hypothesis)PySofra is in alpha (0.1.0a7). The public API surface is pinned
by an explicit
API-stability test
so that any unintended rename, removal, or signature change surfaces as
a failed test. Quality bar at this release:
- 900+ tests passing, near-100% line coverage, mypy strict, ruff clean.
- Every numeric output is validated against
scipy,lifelines,statsmodels, or a hand-computed textbook formula (test_statistical_correctness.py). - Universal invariants enforced via Hypothesis on 720 randomized examples per CI run (test_property_invariants.py).
- Renderer output is byte-deterministic — identical input always produces identical HTML/Markdown/LaTeX, required for reproducible publication artifacts (test_renderer_consistency.py).
Bug reports and use-case feedback are very welcome.
Bug reports, feature requests, and pull requests are all very welcome.
Please read
CONTRIBUTING.md
for the workflow, the quality gates, and the
Code of Conduct.
GPL-3.0-or-later. See
LICENSE.
If you use PySofra in academic work, please cite the project — see
CITATION.cff.


