causalrl

Causal intervention-selection and causal-RL research tools.

causalrl provides graph algorithms for causal bandits, demonstration environments and agents, and explicit-latent structural causal models with see (L1), do (L2), and counterfactual (L3) queries, organised around the 9-task taxonomy of causal RL.

Scope is explicit and enforced in code. Out-of-class identification queries raise NotIdentifiableError with the witnessing hedge (the conservative helpers return None instead) rather than guessing a formula. Learning agents are tabular and demo-scale, not production RL. See Guarantees & Scope.

Install

pip install causalrl            # core: graph, POMIS, tabular agents/environments
pip install "causalrl[torch]"   # + SCM sampling, neural mechanisms, Torch-backed demos

From a clone, for development:

uv sync --extra dev             # tests, lint, typing, notebooks
uv sync --extra docs            # local documentation site and API reference

The core graph, POMIS, tabular-agent, and tabular-environment surfaces do not require PyTorch; SCM sampling, neural mechanisms, and structural-bandit environments do. Full documentation: https://raphaelrrcoelho.github.io/causalrl/.
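
If you are unsure which install you have, a quick standard-library check tells you whether the optional PyTorch dependency is importable. This is a minimal sketch: `has_torch` is a hypothetical helper, not part of causalrl, and it only tests for the `torch` package itself.

```python
import importlib.util


def has_torch() -> bool:
    """Return True when the optional PyTorch dependency is importable."""
    # find_spec returns None when no top-level package named "torch" is installed.
    return importlib.util.find_spec("torch") is not None


if has_torch():
    print("PyTorch found: the Torch-backed surfaces should be available.")
else:
    print('PyTorch not found: install "causalrl[torch]" for SCM sampling and neural mechanisms.')
```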

Quickstart

A causal agent that conditions on its "intuition" beats a confounding-naive agent on the Multi-Armed Bandit with Unobserved Confounders (MABUC), even though both arms have identical interventional means.

from causalrl.agents.bandits import CausalThompsonSampling
from causalrl.envs.suite.mabuc import MABUCEnv

env = MABUCEnv(seed=1)
agent = CausalThompsonSampling(n_arms=2, n_contexts=2, seed=0)

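total_reward = 0.0
obs, _ = env.reset(seed=1)
for _ in range(8000):
    action = agent.act(obs)  # pick an arm given the current observation
    _, reward, _, _, _ = env.step(action)
    agent.update(obs, action, reward)
    total_reward += reward
    obs, _ = env.reset()  # one-shot rounds: reset to draw the next observation
print(f"mean reward/step: {total_reward / 8000:.3f}")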
# CausalThompsonSampling -> ~0.75 reward/step; any confounding-naive policy is capped near 0.50,
# since both arms share an interventional mean.
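
The reason a naive policy is capped can be seen without the library. The sketch below assumes a hypothetical two-arm payout table (it is not the one `MABUCEnv` uses) in which a hidden confounder `U` sets both the agent's intent and each arm's payout. Every name in it is illustrative.

```python
import random

# Hypothetical payout table, NOT the library's MABUCEnv:
# PAYOUT[(u, arm)] = P(reward = 1 | U = u, do(arm)).
PAYOUT = {(0, 0): 0.25, (0, 1): 0.75, (1, 0): 0.75, (1, 1): 0.25}


def interventional_mean(arm):
    """E[reward | do(arm)], averaging over U ~ Uniform{0, 1}."""
    return 0.5 * PAYOUT[(0, arm)] + 0.5 * PAYOUT[(1, arm)]


def follow_intent(intent, rng):
    return intent  # observational behaviour: play the arm you feel like playing


def ignore_intent(intent, rng):
    return rng.randrange(2)  # confounding-naive: the intent is discarded


def counter_intent(intent, rng):
    return 1 - intent  # conditions on the intent, which reveals U in this toy model


def simulate(policy, steps, seed):
    """Average reward of a policy that sees only the intent, never U itself."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        u = rng.randrange(2)  # hidden confounder
        intent = u            # assumption: the intent is fully determined by U
        arm = policy(intent, rng)
        total += rng.random() < PAYOUT[(u, arm)]
    return total / steps


for policy in (follow_intent, ignore_intent, counter_intent):
    print(policy.__name__, round(simulate(policy, 20000, seed=0), 3))
```

In this toy model both arms have an interventional mean of exactly 0.5, so no policy that ignores the intent can average more than that, while the intent-conditioned policy approaches 0.75.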

What it does

| Task (taxonomy) | Capability | Key entry points |
| --- | --- | --- |
| Decision under confounding | Counterfactual Thompson sampling on the MABUC | `CausalThompsonSampling` |
| 1 — Offline→online | Learn from confounded logs via causal bounds | `UCDTR`, `DOVI`, `DeepDeconfoundedQ` |
| 2 — Where to intervene | POMIS / MIS, incl. non-manipulable variables | `pomis`, `minimal_intervention_sets` |
| 3 — Counterfactual policy | Act on `E[Y_do(a) \| intent]` | `CounterfactualOptimalPolicy` |
| 4 — Transportability | Recover effects across domains | `transport_formula`, `transported_effect` |
| 5 — Causal discovery | PC / FCI structure learning | `discover`, `CPDAG` |
| 6 — Causal imitation | Imitability + confounded cloning | `is_imitable`, `CausalImitator` |
| 7 — Causal curriculum | Prerequisite-ordered skill learning | `causal_curriculum` |
| 8 — Reward shaping | Policy-invariant causal potentials | `causal_potential`, `q_learning` |
| 9 — Causal games | Influence diagrams + equilibria | `pure_nash_equilibria`, `CausalGame` |
| Identification | Complete ID / gID / sID / mz; partial-ID, sensitivity & decision certificates | `identify_effect`, `manski_bounds`, `certify_decision` |

A runnable example for every row is in the Tour by Task; end-to-end notebooks are in examples/ and the Tutorials.
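
To give a feel for the partial-identification entry in the Identification row, here is a minimal sketch of the no-assumption (Manski-style) bounds on an interventional mean for a binary outcome. It does not reproduce the signature or behaviour of the library's `manski_bounds`; `natural_bounds` and its parameters are hypothetical names chosen for illustration.

```python
from __future__ import annotations


def natural_bounds(p_a: float, p_y1_and_a: float) -> tuple[float, float]:
    """Bounds on P(Y=1 | do(A=a)) from observational quantities alone.

    p_a        -- P(A = a), how often the arm was chosen naturally
    p_y1_and_a -- P(Y = 1, A = a), joint probability of success with that arm
    """
    if not 0.0 <= p_y1_and_a <= p_a <= 1.0:
        raise ValueError("need 0 <= P(Y=1, A=a) <= P(A=a) <= 1")
    # Units that chose a contribute their observed outcome; units that chose
    # something else could have had any outcome in [0, 1] under do(a).
    lower = p_y1_and_a
    upper = p_y1_and_a + (1.0 - p_a)
    return lower, upper


lo, hi = natural_bounds(p_a=0.5, p_y1_and_a=0.125)
print(f"P(Y=1 | do(a)) lies in [{lo}, {hi}]")
```

The interval width is always `1 - P(A = a)`, so the bounds tighten only as the logged policy plays the arm more often. Narrowing them further requires extra assumptions or interventional data.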

How it compares

causalrl is causal-RL-first, where the established causal libraries are estimation-first:

DoWhy / EconML / CausalML target treatment-effect estimation and the identify→estimate→refute workflow on i.i.d. data. They are mature, production-grade tools. causalrl instead targets sequential decision-making: intervention-set selection (POMIS), confounded offline-to-online RL, counterfactual policies, and causal curricula / shaping / games. Those are the parts of the Bareinboim taxonomy these libraries do not cover.
For pure graph identification it overlaps with Ananke / pgmpy / Y0. It deliberately does not reimplement offline RL at scale; pair it with a dedicated library such as d3rlpy for that.

Use causalrl when your problem is a causal decision over time; use DoWhy/EconML when it is a treatment-effect estimate.

Stability

The public API — the names exported from the top-level causalrl package — is stable and follows semantic versioning: from v1.0.0 on, breaking changes to exported names move the major version. The 0.99.x line deliberately let the surface settle in real use first; 1.0 commits to it. See Guarantees & Scope for what each method does and does not promise.

Reproducible benchmarks

uv run --extra dev python benchmarks/scbandit_report.py confounded-chain \
  --seeds 0,1,2,3,4 --steps 8000 --tail-window 2000 --n-mc 2000

The JSON report includes each seed's result plus summary uncertainty. These maintained demonstrations validate package behaviour on the stated environments; they are not general performance guarantees.
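
The report's JSON schema is not shown here, so the sketch below works on a plain list of per-seed values. It illustrates one common way to summarise uncertainty across seeds, the mean with its standard error. It is an assumption that the report does something similar, and `summarise_seeds` is a hypothetical helper rather than part of the benchmark script.

```python
import math
import statistics


def summarise_seeds(values):
    """Return (mean, standard error of the mean) across per-seed results."""
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Placeholder inputs chosen for easy arithmetic; these are not benchmark results.
mean, sem = summarise_seeds([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"mean = {mean:.3f}, standard error = {sem:.3f}")
```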

Development

uv run pytest                               # tests
uv run ruff check .                         # lint
uv run pyright src                          # types
uv run --extra docs mkdocs build --strict   # documentation

Contributions are welcome — see CONTRIBUTING.md.

Citing

If you use causalrl in research, cite the metadata in CITATION.cff and the primary source for the method you used (each is attributed inline in the Tour by Task and its source module). See Citing causalrl.

Acknowledgements

This library would not exist without the body of work it stands on. Particular thanks to:

Elias Bareinboim, whose 9-task taxonomy of causal reinforcement learning is the organising spine of causalrl, and whose results with collaborators are the core of nearly every slice — do-calculus completeness (with Shpitser & Pearl), transportability and selection diagrams (with Pearl), counterfactual data fusion (with Forney & Pearl), POMIS / structural causal bandits (with Lee), and causal imitation learning (with Zhang & Kumor).
Judea Pearl, for the do-calculus and Pearl Causal Hierarchy that make every L1 / L2 / L3 query in this library well-defined.
Sanghack Lee, for the reference POMIS implementation the intervention-set engine is adapted from (MIT-licensed; attribution in src/causalrl/identification/intervention_sets.py).

Other foundational references — Spirtes, Glymour & Scheines; Zhang; Manski; Tan; Koller & Milch; Ng, Harada & Russell; Bengio et al. — are cited inline at the slice that uses each.
