paper-chase

Agent-based simulation of automated science under publish-or-perish incentives.

Paper-chase simulates a scientific publishing ecosystem to test which governance interventions keep the literature's truth-content (precision) and discovery rate (recall) high as agents optimize for novelty rewards. Each intervention — pre-registration, replication-and-retraction, measurement-invariance — is mapped onto the precision/recall Pareto plane across incentive pressure and systematic bias. A recurring result: an independent, cross-model auditor recovers precision that a same-base-model auditor cannot, because a same-source check carries the same systematic bias it is meant to catch.

We start with a validity gate on the statistical engine. As incentives increasingly reward novel results over replications of previously published work, the literature's precision falls: rising use of questionable research practices (QRPs) inflates the effective false-positive rate, so false positives accumulate in the standing literature and drive truth-content down.
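
To make the inflation mechanism concrete, here is a minimal sketch under one stylized assumption that is not taken from paper-chase's code: a QRP is modeled as running k independent tests on a null effect and reporting a finding if any one of them is significant. The name effective_fpr and the parameter k are illustrative only.

```python
def effective_fpr(alpha: float, k: int) -> float:
    """False-positive rate when any of k independent null tests may be reported.

    Assumption (illustrative, not paper-chase's QRP model): the k tests are
    independent and each has nominal level alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # P(at least one rejection) = 1 - P(no test rejects).
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(f"k={k:2d}  effective FPR={effective_fpr(0.05, k):.3f}")
```

With k = 1 the nominal level is recovered, and the rate grows monotonically with k, which is the sense in which more QRP means more false positives entering the literature.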

Next, we implement interventions hypothesized to affect precision, recall, or both. Each is mapped onto the Pareto plane, and dominance regimes and tradeoffs are characterized for the baseline, for each intervention alone, and for combinations of interventions.
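
As a rough illustration of what dominance on the precision/recall plane means, the sketch below filters a set of outcomes down to its Pareto front. The Outcome type, the function names, and the numbers are hypothetical placeholders, not paper-chase's API or simulation results.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    """One configuration's position on the precision/recall plane (hypothetical type)."""
    label: str
    precision: float
    recall: float


def dominates(a: Outcome, b: Outcome) -> bool:
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    at_least_as_good = a.precision >= b.precision and a.recall >= b.recall
    strictly_better = a.precision > b.precision or a.recall > b.recall
    return at_least_as_good and strictly_better


def pareto_front(outcomes: list[Outcome]) -> list[Outcome]:
    """Keep only the outcomes that no other outcome dominates."""
    return [o for o in outcomes if not any(dominates(other, o) for other in outcomes)]


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the function.
    points = [
        Outcome("A", precision=0.9, recall=0.3),
        Outcome("B", precision=0.6, recall=0.7),
        Outcome("C", precision=0.5, recall=0.6),
    ]
    print([o.label for o in pareto_front(points)])
```

In this toy input, C is dominated by B (worse on both axes), while A and B trade precision against recall and so both stay on the front.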

Initial interventions include:

pre-registration (hypothesis, methods, analysis)
incentivized replication + retraction
measurement-invariance requirements

Builds on Smaldino & McElreath, The natural selection of bad science (RSOS 2016).

Status

For recent results, see example runs. For the curated synthesis — established findings by regime, each with mechanism and a status tag — see FINDINGS.md.

Statistical engine validated (FPR ≈ α at q=0; power monotone in n)

With no mitigation, the literature's truth-content falls as the novelty:replication reward ratio rises; the qualitative crisis dynamic is reproduced.

Initial interventions (in progress)

pre-registration: a modest precision lift (largest at low bias, vanishing as systematic bias takes over — it addresses QRP, not bias) at a small recall cost
incentivized replication + retraction: raises precision substantially at low-to-moderate bias, at a cost in recall. A same-base audit inherits the shared systematic bias, however, so its precision collapses once that bias is strong, which argues for cross-model audit
measurement-invariance: under the initial uniform sampling algorithm, compute limits restricted each experiment to a small number of invariance replications; precision improved, but a substantial drop in recall dominated the outcome. Next up: realistic sampling algorithms
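
The same-base versus cross-model contrast in the replication item can be illustrated with a closed-form toy model. Everything here is an assumption made for the sketch rather than paper-chase's actual model: a fraction base_rate of tested hypotheses is true, true effects are detected with probability power, the original study reports a false positive with probability original_fpr (inflated by systematic bias), and the audit keeps a finding only if it replicates. A same-base auditor is assumed to share the inflated rate, while an independent auditor is assumed to sit at the nominal level.

```python
def audited_precision(base_rate: float, power: float,
                      original_fpr: float, audit_fpr: float) -> float:
    """Precision of the literature that survives a replicate-or-retract audit.

    Toy assumptions (not paper-chase's model): a true finding must be detected
    twice (original and audit), so it survives with probability power**2; a
    false finding survives with probability original_fpr * audit_fpr.
    """
    true_kept = base_rate * power * power
    false_kept = (1.0 - base_rate) * original_fpr * audit_fpr
    total = true_kept + false_kept
    # An empty literature is scored as precision 1, matching the
    # 'publish nothing' convention in the Metrics section.
    return true_kept / total if total > 0 else 1.0


if __name__ == "__main__":
    # Illustrative parameter values only.
    base_rate, power, biased_fpr, nominal_alpha = 0.1, 0.8, 0.6, 0.05
    same_base = audited_precision(base_rate, power, biased_fpr, audit_fpr=biased_fpr)
    cross_model = audited_precision(base_rate, power, biased_fpr, audit_fpr=nominal_alpha)
    print(f"same-base audit:   {same_base:.3f}")
    print(f"cross-model audit: {cross_model:.3f}")
```

In this toy, the only difference between the two audits is audit_fpr, so any precision gap comes entirely from whether the auditor shares the bias. The same model also shows the recall cost: true findings survive with probability power squared rather than power.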

Future extensions:

cross-context bias persistence — does invariance keep its advantage when a shared base model's bias persists across contexts, not just within? (the load-bearing test; the current model draws each context's bias independently)
adaptive (RL) agents that learn to game/reward-hack
early-warning detection

Setup

uv sync                                # creates .venv, installs deps + dev group, writes uv.lock
uv run pytest                          # statistical engine: FPR ≈ α, power monotone in n
uv run python scripts/run_baseline.py  # produces results/validity_gate.png
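
For orientation, the two properties the test suite is described as checking (FPR ≈ α, power monotone in n) could be probed with a Monte Carlo sketch like the one below. It is not the project's test code: the two-sided z-test with known unit variance, the function names, and the trial counts are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import NormalDist


def z_test_rejects(sample: list[float], alpha: float) -> bool:
    """Two-sided z-test of H0: mean = 0, assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * n ** 0.5
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(z) > critical


def rejection_rate(effect: float, n: int, alpha: float = 0.05,
                   trials: int = 4000, seed: int = 0) -> float:
    """Fraction of simulated studies that reject H0 at the given true effect size."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        rejections += z_test_rejects(sample, alpha)
    return rejections / trials


if __name__ == "__main__":
    # Under the null (effect = 0) the rejection rate estimates the FPR.
    print("FPR estimate:", rejection_rate(effect=0.0, n=20))
    # With a real effect, the rejection rate estimates power and should rise with n.
    for n in (10, 40, 160):
        print(f"n={n:3d}  power estimate:", rejection_rate(effect=0.3, n=n))
```

The fixed seed makes the estimates reproducible; a real test would compare the FPR estimate to α within a tolerance sized to the Monte Carlo error.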

Metrics

truth-content = TP / (TP + FP) = precision over the standing literature = 1 − FDR
discovery rate = TP / (TP + FN) = recall (field-level power)
The Pareto view reports both metrics at once, which is necessary here because precision alone is gameable: 'publish nothing' maps to precision = 1.
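
The two definitions above translate directly into code. This is a small sketch with hypothetical function names; the empty-literature case follows the convention stated above that publishing nothing yields precision = 1, and raising an error when there are no true effects is a choice made for this sketch.

```python
def truth_content(tp: int, fp: int) -> float:
    """Precision over the standing literature: TP / (TP + FP)."""
    published = tp + fp
    if published == 0:
        # Nothing published: scored as precision 1, which is why
        # precision has to be read together with recall.
        return 1.0
    return tp / published


def discovery_rate(tp: int, fn: int) -> float:
    """Recall (field-level power): TP / (TP + FN)."""
    true_effects = tp + fn
    if true_effects == 0:
        raise ValueError("recall is undefined when there are no true effects")
    return tp / true_effects


if __name__ == "__main__":
    # 'Publish nothing' with 10 true effects left undiscovered.
    print(truth_content(tp=0, fp=0), discovery_rate(tp=0, fn=10))
```

The example shows the degenerate strategy: precision is perfect while recall is zero, so only the pair of metrics exposes it.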

Acknowledgments

The framing — replication crisis in automated science as a problem worth stress-testing via simulation of mitigations — is from one of the project bullets on Konstantinos Voudouris's Pivotal mentor profile. The implementation here is mine; design choices, stylized parameter values, and errors are mine alone.

