ADCD — Anomaly-Driven Correction Discovery

Physics-constrained symbolic regression that discovers correction terms — not equations from scratch. The same logic that led from Newton to Einstein, from Rayleigh–Jeans to Planck.


📖 Documentation · ⚡ Quick Start · 📊 Benchmarks · ▶ Run in Colab

Science rarely discovers from a blank slate — it corrects. ADCD automates the step between anomaly and theory correction: given a classical law and data that disagrees with it, it searches for the minimal physically-valid correction term $\Delta$ — passing every candidate through dimensional, asymptotic, and complexity gates before a single parameter is ever fit.

⚡ Key Features

Correction-First Paradigm — Starts from a known classical law, not a blank slate. Focuses the search space on the discrepancy $\Delta$ between theory and experiment.
Cascaded Physics Gates — AST complexity, dimensional homogeneity, transcendental guardrails, and asymptotic consistency (ARC) gates screen out unphysical candidates before running parameter-fitting.
JAX-Traced L-BFGS-B Optimizer — Highly optimized parameter-scaled differentiable fitting with multi-restart log-uniform initialization.
BIC Model Selection — Employs the Bayesian Information Criterion (BIC) to rank models, favoring simpler physical theories over overly complex numerical fits.
Residual Feature Intelligence — Extracts mathematical features (monotonicity, curvature, oscillation, decay) from residuals to bias proposal templates.
Phase 2: Multivariable Discovery — Buckingham Π group decomposition + per-variable Sequential ARC + variance-factorization separability detection for multi-input physical laws.
Real-World Validated — Successfully identifies correct structural classes on Mercury's perihelion (GR), Lamb Shift (QED), Muon g-2 (Schwinger), and Blackbody (Planck).
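
The BIC ranking mentioned in the list above can be illustrated with a small NumPy sketch. It assumes Gaussian residuals and the common least-squares form BIC = n·ln(RSS/n) + k·ln(n). The function name `bic_score`, the toy data, and the two candidate templates are hypothetical, and the score implemented in ADCD's metrics.py may be defined differently.

```python
import numpy as np

def bic_score(residuals, n_params):
    """BIC under a Gaussian-noise assumption: n*ln(RSS/n) + k*ln(n)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    return n * np.log(rss / n) + n_params * np.log(n)

# Toy residual: a quadratic correction plus a little measurement noise.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 5.0, 200)
delta = 0.5 * x**2 + rng.normal(scale=0.05, size=x.size)

# Candidate A: one parameter, theta0 * x**2.
A = np.column_stack([x**2])
# Candidate B: five parameters, a full quartic polynomial (contains A as a special case).
B = np.vander(x, 5)

theta_a, *_ = np.linalg.lstsq(A, delta, rcond=None)
theta_b, *_ = np.linalg.lstsq(B, delta, rcond=None)

bic_a = bic_score(delta - A @ theta_a, n_params=1)
bic_b = bic_score(delta - B @ theta_b, n_params=5)

print(f"BIC one-parameter template : {bic_a:.1f}")
print(f"BIC five-parameter template: {bic_b:.1f}")
print("preferred:", "A" if bic_a < bic_b else "B")
```

The quartic always fits at least as tightly because it contains the quadratic term, but each extra parameter costs ln(n) in the score, so the simpler template is normally preferred unless the extra terms reduce the residual substantially.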

📦 Installation

Install the stable package from PyPI:

pip install adcd

Or install from source:

git clone https://github.com/apiprdt/PhysicsPaper.git
cd PhysicsPaper
pip install -e ".[dev]"

Verify your installation:

pytest tests/

💻 Quick Start

1. High-Level Scientific API

ADCD ships with predefined physics benchmarks that run in a few lines:

import adcd

# 1. Load a pre-defined benchmark scenario (e.g. Relativistic Kinetic Energy)
scenarios = adcd.get_all_scenarios()
scenario = scenarios[0] 

# 2. Run discovery in a single line!
result = adcd.discover_correction(scenario, max_iterations=5, proposer="mock")

# 3. View the best fit
print(f"Discovered correction: {result.best_expr}")       # θ₀ * (v/c)**2
print(f"LaTeX representation:  {result.export_latex()}")   # \theta_0 \left(\frac{v}{c}\right)^2
print(f"Parameters:            {result.best_theta}")
print(f"BIC Score:             {result.best_bic:.2f}")

# 4. Plot residuals
result.plot_residuals()

2. Custom Experimental Datasets

For custom datasets, use the adcd.fit function:

import numpy as np
import adcd

# Your custom data
x = np.linspace(1.0, 5.0, 100)
X = {"x": x}
y_classical = 2.0 * x
y_observed  = 2.0 * x + 0.5 * x**2   # True correction is 0.5 * x^2

# Run ADCD
result = adcd.fit(
    X=X,
    y_obs=y_observed,
    y_classical=y_classical,
    limit_variable="x",
    limit_direction="0",
    correction_mode="additive"
)

result.summary()
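
To make the additive mode concrete, here is a NumPy-only sketch of the underlying idea. It is not ADCD's search procedure: it takes the residual Δ = y_observed − y_classical as the quantity to explain and fits a single hand-picked template, θ₀·x², in closed form. The template choice is an assumption for illustration, as is the reading that `limit_variable="x"` with `limit_direction="0"` asks for a correction that vanishes as x → 0.

```python
import numpy as np

x = np.linspace(1.0, 5.0, 100)
y_classical = 2.0 * x
y_observed = 2.0 * x + 0.5 * x**2

# Additive mode: the correction target is the residual Delta.
delta = y_observed - y_classical

# Closed-form least squares for the one-parameter template theta0 * x**2.
theta0 = float(np.dot(x**2, delta) / np.dot(x**2, x**2))

# The template evaluates to zero at x = 0, so the classical law is
# recovered in that limit.
correction_at_zero = theta0 * 0.0**2

print(f"theta0 = {theta0:.3f}")              # theta0 = 0.500
print(f"correction at x=0: {correction_at_zero}")  # correction at x=0: 0.0
```

A real run differs in that the template is not known in advance: candidates are proposed, screened by the physics gates, fit, and then ranked.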

📊 Benchmark Results

1. Standard Benchmark (seed=42, Mock Proposer)

Scenario	Tier	0% Noise	1% Noise	5% Noise	10% Noise
Relativistic KE	Textbook	✓	✓	✓	✓
Yukawa Gravity	Textbook	✓	✓	✓	✓
Anharmonic Spring	Textbook	✓	✓	✓	✓
Screened Coulomb	Cross-Domain	✓	✓	✗	✗
Net Radiation	Cross-Domain	✓	✓	✓	✓
Nonlinear Drag	Cross-Domain	✓	✓	✓	✓
Mystery-A (tanh²)	Synthetic	✓	✓	✓	✓
Mystery-B (sinc)	Synthetic	✓	✓	✓	✓
Mystery-C (log-quotient)	Synthetic	✓	✓	✓	✓
Overall		100%	100%	88.9%	88.9%

2. PySR Comparison (fair profile: 100 iterations, 60s timeout)

Method	0% Noise	1% Noise	5% Noise	10% Noise
ADCD (ours, seed=42)	9/9 (100%)	9/9 (100%)	8/9 (88.9%)	8/9 (88.9%)
PySR fair	4/9 (44.4%)	5/9 (55.6%)	1/9 (11.1%)	5/9 (55.6%)

At 5% noise, ADCD solves 8/9 scenarios against 1/9 for PySR, a gap of 77.8 percentage points.
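
The gap can be checked directly from the solved counts in the comparison table. The dictionary layout below is only an illustration; the counts are copied from the table.

```python
# Solved scenarios out of 9, per noise level, copied from the table above.
solved = {
    "ADCD": {"0%": 9, "1%": 9, "5%": 8, "10%": 8},
    "PySR fair": {"0%": 4, "1%": 5, "5%": 1, "10%": 5},
}
total = 9

# Success rate in percent for each method and noise level.
rates = {
    method: {noise: 100.0 * n / total for noise, n in counts.items()}
    for method, counts in solved.items()
}

# Gap in percentage points (ADCD minus PySR), rounded to one decimal.
gaps = {
    noise: round(rates["ADCD"][noise] - rates["PySR fair"][noise], 1)
    for noise in rates["ADCD"]
}
print(gaps)  # {'0%': 55.6, '1%': 44.4, '5%': 77.8, '10%': 33.3}
```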

3. Phase 2: Multivariable Benchmark (v2.2.1)

Scenario	Variables	ADCD Solved	Notes
Yukawa Mass-Ratio	m, M, r, r₀	✓	Π groups: m/M, r/r₀
Turbulent Drag	v, ρ, A, C_D	✓	Separable multiplicative
Coupled Oscillator	k, m, Ω, ω₀	✗	Mixed functional form
Van der Waals MV	a, b, P, V, T	✗	Requires 3rd Π group
Overall		2/4 (50%)	Baseline: 0/4
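
The Π groups listed for the Yukawa Mass-Ratio scenario can be checked with a short dimensional-analysis sketch. It assumes that m and M are masses and that r and r₀ are lengths; the matrix layout and the helper `is_dimensionless` are hypothetical and do not reflect the internals of buckingham_pi.py.

```python
import numpy as np

# Columns: variables m, M, r, r0. Rows: base dimensions (mass, length).
variables = ["m", "M", "r", "r0"]
dim_matrix = np.array([
    [1, 1, 0, 0],  # mass exponent of each variable
    [0, 0, 1, 1],  # length exponent of each variable
])

# Buckingham Pi theorem: independent dimensionless groups = n_vars - rank(D).
n_groups = len(variables) - int(np.linalg.matrix_rank(dim_matrix))

# Exponent vectors for the two groups named in the table.
pi_groups = {
    "m/M": np.array([1, -1, 0, 0]),
    "r/r0": np.array([0, 0, 1, -1]),
}

def is_dimensionless(exponents):
    """A product of powers is dimensionless iff D @ exponents == 0."""
    return bool(np.all(dim_matrix @ exponents == 0))

for name, exponents in pi_groups.items():
    print(name, is_dimensionless(exponents))  # both print True
print("independent groups:", n_groups)        # independent groups: 2
```

With four variables and two base dimensions, two groups span the whole dimensionless space. Under the same counting, a scenario with more variables relative to its base dimensions needs more groups, which fits the note on the Van der Waals row.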

4. Real-World Physical Constants

Validation on historical anomalies using physical constants from JPL DE440, NIST, and CODATA:

Physical Scenario	Discovered Correction	Converged	Class Match	NMSE
Mercury Perihelion (GR)	`θ₀·vc²`	—	✓ polynomial	1.11e-05
Hydrogen Lamb Shift (QED)	`θ₀(n/θ₁)^(-θ₂)`	✓	✓ power_law	1.82e-18
Muon g-2 (Schwinger)	`θ₀(α/π)^θ₁`	✓	✓ polynomial	7.94e-07
Blackbody (Planck)	`-1 + e^(-f/θ₁)`	—	✓ exponential	2.59e-02
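
For readers unfamiliar with the NMSE column, the sketch below uses one common convention: mean squared error divided by the variance of the observations. This is an assumption, since the table does not state the normalization, and metrics.py may use a different one. The function name and sample values are illustrative.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Mean squared error normalized by the variance of the observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))

y = np.array([1.0, 2.0, 3.0, 4.0])

perfect = nmse(y, y)                             # exact fit
baseline = nmse(y, np.full_like(y, y.mean()))    # predicting the mean everywhere

print(perfect)   # 0.0
print(baseline)  # 1.0
```

Under this convention a value near 0 means the fit explains almost all of the variance, and a value of 1 is no better than predicting the mean.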

📁 Project Structure

PhysicsPaper/
├── src/adcd/                       # Installable package
│   ├── __init__.py                 # Public API (fit, discover_correction)
│   ├── anomaly_scenarios.py        # 9 standard + 3 blind + 4 multivariable scenarios
│   ├── arc_scorer.py               # Asymptotic consistency gate (ARC)
│   ├── buckingham_pi.py            # [Phase 2] Buckingham Π group engine
│   ├── coarse_evaluator.py         # Coarse numerical pre-filter
│   ├── correction_orchestrator.py  # Main multi-iteration discovery loop
│   ├── dimensional_checker.py      # Dimensional homogeneity + transcendental gate
│   ├── jax_optimizer.py            # JAX L-BFGS-B optimizer
│   ├── llm_proposer.py             # Mock + Gemini + OpenAI proposers
│   ├── metrics.py                  # NMSE, BIC, structural classification
│   ├── multivar_orchestrator.py    # [Phase 2] Multivariable correction pipeline
│   ├── pipeline.py                 # Stage 1 filter cascade
│   ├── real_data_loader.py         # Real-world data loading (JPL, NIST, CODATA)
│   ├── residual_factorizer_v2.py   # [Phase 2] Variance-decomposition separability
│   ├── result.py                   # CorrectionResult object
│   └── sequential_arc.py           # [Phase 2] Per-variable Sequential ARC checker
├── tests/                          # 116 unit + integration tests
├── paper/                          # LaTeX source (main.tex) + figures
├── run_correction_discovery.py     # Benchmark runner
└── README.md                       # This file

📖 Citing This Work

If you use ADCD in your research, please cite:

@software{erdita2026adcd,
  author    = {Erdita, Muhammad Afif},
  title     = {{Anomaly-Driven Correction Discovery (ADCD): Physics-Constrained
                Symbolic Regression for Evolutionary Scientific Discovery}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {2.2.1},
  doi       = {10.5281/zenodo.20534940},
  url       = {https://doi.org/10.5281/zenodo.20534940}
}

🔬 Reproducibility

Every quantitative claim in this project is reproducible from committed scripts. No number is hand-typed.

# Regenerate the 9-scenario benchmark (seed=42)
python run_correction_discovery.py

# Multi-seed study (5 seeds × 9 scenarios × 4 noise levels)
python run_reproducibility.py

# Guard: fails loudly if any headline number drifts
python scripts/verify_paper_claims.py

# SPARC MOND robustness study
python -m adcd.experiments.sparc_robustness

The full test suite (116 tests) must pass before any release:

pytest tests/ -q

See docs/SUBMISSION_CHECKLIST_v2.1.3.md for the end-to-end release procedure.

👥 AI Disclosure & Responsible Use

Transparency matters. This section documents exactly how AI tools were used in the ADCD project, in line with emerging norms for AI-assisted scientific software.

Authoring assistance

The source code, the accompanying paper, and this documentation were written with help from AI tools (Google DeepMind's Antigravity, and earlier OpenAI- and Claude-based coding tools). AI was used as a pair-programming and writing aid:

Code generation & refactoring — drafting modules, fixing lint errors, generating boilerplate, suggesting type hints.
Prose editing — improving clarity, grammar, and structure of the paper and docs.
Debugging — diagnosing stack traces, suggesting fixes for JAX/NumPy numerical issues.
Code review — catching edge cases, suggesting test coverage improvements.

AI as an optional discovery backend (not a co-author)

ADCD supports an LLM-based proposer (src/adcd/llm_proposer.py), which can query a language model (Gemini or OpenAI) to suggest candidate correction templates. This is an opt-in research feature, not the default:

The default and headline benchmarks use the mock proposer (deterministic template library), not the LLM proposer.
When the LLM proposer is enabled, its suggestions are still passed through the full physics-gate pipeline (dimensional homogeneity, ARC, BIC). The AI cannot bypass physics validation — every candidate must satisfy the same constraints as any other template.
AI never runs experiments or computes final benchmark numbers. All quantitative results were generated, verified, and curated by the author.
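
The gate-then-fit ordering described above can be sketched as a simple filter cascade. Everything in this block is hypothetical: the `Candidate` dataclass, the gate names, the thresholds, and the hard-coded flags, which a real pipeline would compute from the expression itself. The point is only that a proposal, whatever its origin, is dropped at the first gate it fails and never reaches the fitting stage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Candidate:
    expr: str              # human-readable template
    n_nodes: int           # size of the expression tree
    dims_ok: bool          # stand-in for a dimensional homogeneity check
    limit_at_zero: float   # value of the correction in the classical limit

Gate = Tuple[str, Callable[[Candidate], bool]]

# Cheap checks first; each gate returns True when the candidate passes.
GATES: List[Gate] = [
    ("complexity", lambda c: c.n_nodes <= 10),
    ("dimensional", lambda c: c.dims_ok),
    ("asymptotic", lambda c: abs(c.limit_at_zero) < 1e-12),
]

def screen(candidates: List[Candidate]):
    """Return (survivors, rejected) where rejected maps expr -> failed gate."""
    survivors: List[Candidate] = []
    rejected: Dict[str, str] = {}
    for cand in candidates:
        for name, gate in GATES:
            if not gate(cand):
                rejected[cand.expr] = name
                break
        else:
            survivors.append(cand)  # passed every gate, eligible for fitting
    return survivors, rejected

proposals = [
    Candidate("theta0 * (v/c)**2", 7, True, 0.0),
    Candidate("exp(v)", 3, False, 1.0),
    Candidate("theta0 * cos(v/c)", 6, True, 1.0),
    Candidate("deeply nested polynomial", 25, True, 0.0),
]

survivors, rejected = screen(proposals)
print([c.expr for c in survivors])  # ['theta0 * (v/c)**2']
print(rejected)
```

Placing the cheapest gate first means most unphysical proposals are discarded without any numerical evaluation.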

What is not AI-generated

To be explicit, the following are the sole intellectual contribution of the author (Muhammad Afif Erdita):

The scientific idea — anomaly-driven correction discovery as opposed to blank-slate symbolic regression.
The physics-gate pipeline design (cascaded gates, ARC, BIC reranking, Occam's razor).
All experimental design decisions: scenario selection, noise levels, evaluation protocols, the PySR fair-profile comparison, the SPARC MOND validation protocol.
Selection and interpretation of real-world benchmarks (Mercury perihelion, Lamb Shift, Muon g-2, Blackbody).
All claims, conclusions, and limitations discussed in the paper.

Reproducibility safeguard

Because AI tools can fabricate plausible-looking numbers, every benchmark figure reported in this README and in the paper is regenerable from frozen scripts (run_correction_discovery.py, run_reproducibility.py, scripts/verify_paper_claims.py). The verify_paper_claims.py guard fails loudly if any headline number drifts. Nothing in the headline results was hand-typed from AI output.
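
A guard of this kind can be quite small. The sketch below is not the project's verify_paper_claims.py; the claim keys and the tolerance are placeholders, and the values mirror the 5% noise column of the comparison table.

```python
import math

def check_claims(claimed, recomputed, abs_tol=0.05):
    """Return a list of drift messages; an empty list means every claim holds."""
    problems = []
    for key, expected in claimed.items():
        if key not in recomputed:
            problems.append(f"{key}: missing from recomputed results")
        elif not math.isclose(recomputed[key], expected, rel_tol=0.0, abs_tol=abs_tol):
            problems.append(f"{key}: claimed {expected}, recomputed {recomputed[key]:.3f}")
    return problems

# Numbers as printed in the documentation (rounded to one decimal).
claimed = {"adcd_5pct_success": 88.9, "pysr_fair_5pct_success": 11.1}

# Numbers regenerated from raw counts (8/9 and 1/9 solved scenarios).
recomputed = {
    "adcd_5pct_success": 100 * 8 / 9,
    "pysr_fair_5pct_success": 100 * 1 / 9,
}

problems = check_claims(claimed, recomputed)
if problems:
    # Fail loudly so a drifted number cannot be released unnoticed.
    raise SystemExit("Headline numbers drifted:\n" + "\n".join(problems))
print("all claims verified")
```

The absolute tolerance only needs to cover the rounding used in the published tables; anything larger would let real drift slip through.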

If you spot any inaccuracy or have questions about AI use in this project, please open an issue.

📄 License

This project is licensed under the MIT License.
