Physics-constrained symbolic regression that discovers correction terms, not equations from scratch. The same logic that led from Newton to Einstein, from Rayleigh–Jeans to Planck.
Science rarely discovers from a blank slate; it corrects. ADCD automates the step between anomaly and theory correction: given a classical law and data that disagrees with it, it searches for the minimal physically valid correction term $\Delta$, passing every candidate through dimensional, asymptotic, and complexity gates before a single parameter is ever fit.
- Correction-First Paradigm: Starts from a known classical law, not a blank slate, and focuses the search space on the discrepancy $\Delta$ between theory and experiment.
- Cascaded Physics Gates: AST complexity, dimensional homogeneity, transcendental guardrails, and asymptotic consistency (ARC) gates screen out unphysical candidates before any parameter fitting runs.
- JAX-Traced L-BFGS-B Optimizer: Differentiable, parameter-scaled fitting with multi-restart log-uniform initialization.
- BIC Model Selection: Ranks models with the Bayesian Information Criterion (BIC), favoring simpler physical theories over overly complex numerical fits.
- Residual Feature Intelligence: Extracts mathematical features (monotonicity, curvature, oscillation, decay) from residuals to bias proposal templates.
- Multivariable Discovery (Phase 2): Buckingham Π group decomposition, per-variable Sequential ARC, and variance-factorization separability detection for multi-input physical laws.
- Real-World Validated: Identifies the correct structural class on Mercury's perihelion (GR), Lamb Shift (QED), Muon g-2 (Schwinger), and Blackbody (Planck).
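As a rough illustration of the BIC ranking listed above, the sketch below scores two hypothetical candidates using the common Gaussian-residual form BIC = n ln(RSS/n) + k ln(n). The function name bic_gaussian and the toy residuals are assumptions made for this example; ADCD's own scoring lives in metrics.py and may follow a different convention.

```python
import math


def bic_gaussian(residuals, n_params):
    """BIC for a least-squares fit, assuming i.i.d. Gaussian residuals.

    BIC = n * ln(RSS / n) + k * ln(n); lower is better.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


# Toy comparison: two hypothetical candidates that fit equally well.
residuals = [0.01 * ((-1) ** i) for i in range(50)]
bic_simple = bic_gaussian(residuals, n_params=1)
bic_complex = bic_gaussian(residuals, n_params=4)

# With identical residuals, the extra parameters only add penalty,
# so the simpler candidate ranks first.
print(bic_simple < bic_complex)
```

Each additional parameter costs ln(n) in this form, so a more complex correction has to reduce the residual sum of squares enough to pay for itself.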
Install the stable package from PyPI:
pip install adcd

Or install from source:
git clone https://github.com/apiprdt/PhysicsPaper.git
cd PhysicsPaper
pip install -e ".[dev]"

Verify your installation:
pytest tests/

Running ADCD on a predefined physics benchmark takes only a few lines:
import adcd
# 1. Load a pre-defined benchmark scenario (e.g. Relativistic Kinetic Energy)
scenarios = adcd.get_all_scenarios()
scenario = scenarios[0]
# 2. Run discovery in a single line!
result = adcd.discover_correction(scenario, max_iterations=5, proposer="mock")
# 3. View the best fit
print(f"Discovered correction: {result.best_expr}")  # θ₀ * (v/c)**2
print(f"LaTeX representation: {result.export_latex()}") # \theta_0 \left(\frac{v}{c}\right)^2
print(f"Parameters: {result.best_theta}")
print(f"BIC Score: {result.best_bic:.2f}")
# 4. Plot residuals
result.plot_residuals()

For custom datasets, use the adcd.fit function:
import numpy as np
import adcd
# Your custom data
x = np.linspace(1.0, 5.0, 100)
X = {"x": x}
y_classical = 2.0 * x
y_observed = 2.0 * x + 0.5 * x**2 # True correction is 0.5 * x^2
# Run ADCD
result = adcd.fit(
    X=X,
    y_obs=y_observed,
    y_classical=y_classical,
    limit_variable="x",
    limit_direction="0",
    correction_mode="additive",
)
result.summary()

| Scenario | Tier | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|---|
| Relativistic KE | Textbook | β | β | β | β |
| Yukawa Gravity | Textbook | β | β | β | β |
| Anharmonic Spring | Textbook | β | β | β | β |
| Screened Coulomb | Cross-Domain | β | β | β | β |
| Net Radiation | Cross-Domain | β | β | β | β |
| Nonlinear Drag | Cross-Domain | β | β | β | β |
| Mystery-A (tanh²) | Synthetic | β | β | β | β |
| Mystery-B (sinc) | Synthetic | β | β | β | β |
| Mystery-C (log-quotient) | Synthetic | β | β | β | β |
| Overall | | 100% | 100% | 88.9% | 88.9% |
| Method | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|
| ADCD (ours, seed=42) | 9/9 (100%) | 9/9 (100%) | 8/9 (88.9%) | 8/9 (88.9%) |
| PySR fair | 4/9 (44.4%) | 5/9 (55.6%) | 1/9 (11.1%) | 5/9 (55.6%) |
At 5% noise, ADCD solves 8/9 scenarios versus 1/9 for PySR, a gap of 77.8 percentage points.
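The percentage-point gap follows directly from the solved counts in the comparison table. The short sketch below recomputes it for every noise level; the dictionary layout is only an illustration, and the counts are copied from the table.

```python
# Solved counts out of 9 scenarios, copied from the comparison table.
TOTAL = 9
solved = {
    "0%": {"ADCD": 9, "PySR": 4},
    "1%": {"ADCD": 9, "PySR": 5},
    "5%": {"ADCD": 8, "PySR": 1},
    "10%": {"ADCD": 8, "PySR": 5},
}

gaps = {}
for noise, counts in solved.items():
    # Difference of two success rates, expressed in percentage points.
    gap_pp = 100 * (counts["ADCD"] - counts["PySR"]) / TOTAL
    gaps[noise] = round(gap_pp, 1)
    print(f"{noise:>3} noise: gap = {gaps[noise]} pp")
```

The gap is largest at 5% noise (7 of 9 scenarios, or 77.8 points) and smaller at the other noise levels.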
| Scenario | Variables | ADCD Solved | Notes |
|---|---|---|---|
| Yukawa Mass-Ratio | m, M, r, r₀ | β | Π groups: m/M, r/r₀ |
| Turbulent Drag | v, ρ, A, C_D | β | Separable multiplicative |
| Coupled Oscillator | k, m, Ω, ω₀ | β | Mixed functional form |
| Van der Waals MV | a, b, P, V, T | β | Requires 3rd Π group |
| Overall | | 2/4 (50%) | Baseline: 0/4 |
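To show where Π groups such as m/M come from, here is a minimal sketch of Buckingham Π analysis for the Yukawa Mass-Ratio row (two masses and two lengths). It finds the null space of the dimension matrix with exact rational arithmetic. The helper names nullspace and format_group, and the variable name r0 for the second length, are assumptions for this example and are not taken from buckingham_pi.py.

```python
from fractions import Fraction


def nullspace(matrix):
    """Rational basis of the null space of `matrix` via Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        pivot_val = rows[r][c]
        rows[r] = [v / pivot_val for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # One basis vector per free (non-pivot) column.
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis


def format_group(names, exponents):
    """Render an exponent vector as a readable product/quotient string."""
    num = [n if e == 1 else f"{n}^{e}" for n, e in zip(names, exponents) if e > 0]
    den = [n if e == -1 else f"{n}^{-e}" for n, e in zip(names, exponents) if e < 0]
    return ("*".join(num) or "1") + ("/" + "*".join(den) if den else "")


# Columns: M, m, r0, r. Rows: exponents of mass, length, time.
names = ["M", "m", "r0", "r"]
dim_matrix = [
    [1, 1, 0, 0],  # mass
    [0, 0, 1, 1],  # length
    [0, 0, 0, 0],  # time
]
groups = [format_group(names, vec) for vec in nullspace(dim_matrix)]
print(groups)  # ['m/M', 'r/r0']
```

Each group is only defined up to an overall power (m/M and M/m carry the same information); the column order here was chosen so the output matches the Notes column.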
Validation on historical anomalies using physical constants from JPL DE440, NIST, and CODATA:
| Physical Scenario | Discovered Correction | Converged | Class Match | NMSE |
|---|---|---|---|---|
| Mercury Perihelion (GR) | θ₀·vc² | β | β polynomial | 1.11e-05 |
| Hydrogen Lamb Shift (QED) | θ₀(n/θ₁)^(-θ₂) | β | β power_law | 1.82e-18 |
| Muon g-2 (Schwinger) | θ₀(α/π)^θ₁ | β | β polynomial | 7.94e-07 |
| Blackbody (Planck) | -1 + e^(-f/θ₀) | β | β exponential | 2.59e-02 |
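For reading the NMSE column, the sketch below uses one common definition: the sum of squared errors divided by the total sum of squares of the observations. Whether metrics.py normalizes in exactly this way is an assumption; the function name nmse and the toy data are illustrative only.

```python
def nmse(y_true, y_pred):
    """Normalized mean squared error: SSE divided by the total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return sse / sst


y = [1.0, 2.0, 3.0, 4.0]
perfect = nmse(y, y)               # a perfect fit scores 0
mean_only = nmse(y, [2.5] * 4)     # predicting the mean scores 1
print(perfect, mean_only)
```

Under this definition, a value far below 1 means the fitted correction explains almost all of the variance in the data.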
PhysicsPaper/
├── src/adcd/                       # Installable package
│   ├── __init__.py                 # Public API (fit, discover_correction)
│   ├── anomaly_scenarios.py        # 9 standard + 3 blind + 4 multivariable scenarios
│   ├── arc_scorer.py               # Asymptotic consistency gate (ARC)
│   ├── buckingham_pi.py            # [Phase 2] Buckingham Π group engine
│   ├── coarse_evaluator.py         # Coarse numerical pre-filter
│   ├── correction_orchestrator.py  # Main multi-iteration discovery loop
│   ├── dimensional_checker.py      # Dimensional homogeneity + transcendental gate
│   ├── jax_optimizer.py            # JAX L-BFGS-B optimizer
│   ├── llm_proposer.py             # Mock + Gemini + OpenAI proposers
│   ├── metrics.py                  # NMSE, BIC, structural classification
│   ├── multivar_orchestrator.py    # [Phase 2] Multivariable correction pipeline
│   ├── pipeline.py                 # Stage 1 filter cascade
│   ├── real_data_loader.py         # Real-world data loading (JPL, NIST, CODATA)
│   ├── residual_factorizer_v2.py   # [Phase 2] Variance-decomposition separability
│   ├── result.py                   # CorrectionResult object
│   └── sequential_arc.py           # [Phase 2] Per-variable Sequential ARC checker
├── tests/                          # 116 unit + integration tests
├── paper/                          # LaTeX source (main.tex) + figures
├── run_correction_discovery.py     # Benchmark runner
└── README.md                       # This file
If you use ADCD in your research, please cite:
@software{erdita2026adcd,
  author    = {Erdita, Muhammad Afif},
  title     = {{Anomaly-Driven Correction Discovery (ADCD): Physics-Constrained
                Symbolic Regression for Evolutionary Scientific Discovery}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {2.2.1},
  doi       = {10.5281/zenodo.20534940},
  url       = {https://doi.org/10.5281/zenodo.20534940}
}

Every quantitative claim in this project is reproducible from committed scripts. No number is hand-typed.
# Regenerate the 9-scenario benchmark (seed=42)
python run_correction_discovery.py
# Multi-seed study (5 seeds × 9 scenarios × 4 noise levels)
python run_reproducibility.py
# Guard: fails loudly if any headline number drifts
python scripts/verify_paper_claims.py
# SPARC MOND robustness study
python -m adcd.experiments.sparc_robustness

The full test suite (116 tests) must pass before any release:
pytest tests/ -q

See docs/SUBMISSION_CHECKLIST_v2.1.3.md for the end-to-end release procedure.
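The idea of a guard that fails loudly when a headline number drifts can be sketched as follows. This is not the content of scripts/verify_paper_claims.py; the names EXPECTED and check_claims, the claim keys, and the tolerance are all hypothetical, and the two expected values are taken from the benchmark table in this README.

```python
import math

# Hypothetical headline values, taken from the benchmark table in this README.
EXPECTED = {
    "adcd_success_noise_0": 100.0,
    "adcd_success_noise_5": 88.9,
}


def check_claims(recomputed, expected=EXPECTED, abs_tol=0.05):
    """Raise AssertionError listing every claim that drifted beyond abs_tol."""
    drifted = []
    for name, value in expected.items():
        if name not in recomputed:
            drifted.append(f"{name}: missing from recomputed results")
        elif not math.isclose(recomputed[name], value, abs_tol=abs_tol):
            drifted.append(f"{name}: expected {value}, got {recomputed[name]}")
    if drifted:
        raise AssertionError("Headline numbers drifted:\n" + "\n".join(drifted))


# Recomputed from raw counts (9 of 9 solved at 0% noise, 8 of 9 at 5% noise).
fresh = {
    "adcd_success_noise_0": 100 * 9 / 9,
    "adcd_success_noise_5": 100 * 8 / 9,
}
check_claims(fresh)  # passes silently when nothing has drifted
```

The tolerance of 0.05 matches numbers reported to one decimal place; a tighter or looser value would be a project-specific choice.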
Transparency matters. This section documents exactly how AI tools were used in the ADCD project, in line with emerging norms for AI-assisted scientific software.
The source code, the accompanying paper, and this documentation were written with the help of AI assistants (Google DeepMind's Antigravity, and earlier OpenAI/Claude-based coding tools). AI served as a pair-programming and writing aid:
- Code generation & refactoring: drafting modules, fixing lint errors, generating boilerplate, suggesting type hints.
- Prose editing: improving clarity, grammar, and structure of the paper and docs.
- Debugging: diagnosing stack traces, suggesting fixes for JAX/NumPy numerical issues.
- Code review: catching edge cases, suggesting test coverage improvements.
ADCD supports an LLM-based proposer (src/adcd/llm_proposer.py), which can query a language model (Gemini or OpenAI) to suggest candidate correction templates. This is an opt-in research feature, not the default:
- The default and headline benchmarks use the mock proposer (a deterministic template library), not the LLM proposer.
- When the LLM proposer is enabled, its suggestions still pass through the full physics-gate pipeline (dimensional homogeneity, ARC, BIC). The AI cannot bypass physics validation: every candidate must satisfy the same constraints as any other template.
- AI never runs experiments or computes final benchmark numbers. All quantitative results were generated, verified, and curated by the author.
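The point that every candidate faces the same gates regardless of its source can be sketched as below. This is a toy cascade and not ADCD's pipeline: the two gates, their thresholds, the probe point x = 1e-6, and all function names are assumptions, the dimensional gate is omitted, and eval with an emptied builtins dictionary is used only to keep the example short.

```python
import ast
import math

SAFE_FUNCS = {"exp": math.exp, "sin": math.sin, "tanh": math.tanh, "log": math.log}


def complexity_gate(expr, max_nodes=25):
    """Reject expressions whose Python AST has more than max_nodes nodes."""
    tree = ast.parse(expr, mode="eval")
    return sum(1 for _ in ast.walk(tree)) <= max_nodes


def asymptotic_gate(expr, tol=1e-3):
    """Accept only corrections that are near zero at a small probe value of x."""
    code = compile(expr, "<candidate>", "eval")
    try:
        value = eval(code, {"__builtins__": {}}, {**SAFE_FUNCS, "x": 1e-6})
    except (ValueError, ZeroDivisionError, OverflowError):
        return False
    return abs(value) < tol


GATES = [("complexity", complexity_gate), ("asymptotic", asymptotic_gate)]


def run_gates(candidates):
    """Return (survivors, rejections); every candidate sees the same gates."""
    survivors, rejections = [], {}
    for expr in candidates:
        failed = next((name for name, gate in GATES if not gate(expr)), None)
        if failed is None:
            survivors.append(expr)
        else:
            rejections[expr] = failed
    return survivors, rejections


# Where a candidate came from (template library or LLM) plays no role below.
long_poly = "x + x**2 + x**3 + x**4 + x**5 + x**6"
mock_candidates = ["0.5 * x**2", "1.0 / x"]
llm_candidates = ["x * exp(-x)", "exp(x)", long_poly]
survivors, rejections = run_gates(mock_candidates + llm_candidates)
print(survivors)
print(rejections)
```

Because run_gates never looks at the origin of a candidate, an LLM suggestion such as exp(x) is rejected by the same asymptotic check that rejects the template 1.0 / x.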
To be explicit, the following are the sole intellectual contribution of the author (Muhammad Afif Erdita):
- The scientific idea: anomaly-driven correction discovery as opposed to blank-slate symbolic regression.
- The physics-gate pipeline design (cascaded gates, ARC, BIC reranking, Occam's razor).
- All experimental design decisions: scenario selection, noise levels, evaluation protocols, the PySR fair-profile comparison, the SPARC MOND validation protocol.
- Selection and interpretation of real-world benchmarks (Mercury perihelion, Lamb Shift, Muon g-2, Blackbody).
- All claims, conclusions, and limitations discussed in the paper.
Because AI tools can fabricate plausible-looking numbers, every benchmark figure reported in this README and in the paper is regenerable from frozen scripts (run_correction_discovery.py, run_reproducibility.py, scripts/verify_paper_claims.py). The verify_paper_claims.py guard fails loudly if any headline number drifts. Nothing in the headline results was hand-typed from AI output.
If you spot any inaccuracy or have questions about AI use in this project, please open an issue.
This project is licensed under the MIT License.
