
apiprdt/PhysicsPaper


ADCD: Anomaly-Driven Correction Discovery

Physics-constrained symbolic regression that discovers correction terms, not equations from scratch: the same logic that led from Newton to Einstein and from Rayleigh–Jeans to Planck.

(Figure: animation of the ADCD discovery pipeline; an MP4 version is also provided.)


Science rarely starts from a blank slate; it corrects. ADCD automates the step from anomaly to theory correction: given a classical law and data that disagree with it, it searches for the minimal, physically valid correction term $\Delta$, passing every candidate through dimensional, asymptotic, and complexity gates before a single parameter is fit.
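For the additive case (correction_mode="additive" in the custom-dataset example further down), the search target can be written compactly as below. The noise term $\varepsilon$, the limit point $x^{*}$, and the reading of the asymptotic gate as "$\Delta$ vanishes in the classical limit" are our notation and interpretation, not formulas quoted from the ADCD paper.

```latex
y_{\text{obs}}(x) = y_{\text{classical}}(x) + \Delta(x;\boldsymbol{\theta}) + \varepsilon,
\qquad
\lim_{x \to x^{*}} \Delta(x;\boldsymbol{\theta}) = 0
```

In the custom-dataset example, $y_{\text{classical}} = 2x$ and the true correction is $\Delta = 0.5\,x^{2}$, which vanishes as $x \to 0$ (limit_variable="x", limit_direction="0").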


⚡ Key Features

  • Correction-First Paradigm: Starts from a known classical law rather than a blank slate, focusing the search on the discrepancy $\Delta$ between theory and experiment.
  • Cascaded Physics Gates: AST complexity, dimensional homogeneity, transcendental guardrails, and asymptotic consistency (ARC) gates screen out unphysical candidates before any parameter fitting runs.
  • JAX-Traced L-BFGS-B Optimizer: Differentiable, parameter-scaled fitting with multi-restart, log-uniform initialization.
  • BIC Model Selection: Ranks models with the Bayesian Information Criterion (BIC), favoring simpler physical theories over overly complex numerical fits.
  • Residual Feature Intelligence: Extracts mathematical features (monotonicity, curvature, oscillation, decay) from residuals to bias proposal templates.
  • Multivariable Discovery (Phase 2): Buckingham Π group decomposition, per-variable Sequential ARC, and variance-factorization separability detection for multi-input physical laws.
  • Real-World Validated: Identifies the correct structural class on Mercury's perihelion (GR), the Lamb shift (QED), muon g-2 (Schwinger), and blackbody radiation (Planck).
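To make the residual-feature idea concrete, here is a small NumPy sketch that computes crude shape descriptors of a residual curve. It is a hypothetical helper, not ADCD's implementation: the function name, the finite-difference definitions, and the thresholds (first and last fifth of the range, a 0.5 decay ratio) are all assumptions chosen for illustration.

```python
import numpy as np


def residual_features(x: np.ndarray, r: np.ndarray) -> dict:
    """Crude shape descriptors of a residual r(x) (hypothetical helper, not ADCD's)."""
    order = np.argsort(x)
    x, r = x[order], r[order]
    dr = np.diff(r)
    # Monotonicity: fraction of steps sharing the dominant sign (1.0 = strictly monotone).
    monotonicity = max(np.mean(dr > 0), np.mean(dr < 0))
    # Curvature: sign of the mean second derivative (+1 convex, -1 concave).
    d2 = np.gradient(np.gradient(r, x), x)
    curvature = float(np.sign(np.mean(d2)))
    # Oscillation: number of sign changes of the mean-removed residual.
    centered = r - np.mean(r)
    oscillation = int(np.sum(np.diff(np.sign(centered)) != 0))
    # Decay: is the residual magnitude much smaller at the far end of the range?
    n = max(len(r) // 5, 1)
    decay = bool(np.mean(np.abs(r[-n:])) < 0.5 * np.mean(np.abs(r[:n])))
    return {"monotonicity": float(monotonicity), "curvature": curvature,
            "oscillation": oscillation, "decay": decay}


x = np.linspace(1.0, 5.0, 100)
print(residual_features(x, 0.5 * x**2))   # monotone, convex, non-decaying
print(residual_features(x, np.exp(-x)))   # monotone, decaying
```

A proposer could then map such a feature dictionary to template families (for example, decaying residuals toward exponential or Yukawa-like forms); how ADCD weights these features is not shown here.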

📦 Installation

Install the stable package from PyPI:

```bash
pip install adcd
```

Or install from source:

```bash
git clone https://github.com/apiprdt/PhysicsPaper.git
cd PhysicsPaper
pip install -e ".[dev]"
```

Verify your installation:

```bash
pytest tests/
```

💻 Quick Start

1. High-Level Scientific API

Running ADCD on a predefined physics benchmark takes only a few lines:

```python
import adcd

# 1. Load a predefined benchmark scenario (e.g., Relativistic Kinetic Energy)
scenarios = adcd.get_all_scenarios()
scenario = scenarios[0]

# 2. Run discovery in a single call
result = adcd.discover_correction(scenario, max_iterations=5, proposer="mock")

# 3. View the best fit
print(f"Discovered correction: {result.best_expr}")       # θ₀ * (v/c)**2
print(f"LaTeX representation:  {result.export_latex()}")   # \theta_0 \left(\frac{v}{c}\right)^2
print(f"Parameters:            {result.best_theta}")
print(f"BIC Score:             {result.best_bic:.2f}")

# 4. Plot residuals
result.plot_residuals()
```

2. Custom Experimental Datasets

For custom datasets, use the adcd.fit function:

```python
import numpy as np
import adcd

# Your custom data
x = np.linspace(1.0, 5.0, 100)
X = {"x": x}
y_classical = 2.0 * x
y_observed  = 2.0 * x + 0.5 * x**2   # True correction is 0.5 * x^2

# Run ADCD
result = adcd.fit(
    X=X,
    y_obs=y_observed,
    y_classical=y_classical,
    limit_variable="x",
    limit_direction="0",
    correction_mode="additive"
)

result.summary()
```
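For intuition about the BIC ranking step, the sketch below scores a few one-parameter templates against the residual of this example. It is not ADCD's pipeline: because every template here is linear in θ₀, closed-form least squares stands in for the JAX L-BFGS-B fit, and the template list, the small added noise (so the residual sum of squares is non-zero), and the Gaussian-residual form BIC = n·ln(RSS/n) + k·ln(n) are assumptions.

```python
import numpy as np

# Same setup as the example above, plus a little noise so that RSS > 0.
rng = np.random.default_rng(42)
x = np.linspace(1.0, 5.0, 100)
y_classical = 2.0 * x
y_observed = 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.01, x.size)
residual = y_observed - y_classical  # the anomaly Delta(x) to be explained

# Hypothetical candidate templates, each of the form theta0 * basis(x).
templates = {
    "theta0 * x": x,
    "theta0 * x**2": x**2,
    "theta0 * x**3": x**3,
    "theta0 * exp(-x)": np.exp(-x),
}


def bic(rss: float, n: int, k: int) -> float:
    """Gaussian-residual BIC: lower is better; k * ln(n) penalizes extra parameters."""
    return n * np.log(rss / n) + k * np.log(n)


scores = {}
for name, basis in templates.items():
    theta0 = (basis @ residual) / (basis @ basis)  # one-parameter least squares
    rss = float(np.sum((residual - theta0 * basis) ** 2))
    scores[name] = (bic(rss, x.size, k=1), theta0)

best = min(scores, key=lambda name: scores[name][0])
print(best, round(float(scores[best][1]), 3))
```

With all candidates sharing k = 1, the ranking reduces to comparing RSS; the k·ln(n) term only starts to matter when templates with more parameters compete.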

📊 Benchmark Results

1. Standard Benchmark (seed=42, Mock Proposer)

| Scenario | Tier | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|---|
| Relativistic KE | Textbook | ✓ | ✓ | ✓ | ✓ |
| Yukawa Gravity | Textbook | ✓ | ✓ | ✓ | ✓ |
| Anharmonic Spring | Textbook | ✓ | ✓ | ✓ | ✓ |
| Screened Coulomb | Cross-Domain | ✓ | ✓ | ✗ | ✗ |
| Net Radiation | Cross-Domain | ✓ | ✓ | ✓ | ✓ |
| Nonlinear Drag | Cross-Domain | ✓ | ✓ | ✓ | ✓ |
| Mystery-A (tanh²) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Mystery-B (sinc) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Mystery-C (log-quotient) | Synthetic | ✓ | ✓ | ✓ | ✓ |
| Overall | | 100% | 100% | 88.9% | 88.9% |

2. PySR Comparison (fair profile: 100 iterations, 60s timeout)

| Method | 0% Noise | 1% Noise | 5% Noise | 10% Noise |
|---|---|---|---|---|
| ADCD (ours, seed=42) | 9/9 (100%) | 9/9 (100%) | 8/9 (88.9%) | 8/9 (88.9%) |
| PySR (fair profile) | 4/9 (44.4%) | 5/9 (55.6%) | 1/9 (11.1%) | 5/9 (55.6%) |

At 5% noise, ADCD outperforms PySR by 77.8 percentage points (88.9% vs. 11.1%).

3. Phase 2: Multivariable Benchmark (v2.2.1)

| Scenario | Variables | ADCD Solved | Notes |
|---|---|---|---|
| Yukawa Mass-Ratio | m, M, r, r₀ | ✓ | Π groups: m/M, r/r₀ |
| Turbulent Drag | v, ρ, A, C_D | ✓ | Separable multiplicative |
| Coupled Oscillator | k, m, Ω, ω₀ | ✗ | Mixed functional form |
| Van der Waals MV | a, b, P, V, T | ✗ | Requires 3rd Π group |
| Overall | | 2/4 (50%) | Baseline: 0/4 |
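The Π-group counts in this table follow the Buckingham Π theorem: n variables whose dimension matrix has rank r admit n − r independent dimensionless groups. A minimal NumPy check for the Yukawa Mass-Ratio row, assuming base dimensions M (mass) and L (length); this is an illustration, not code from buckingham_pi.py:

```python
import numpy as np

# Columns are variables; rows are exponents of the base dimensions (M, L).
variables = ["m", "M", "r", "r0"]
dim_matrix = np.array([
    [1, 1, 0, 0],  # mass exponent of each variable
    [0, 0, 1, 1],  # length exponent of each variable
])

# Buckingham Pi: number of independent dimensionless groups = n - rank.
n_groups = len(variables) - np.linalg.matrix_rank(dim_matrix)


def is_dimensionless(exponents: dict) -> bool:
    """True if prod(var ** exponent) carries no net dimension."""
    vec = np.array([exponents.get(v, 0) for v in variables])
    return bool(np.all(dim_matrix @ vec == 0))


print(n_groups)                               # 2
print(is_dimensionless({"m": 1, "M": -1}))    # True: m/M
print(is_dimensionless({"r": 1, "r0": -1}))   # True: r/r0
print(is_dimensionless({"m": 1, "r": -1}))    # False: m/r has dimensions M/L
```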

4. Real-World Physical Constants

Validation on historical anomalies using physical constants from JPL DE440, NIST, and CODATA:

| Physical Scenario | Discovered Correction | Converged | Class Match | Structural Class | NMSE |
|---|---|---|---|---|---|
| Mercury Perihelion (GR) | θ₀·vc² | - | ✓ | polynomial | 1.11e-05 |
| Hydrogen Lamb Shift (QED) | θ₀(n/θ₁)^(-θ₂) | ✓ | ✓ | power_law | 1.82e-18 |
| Muon g-2 (Schwinger) | θ₀(α/π)^θ₁ | ✓ | ✓ | polynomial | 7.94e-07 |
| Blackbody (Planck) | -1 + e^(-f/θ₁) | - | ✓ | exponential | 2.59e-02 |
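NMSE is commonly defined as the mean squared error divided by the variance of the target, so 0 is a perfect fit and 1 is no better than predicting the mean. A sketch under that assumption (metrics.py may normalize differently, for example by the mean square of the target):

```python
import numpy as np


def nmse(y_true, y_pred) -> float:
    """Mean squared error normalized by the variance of y_true (0 = perfect fit)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2) / np.var(y_true))


y = np.array([1.0, 2.0, 3.0, 4.0])
print(nmse(y, y))                      # 0.0
print(nmse(y, np.full(4, y.mean())))   # 1.0: predicting the mean scores 1
```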

πŸ“ Project Structure

```
PhysicsPaper/
├── src/adcd/                       # Installable package
│   ├── __init__.py                 # Public API (fit, discover_correction)
│   ├── anomaly_scenarios.py        # 9 standard + 3 blind + 4 multivariable scenarios
│   ├── arc_scorer.py               # Asymptotic consistency gate (ARC)
│   ├── buckingham_pi.py            # [Phase 2] Buckingham Π group engine
│   ├── coarse_evaluator.py         # Coarse numerical pre-filter
│   ├── correction_orchestrator.py  # Main multi-iteration discovery loop
│   ├── dimensional_checker.py      # Dimensional homogeneity + transcendental gate
│   ├── jax_optimizer.py            # JAX L-BFGS-B optimizer
│   ├── llm_proposer.py             # Mock + Gemini + OpenAI proposers
│   ├── metrics.py                  # NMSE, BIC, structural classification
│   ├── multivar_orchestrator.py    # [Phase 2] Multivariable correction pipeline
│   ├── pipeline.py                 # Stage 1 filter cascade
│   ├── real_data_loader.py         # Real-world data loading (JPL, NIST, CODATA)
│   ├── residual_factorizer_v2.py   # [Phase 2] Variance-decomposition separability
│   ├── result.py                   # CorrectionResult object
│   └── sequential_arc.py           # [Phase 2] Per-variable Sequential ARC checker
├── tests/                          # 116 unit + integration tests
├── paper/                          # LaTeX source (main.tex) + figures
├── run_correction_discovery.py     # Benchmark runner
└── README.md                       # This file
```
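The idea behind the Stage 1 cascade (cheap structural checks first, parameter fitting last) can be sketched with the standard library alone. Everything below is hypothetical: the node-count limit, the dimension bookkeeping as exponent tuples over (M, L, T), the treatment of fitted parameters as dimensionless, and the single-point numeric limit test are illustrative stand-ins, not the checks implemented in pipeline.py, dimensional_checker.py, or arc_scorer.py.

```python
import ast
import math

Dim = tuple  # exponents of the base dimensions (M, L, T)
DIMENSIONLESS: Dim = (0, 0, 0)
FUNCS = {"exp": math.exp, "log": math.log, "sin": math.sin,
         "cos": math.cos, "tanh": math.tanh}


def complexity_ok(expr: str, max_nodes: int = 25) -> bool:
    """Gate 1: reject candidates whose syntax tree is too large."""
    return sum(1 for _ in ast.walk(ast.parse(expr, mode="eval"))) <= max_nodes


def infer_dim(node: ast.AST, dims: dict) -> Dim:
    """Gate 2 helper: propagate dimension exponents, raising ValueError on violations."""
    if isinstance(node, ast.Expression):
        return infer_dim(node.body, dims)
    if isinstance(node, ast.Constant):
        return DIMENSIONLESS
    if isinstance(node, ast.Name):
        # Unknown names (fitted parameters such as theta0) are treated as dimensionless.
        return dims.get(node.id, DIMENSIONLESS)
    if isinstance(node, ast.UnaryOp):
        return infer_dim(node.operand, dims)
    if isinstance(node, ast.BinOp):
        left, right = infer_dim(node.left, dims), infer_dim(node.right, dims)
        if isinstance(node.op, (ast.Add, ast.Sub)):
            if left != right:
                raise ValueError("sum of quantities with different dimensions")
            return left
        if isinstance(node.op, ast.Mult):
            return tuple(a + b for a, b in zip(left, right))
        if isinstance(node.op, ast.Div):
            return tuple(a - b for a, b in zip(left, right))
        if isinstance(node.op, ast.Pow):
            if right != DIMENSIONLESS:
                raise ValueError("exponent must be dimensionless")
            if left == DIMENSIONLESS:
                return DIMENSIONLESS
            power = ast.literal_eval(node.right)  # ValueError unless a numeric literal
            return tuple(a * power for a in left)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in FUNCS:
        # Transcendental guardrail: exp(velocity) or log(mass) is meaningless.
        if any(infer_dim(arg, dims) != DIMENSIONLESS for arg in node.args):
            raise ValueError(f"dimensional argument to {node.func.id}()")
        return DIMENSIONLESS
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def vanishes_in_limit(expr: str, var: str, fixed: dict,
                      eps: float = 1e-8, tol: float = 1e-3) -> bool:
    """Gate 3: crude numeric proxy for 'Delta -> 0 as var -> 0' (the classical limit)."""
    env = {**FUNCS, **fixed, var: eps}
    # Restricted eval: acceptable only for trusted, template-generated strings.
    return abs(eval(expr, {"__builtins__": {}}, env)) < tol


def run_gates(expr: str, dims: dict, target_dim: Dim, limit_var: str, fixed: dict) -> str:
    """Apply the gates cheapest-first and report the first one that fails."""
    if not complexity_ok(expr):
        return "complexity"
    try:
        if infer_dim(ast.parse(expr, mode="eval"), dims) != target_dim:
            return "dimensions"
    except ValueError:
        return "dimensions"
    if not vanishes_in_limit(expr, limit_var, fixed):
        return "asymptotic"
    return "ok"


# Relative correction to kinetic energy: must be dimensionless and vanish as v -> 0.
dims = {"v": (0, 1, -1), "c": (0, 1, -1)}
fixed = {"theta0": 1.0, "c": 1.0}
for cand in ["theta0 * (v/c)**2", "theta0 * exp(v)", "theta0 * v**2", "theta0 * cos(v/c)"]:
    print(f"{cand:22s} -> {run_gates(cand, dims, DIMENSIONLESS, 'v', fixed)}")
```

Under these assumptions the four candidates report ok, dimensions, dimensions, and asymptotic. Only candidates that return "ok" would move on to parameter fitting, which is where most of the compute goes.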

📖 Citing This Work

If you use ADCD in your research, please cite:

```bibtex
@software{erdita2026adcd,
  author    = {Erdita, Muhammad Afif},
  title     = {{Anomaly-Driven Correction Discovery (ADCD): Physics-Constrained
                Symbolic Regression for Evolutionary Scientific Discovery}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {2.2.1},
  doi       = {10.5281/zenodo.20534940},
  url       = {https://doi.org/10.5281/zenodo.20534940}
}
```

🔬 Reproducibility

Every quantitative claim in this project is reproducible from committed scripts. No number is hand-typed.

```bash
# Regenerate the 9-scenario benchmark (seed=42)
python run_correction_discovery.py

# Multi-seed study (5 seeds × 9 scenarios × 4 noise levels)
python run_reproducibility.py

# Guard: fails loudly if any headline number drifts
python scripts/verify_paper_claims.py

# SPARC MOND robustness study
python -m adcd.experiments.sparc_robustness
```

The full test suite (116 tests) must pass before any release:

```bash
pytest tests/ -q
```

See docs/SUBMISSION_CHECKLIST_v2.1.3.md for the end-to-end release procedure.
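A drift guard of this kind can be pictured as a comparison of freshly recomputed numbers against the values quoted in this README. The sketch below is hypothetical: the function name, the claim dictionary, and the tolerance are assumptions, and scripts/verify_paper_claims.py may work quite differently.

```python
import math

# Headline claims as quoted in the benchmark tables (percent solved, seed=42).
CLAIMS = {"adcd_5pct_noise": 88.9, "pysr_5pct_noise": 11.1}


def verify_claims(computed: dict, claims: dict = CLAIMS, tol: float = 0.05) -> None:
    """Raise AssertionError listing every claim whose recomputed value drifted."""
    drifted = [
        f"{key}: claimed {claims[key]}, got {computed.get(key)}"
        for key in claims
        if key not in computed or not math.isclose(computed[key], claims[key], abs_tol=tol)
    ]
    if drifted:
        raise AssertionError("headline numbers drifted:\n" + "\n".join(drifted))


# Recompute the rates from raw counts (8/9 and 1/9 scenarios solved) and check them.
verify_claims({"adcd_5pct_noise": round(100 * 8 / 9, 1),
               "pysr_5pct_noise": round(100 * 1 / 9, 1)})
```

Deriving the percentages from raw counts, rather than copying them, is what makes such a guard meaningful: a typo in the README or a regression in the benchmark both surface as a failure.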


👥 AI Disclosure & Responsible Use

Transparency matters. This section documents exactly how AI tools were used in the ADCD project, in line with emerging norms for AI-assisted scientific software.

Authoring assistance

The source code, the accompanying paper, and this documentation were written with assistance from AI assistants (Google DeepMind's Antigravity, and earlier OpenAI/Claude-based coding tools). AI was used as a pair-programming and writing aid:

  • Code generation & refactoring: drafting modules, fixing lint errors, generating boilerplate, suggesting type hints.
  • Prose editing: improving the clarity, grammar, and structure of the paper and docs.
  • Debugging: diagnosing stack traces, suggesting fixes for JAX/NumPy numerical issues.
  • Code review: catching edge cases, suggesting test coverage improvements.

AI as an optional discovery backend (not a co-author)

ADCD supports an LLM-based proposer (src/adcd/llm_proposer.py), which can query a language model (Gemini or OpenAI) to suggest candidate correction templates. This is an opt-in research feature, not the default:

  • The default and headline benchmarks use the mock proposer (deterministic template library), not the LLM proposer.
  • When the LLM proposer is enabled, its suggestions still pass through the full physics-gate pipeline (dimensional homogeneity, ARC, BIC). The AI cannot bypass physics validation: every candidate must satisfy the same constraints as any other template.
  • AI never runs experiments or computes final benchmark numbers. All quantitative results were generated, verified, and curated by the author.

What is not AI-generated

To be explicit, the following are the sole intellectual contribution of the author (Muhammad Afif Erdita):

  • The scientific idea: anomaly-driven correction discovery, as opposed to blank-slate symbolic regression.
  • The physics-gate pipeline design (cascaded gates, ARC, BIC reranking, Occam's razor).
  • All experimental design decisions: scenario selection, noise levels, evaluation protocols, the PySR fair-profile comparison, the SPARC MOND validation protocol.
  • Selection and interpretation of real-world benchmarks (Mercury perihelion, Lamb Shift, Muon g-2, Blackbody).
  • All claims, conclusions, and limitations discussed in the paper.

Reproducibility safeguard

Because AI tools can fabricate plausible-looking numbers, every benchmark figure reported in this README and in the paper is regenerable from frozen scripts (run_correction_discovery.py, run_reproducibility.py, scripts/verify_paper_claims.py). The verify_paper_claims.py guard fails loudly if any headline number drifts. Nothing in the headline results was hand-typed from AI output.

If you spot any inaccuracy or have questions about AI use in this project, please open an issue.


📄 License

This project is licensed under the MIT License.