cselab/wavegp-public

wavegp

Intro

This library implements Cartesian Genetic Programming (CGP), an evolutionary algorithm for symbolic regression and the discovery of computational graphs. Solutions are encoded as fixed-width byte arrays that map to directed acyclic graphs (DAGs). The library ships with:

  • a small Python core (wavegp.py) — rand/mutate/build/execute, plus graphviz and binary serialization;
  • a JAX-vectorized engine (wavegp_jax.py) — batched forward, reverse-mode gradients (backward) and forward-mode Jacobians (jacobian) for use inside Gauss-Newton / least-squares fitness loops;
  • CUDA engines under engines/ for large-population searches.

A genome is (g.i + g.n + g.o) rows of 1 + max(g.arity) bytes: the first byte of each row is the op type, and the remaining bytes are pointers to earlier slots. Floating-point coefficients live in a parallel params array, g.p floats per node.
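As a concrete check of that layout, the genome width follows directly from the schema. A small sketch using the op set from the Schema section below (pure arithmetic, independent of the library):

```python
# Sketch: compute genome dimensions from a schema namespace like the one
# in the Schema section. Row count and width follow the layout above.
class g: pass
g.names = "Plus", "Minus", "Scale", "PredictL", "PredictR", "Update"
g.arity = 2, 2, 1, 2, 2, 2
g.i, g.n, g.o = 2, 6, 2   # inputs, nodes, outputs
g.p = 1                   # floats per node

rows = g.i + g.n + g.o    # one row per input, node, and output slot
cols = 1 + max(g.arity)   # op byte + one pointer byte per argument
print(rows * cols)        # total genome size in bytes -> 30
print(g.n * g.p)          # parallel params array length -> 6
```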

Repository layout

.
├── wavegp.py                      # core library (rand, mutate, execute, build, graphviz)
├── wavegp_jax.py                  # JAX-vectorized forward / backward / jacobian
├── wavegp_lm.py                   # Levenberg-Marquardt wrapper for parameter fitting
├── wavegp_sklearn.py              # sklearn-style CGPRegressor (multi-seed, distributed)
├── demo/                          # numbered demos (run from root)
│   ├── 01_symbolic.py             # symbolic regression (gplearn-style API)
│   ├── 02_haar.py                 # Haar wavelet rediscovery via CGP + GD
│   ├── 03_denoise.py              # wavelet denoising
│   ├── 04_compress.py             # wavelet compression
│   ├── 05_multitap.py             # multi-tap filter
│   ├── 06_multistage.py           # multi-stage lifting
│   ├── 07_poisson.py              # Poisson equation
│   ├── 08_arakawa.py              # Arakawa Jacobian operator
│   ├── 09_kalman.py               # Kalman filter
│   ├── 10_wavegp_c.py             # C extension benchmark
│   ├── 11_cdf53.py                # CDF 5/3 wavelet via GN + (1+1)-ES
│   ├── 12_fd1d.py                 # 1D finite-difference stencil
│   ├── 13_cdf53_lm.py             # CDF 5/3 with LM solver
│   ├── 14_dcgp_p* .. 33_feynman.py  # dCGP benchmark suite + Feynman AI
│   └── {haar,symbolic,sextic_sklearn,pagie1_sklearn}.{py,ipynb}  # Colab demos
├── tests/                         # pytest sanity + golden tests; CUDA cross-validation
├── tools/                         # build/read/write/graphviz; LM solver port
├── engines/                       # JAX, pure-Python, CUDA backends + ops vocabulary
├── 00_amr_wavelet_discovery/      # GA + GN bilevel discovery for AMR error indicators
│   ├── *.py                       #   Python pipeline (lifting wavelet, GN inner solve)
│   └── c/                         #   standalone C reimplementation (1D + 2D + blast)
├── 01_2d_amr/                     # 2D AMR follow-on
├── bench/                         # benchmarks (timings, throughput)
├── cowork/                        # SSH-based distributed worker pool
├── examples/                      # standalone usage examples
├── matlab/                        # MATLAB ports / reference
├── minpack/                       # vendored Fortran MINPACK (LM reference)
├── docs/  img/                    # documentation + figures
└── archive/                       # legacy exploration + scratch

Schema

Every script declares its op set with a small namespace:

class g: pass
g.names = "Plus", "Minus", "Scale", "PredictL", "PredictR", "Update"
g.arity = 2, 2, 1, 2, 2, 2
g.i, g.n, g.o = 2, 6, 2          # inputs, nodes, outputs
g.p = 1             # floats per node (Scale's coefficient)

Building a fixed genome

import wavegp

gen = wavegp.build(
    g,
    ["i0", "Backward_Y", "Backward_X", "Minus", "o0"],
    [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)],
)

build takes the schema, the vertex list, and the edge list, and returns the byte genome. Inspect it with wavegp.as_string or wavegp.as_graphviz.

Binary I/O

tools/build.py writes a binary file; tools/read.py dumps it back; tools/graphviz.py produces a .dot graph. Run from the project root with PYTHONPATH pointing at it:

PYTHONPATH=. python tools/build.py    a.raw
PYTHONPATH=. python tools/read.py     a.raw
PYTHONPATH=. python tools/graphviz.py a.raw a.gv
gvpack -u a.gv | dot -Tsvg -o img/a.svg
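Because the genome is a flat fixed-width byte array, round-tripping it through a file needs no framing beyond the schema. A sketch of the idea (the bare on-disk layout here is illustrative, not necessarily tools/build.py's actual format):

```python
import os, tempfile

# Sketch: round-trip a fixed-width byte genome through a file. Each row
# is one op byte plus max-arity pointer bytes, per the genome layout.
genome = bytes([0, 0, 1,    # row: op 0, pointers (0, 1)
                1, 2, 0])   # row: op 1, pointers (2, 0)

path = os.path.join(tempfile.mkdtemp(), "a.raw")
with open(path, "wb") as f:
    f.write(genome)

with open(path, "rb") as f:
    data = f.read()
rows = [data[i:i + 3] for i in range(0, len(data), 3)]  # 3 bytes per row
print(len(rows))   # -> 2
```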

Gauss-Newton + ES loop (JAX)

demo/11_cdf53.py shows the recommended pattern. Per generation:

  1. wavegp.mutate produces a child population.
  2. wavegp_jax.precompute repacks the byte genomes for the kernel.
  3. A single full Gauss-Newton step params -= lstsq(J, residual) fits each child's continuous coefficients (exact in one step for linear-in-parameter ops).
  4. Children replace parents only if their MSE strictly improves.
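Step 3 is exact in one step because, for linear-in-parameter ops, the Jacobian is constant and a single least-squares solve lands on the optimum. A self-contained toy illustration with the model y = p0*x + p1, solving the tiny normal equations by hand in place of the engine's lstsq call:

```python
# One Gauss-Newton step for a linear-in-parameter model y = p0*x + p1.
# Because the Jacobian is constant, a single solve of J^T J dp = J^T r
# reaches the least-squares optimum exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # generated by p0=2, p1=1
p = [0.0, 0.0]                     # arbitrary start

# Jacobian rows d r_i / d p = (x_i, 1); residual r_i = model(x_i) - y_i.
J = [(x, 1.0) for x in xs]
r = [p[0] * x + p[1] - y for x, y in zip(xs, ys)]

# Solve the 2x2 normal equations J^T J dp = J^T r by hand.
a = sum(jx * jx for jx, _ in J); b = sum(jx * jc for jx, jc in J)
d = sum(jc * jc for _, jc in J)
g0 = sum(jx * ri for (jx, _), ri in zip(J, r))
g1 = sum(jc * ri for (_, jc), ri in zip(J, r))
det = a * d - b * b
dp = ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)
p = [p[0] - dp[0], p[1] - dp[1]]   # params -= lstsq(J, residual)
print(p)                            # -> [2.0, 1.0]
```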

demo_wavegp_jax_grad.py verifies that the engine's hand-coded backward (reverse-mode) and jacobian (forward-mode) match jax.grad / jax.jacfwd exactly on a small batch.
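A lighter-weight version of the same kind of check, using central finite differences instead of jax.grad (toy function, purely illustrative):

```python
import math

# Gradient check in the spirit of demo_wavegp_jax_grad.py: an analytic
# derivative must agree with central finite differences to tight tolerance.
f = lambda x: math.sin(x) * x
df = lambda x: math.cos(x) * x + math.sin(x)   # hand-coded derivative

eps = 1e-6
for x in (0.3, 1.7, -2.5):
    fd = (f(x + eps) - f(x - eps)) / (2 * eps)  # numeric reference
    assert abs(fd - df(x)) < 1e-6, (x, fd, df(x))
print("gradient check passed")
```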

CUDA engines

engines/wavegp_engine.cu and engines/wavegp_engine_adf.cu are the production engines used by the phase work in 02_pde_wavelet/ and 03_vorticity_wavelet/. They run one CUDA block per genome with shared (sv, sd, sj) state buffers and diagonal Gauss-Newton inside each block.

Structure

Search algorithm

wavegp uses a search algorithm inspired by the (1+λ) evolutionary strategy; the demos use a mutation-only (1+1)-ES with short Gauss-Newton inner loops. The computational graph encoding follows Miller (2020).
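That outer loop can be sketched generically; the toy bit-flip objective below stands in for a real fitness, and wavegp.mutate would replace the hypothetical mutate helper:

```python
import random

# Skeleton of a (1+lambda) evolutionary strategy: mutate the parent into
# lam children, keep a child only on strict improvement of its fitness.
random.seed(0)

def fitness(genome):                 # toy objective: reward 1-bits
    return -sum(genome)              # minimise, like an MSE

def mutate(genome, rate=0.1):        # stand-in for wavegp.mutate
    return [b ^ (random.random() < rate) for b in genome]

parent = [0] * 32
best = fitness(parent)
for generation in range(200):
    for _ in range(4):               # lambda = 4 children per generation
        child = mutate(parent)
        f = fitness(child)
        if f < best:                 # strict improvement only
            parent, best = child, f
print(best)                          # best fitness found (lower is better)
```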

References
  • Miller, J. F. (2020). Cartesian genetic programming: its status and future. Genetic Programming and Evolvable Machines, 21, 129–168.

About

wavegp: Cartesian Genetic Programming for symbolic regression and wavelet discovery
