loschmidt/CascadeMAP-Bayesian

Bayesian Optimisation of Experimental Pathways

A Python library for sequential, model-guided optimisation of multi-parameter experiments. The framework fits a Gaussian Process surrogate model to observed measurements and proposes the next most informative experimental conditions to evaluate — minimising the number of experiments needed to find the optimum.

Designed for use cases where experiments are expensive (e.g., laboratory assays, flow chemistry), the library supports both single-point and batch acquisition, simplex-constrained parameters, and TSP-based sorting of measurement sequences to minimise instrument reconfiguration.


Features

  • Gaussian Process regression surrogate model with configurable noise handling
  • Three batch acquisition strategies: Kriging Believer (KB), Local Penalization (LP), and Explore/Exploit decomposition (EE)
  • Two single-point acquisition modes: Expected Improvement (EI) and Upper Confidence Bound (UCB)
  • Simplex-constrained parameter groups (sim_group) for parameters that must sum to a fixed total (e.g., solvent fractions)
  • Discrete parameter bins for quantised variables
  • Priority parameters that can be held fixed across a batch
  • TSP-based output sorting to minimise transitions between measurement points (relevant for lab automation)
  • Halton quasi-random sampling for space-filling initial designs that cover the parameter space uniformly
  • Adjustable explore/exploit bias via a single scalar parameter

Repository Structure

.
├── optimize_pathways_library.py   # Core library: all optimisation procedures
├── optimize_pathways_example.py   # Usage examples with simulated fitness
├── example-1d.json                # Minimal 1D parameter config for maximizing the synthetic fitness function y = x*sin(x) 
├── example-2d.json                # Minimal 2D parameter config for maximizing the synthetic fitness function y = -eggholder(x1,x2) 
├── example-4d.json                # Minimal 4D parameter config for maximizing the synthetic fitness function y = -(x1-36.7)^2-(x4-9)^2-eggholder(x2,x3)/150
├── example-5d.json                # Minimal 5D parameter config for maximizing the synthetic fitness function y = KNN(sym_data_artificial)
├── artificial_fitness.py          # Synthetic fitness functions for testing
├── sym_data_artificial.py         # Synthetic fitness values for 5D KNN-based testing
└── README.md

Requirements

numpy
scipy
scikit-learn
matplotlib
python-tsp

Install with:

pip install numpy scipy scikit-learn matplotlib python-tsp

Quick Start

0. Run the examples

Run optimize_pathways_example.py for minimal examples of both calls: initial design generation and iterative batch generation, each evaluated against synthetic fitness functions.

The steps below describe each stage in more detail.

1. Define your parameter space (JSON)

Each parameter is an object with the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Column name used in output CSV |
| limits | [min, max] | Search bounds |
| bins | list or null | Discretise to these values; null for continuous |
| sim_group | [group_id, total] or null | Couples parameters that must sum to total |
| priority | "normal" or "high" | Parameters marked "high" are sorted first and can be fixed across a batch |
| sort_weight | float | Relative weight when computing distances for TSP sorting |

Example — single continuous parameter:

[
    {
        "name": "HRP_(%)",
        "limits": [0.01, 25],
        "bins": null,
        "sim_group": null,
        "priority": "normal",
        "sort_weight": 1
    }
]

For examples of coupled parameters that must sum to a fixed total (e.g., pump rates) and of discrete bins, see the parameter setups in example-4d.json and example-5d.json.
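As a hedged illustration of the fields described above, the sketch below builds a config with two coupled parameters and one binned parameter and serialises it as JSON. The parameter names, limits, bin values, and the group id 1 are invented for illustration and are not taken from example-4d.json or example-5d.json.

```python
import json

# Hypothetical parameters: two pump rates that must sum to 100,
# plus one temperature restricted to a few discrete values.
params = [
    {"name": "pump_A_(%)", "limits": [0, 100], "bins": None,
     "sim_group": [1, 100], "priority": "normal", "sort_weight": 1},
    {"name": "pump_B_(%)", "limits": [0, 100], "bins": None,
     "sim_group": [1, 100], "priority": "normal", "sort_weight": 1},
    {"name": "temperature_(C)", "limits": [20, 60],
     "bins": [20, 30, 40, 50, 60],
     "sim_group": None, "priority": "high", "sort_weight": 2},
]

# Python None is serialised as JSON null.
config_text = json.dumps(params, indent=4)
print(config_text)
```

Saving config_text to a file such as my_params.json gives a config in the shape the field table describes; whether the library accepts these particular values should be checked against the shipped examples.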

Notes:

  • Constant parameters (where limits[0] == limits[1]) are handled automatically and excluded from the feature space.
  • sim_group constraints are enforced via a simplex root transform of the Halton sample; upper-limit violations are removed post-sampling.
  • TSP sorting uses simulated annealing (python-tsp) and is seeded for reproducibility.
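To illustrate what sorting a measurement sequence means in practice, here is a minimal sketch that orders points so that consecutive points are close together. It uses a greedy nearest-neighbour heuristic as a simple stand-in, not the simulated annealing solver from python-tsp that the library uses, and the way weights scale each coordinate is only one plausible reading of sort_weight.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with each coordinate scaled by its weight."""
    return math.sqrt(sum((w * (x - y)) ** 2
                         for x, y, w in zip(a, b, weights)))

def greedy_order(points, weights):
    """Nearest-neighbour ordering: a simple stand-in for a TSP solver."""
    remaining = list(range(1, len(points)))
    order = [0]  # start from the first point
    while remaining:
        last = points[order[-1]]
        nxt = min(remaining,
                  key=lambda i: weighted_distance(last, points[i], weights))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Hypothetical 2D measurement points.
points = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (9.0, 1.0)]
order = greedy_order(points, weights=(1.0, 1.0))
print(order)
```

A larger weight on one parameter makes changes in that parameter more costly, so the ordering tends to group points with similar values of it.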

2. Generate the initial experimental design

from optimize_pathways_library import generate_initial_ratios

p_init = generate_initial_ratios(
    input_instructions='my_params.json',
    n_out=10,
    output_file='out_sorted.csv',
    seed=42,
    sort_output=True
)

This produces the file out_sorted.csv with TSP-sorted parameter combinations sampled via a Halton sequence.
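For readers unfamiliar with Halton sequences, the following sketch shows the underlying idea in plain Python: each dimension uses the radical inverse in a different prime base, and the resulting unit-cube points are scaled to the parameter limits. This is an unscrambled, deterministic textbook version written for illustration; it is not the library's implementation and it ignores bins, sim_group constraints, and the seed argument.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in `base`, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_points(n_points, limits, primes=(2, 3, 5, 7, 11, 13)):
    """Return n_points Halton points scaled to (min, max) limits.

    Supports up to len(primes) dimensions.
    """
    points = []
    for i in range(1, n_points + 1):  # index 0 would map to the lower corner
        unit = [radical_inverse(i, primes[d]) for d in range(len(limits))]
        points.append([lo + u * (hi - lo)
                       for u, (lo, hi) in zip(unit, limits)])
    return points

# Hypothetical limits for two continuous parameters.
design = halton_points(4, limits=[(0.01, 25.0), (0.0, 1.0)])
for row in design:
    print(row)
```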

3. Record measurements

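Append two columns to the output file (tab-separated, despite the .csv extension): the mean and, if available, the standard deviation of your measured response for each row. Save the result under the name you will pass as input_file in step 4 (out_measured.csv in the example there). This file is the training data for each iteration.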
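The column-appending step can be sketched with the standard csv module. The design contents and measurement values below are placeholders, and the header names "mean" and "std" are assumptions, since the required column names are not specified here.

```python
import csv
import io

# Stand-in for the contents of out_sorted.csv (tab-separated, header first).
design_text = "HRP_(%)\n0.5\n12.0\n24.0\n"

# Hypothetical measurements: (mean, standard deviation) per design row.
measurements = [(1.2, 0.1), (3.4, 0.2), (2.8, 0.15)]

rows = list(csv.reader(io.StringIO(design_text), delimiter="\t"))
header, data = rows[0], rows[1:]
assert len(data) == len(measurements), "one measurement per design row"

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header + ["mean", "std"])  # column names are an assumption
for row, (mean, std) in zip(data, measurements):
    writer.writerow(row + [mean, std])

measured_text = out.getvalue()  # write this text to out_measured.csv
print(measured_text)
```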

4. Generate the next batch

from optimize_pathways_library import generate_next_ratios

p_next = generate_next_ratios(
    input_instructions='my_params.json',
    input_file='out_measured.csv',
    n_out=5,
    output_file='out_next.csv',
    batch_method='LP',
    explore_exploit_bias=0.0,
    use_stdvs=False,
    seed=42
)

Repeat steps 3–4 iteratively until convergence.


API Reference

upload_content(input_instructions)

Parses and validates a JSON parameter config file. Returns (params, sim_groups, out_message). Handles NaN values, inconsistent limits, and malformed sim_group definitions with informative error messages.

generate_initial_ratios(input_instructions, n_out, output_file, debug_flag, seed, sort_output)

Generates an initial space-filling design using a Halton sequence. Optionally sorts the output by solving a Travelling Salesman Problem (TSP) to minimise instrument transitions. Returns an (n_out, n_params) array.

generate_next_ratios(input_instructions, input_file, n_out, output_file, ...)

Core optimisation function. Fits a Gaussian Process to past measurements, evaluates an acquisition function over a large Halton sample, and returns the n_out most promising parameter combinations.

Key arguments:

| Argument | Default | Description |
|---|---|---|
| batch_method | "KB" | Batch strategy: "KB" (Kriging Believer), "LP" (Local Penalization), "EE" (Explore/Exploit) |
| explore_exploit_bias | 0.0 | Continuous bias from -1 (pure exploitation) to +1 (pure exploration) |
| use_stdvs | False | Whether to pass measurement standard deviations as GP noise (alpha) |
| fixed_priority_param | False | If True, the "high"-priority parameter is fixed to the same value across all points in a batch |
| rand_if_same | True | Replace duplicates with random samples |
| sort_output | True | Apply TSP sorting to the output batch |
| n_sample | 10000 | Halton candidate pool size (multiplied by number of parameters internally) |

Batch Acquisition Strategies

Kriging Believer (KB)

Selects points sequentially. After each selection, the GP is retrained using the predicted value at the selected point as a "phantom" observation. Suitable for moderate batch sizes.

See Roux, E., Tillier, Y., Kraria, S. & Bouchard, P.-O. An efficient parallel global optimization strategy based on Kriging properties suitable for material parameters identification. Arch. Mech. Eng. 67, 169–195 (2020).
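The Kriging Believer loop can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the library's code: the RBF kernel, alpha=1e-6, the evenly spaced candidate grid (standing in for a Halton pool), and the 0 to 10 search range are all chosen here for the example, with y = x*sin(x) used as the test function.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(gp, candidates, y_best):
    """Expected Improvement for maximisation."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def kriging_believer(X, y, candidates, n_out):
    """Pick n_out points, refitting on the GP's own prediction each time."""
    X_aug, y_aug = X.copy(), y.copy()
    batch = []
    for _ in range(n_out):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                      alpha=1e-6, normalize_y=True)
        gp.fit(X_aug, y_aug)
        ei = expected_improvement(gp, candidates, y_aug.max())
        best = candidates[np.argmax(ei)]
        batch.append(best)
        # "Believe" the prediction: add it as a phantom observation.
        phantom = gp.predict(best.reshape(1, -1))
        X_aug = np.vstack([X_aug, best])
        y_aug = np.append(y_aug, phantom)
    return np.array(batch)

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(6, 1))      # hypothetical past experiments
y = (X * np.sin(X)).ravel()              # synthetic response y = x*sin(x)
candidates = np.linspace(0, 10, 201).reshape(-1, 1)
batch = kriging_believer(X, y, candidates, n_out=3)
print(batch.ravel())
```

Because the GP is refitted once per selected point, the cost grows with batch size, which is consistent with the advice above to use KB for moderate batches.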

Local Penalization (LP)

Penalises regions around already-selected points by multiplying the acquisition function by a locality-aware decay term derived from the estimated Lipschitz constant. More computationally efficient than KB for larger batches.

See González, J., Dai, Z., Hennig, P. & Lawrence, N. Batch Bayesian optimization via local penalization. Proc. 19th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), PMLR 51, 648–656 (2016).
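The decay term can be sketched as below, following the maximisation form of the penalizer in the cited paper: for an already-selected point x_j, the factor is 0.5 * erfc(-z) with z = (L * ||x - x_j|| - M + mu(x_j)) / sqrt(2 * sigma(x_j)^2). The function and argument names are illustrative, the sketch assumes a non-negative acquisition value, and how the library estimates L and M is not shown here.

```python
import math

def local_penalizer(distance, mu_j, sigma_j, lipschitz, y_max):
    """Soft exclusion factor in [0, 1] around a selected point x_j.

    distance  : ||x - x_j||
    mu_j      : GP posterior mean at x_j
    sigma_j   : GP posterior standard deviation at x_j (> 0)
    lipschitz : estimated Lipschitz constant L of the objective
    y_max     : estimate M of the objective's maximum
    """
    z = (lipschitz * distance - y_max + mu_j) / math.sqrt(2.0 * sigma_j ** 2)
    return 0.5 * math.erfc(-z)

def penalized_acquisition(acq_value, distances, mus, sigmas,
                          lipschitz, y_max):
    """Multiply an acquisition value by one penalizer per selected point."""
    for d, mu_j, sigma_j in zip(distances, mus, sigmas):
        acq_value *= local_penalizer(d, mu_j, sigma_j, lipschitz, y_max)
    return acq_value

# Far from a selected point the factor approaches 1 (no penalty);
# close to a point predicted well below the maximum it approaches 0.
print(local_penalizer(100.0, 1.0, 0.5, 2.0, 1.0))
print(local_penalizer(0.0, 0.0, 0.1, 2.0, 1.0))
```

Unlike Kriging Believer, this needs no GP refit per batch point: only the penalizer product changes as points are added.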

Explore/Exploit decomposition (EE)

Decomposes Expected Improvement into an exploitation term and an exploration term, then selects batch points along a linear schedule from purely exploitative to purely exploratory. The schedule shape is modulated by explore_exploit_bias.

See Sóbester, A., Leary, S. J. & Keane, A. J. On the Design of Optimization Strategies Based on Global Response Surface Approximation Models. J. Glob. Optim. 33, 31–59 (2005).
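A minimal sketch of the decomposition, assuming maximisation: Expected Improvement splits into (mu - y_best) * Phi(z), which rewards a high predicted mean, and sigma * phi(z), which rewards uncertainty, and a weight w blends the two. The linear schedule below is the unbiased case only; how explore_exploit_bias reshapes it is not reproduced here, and the single-point fallback of w = 0.5 is an assumption of this sketch.

```python
import math

def ei_terms(mu, sigma, y_best):
    """Split Expected Improvement into (exploit, explore) parts."""
    if sigma <= 0.0:
        return max(mu - y_best, 0.0), 0.0
    z = (mu - y_best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - y_best) * cdf, sigma * pdf

def weighted_ei(mu, sigma, y_best, w):
    """w = 1 is purely exploitative, w = 0 purely exploratory."""
    exploit, explore = ei_terms(mu, sigma, y_best)
    return w * exploit + (1.0 - w) * explore

def linear_weights(n_out):
    """One weight per batch point, from exploitation to exploration."""
    if n_out == 1:
        return [0.5]  # assumption: balanced weight for a single point
    return [1.0 - i / (n_out - 1) for i in range(n_out)]

weights = linear_weights(5)
print(weights)
```

Each batch point is then the candidate that maximises weighted_ei under its own weight, so early points in the batch refine the current best region and later ones probe uncertain regions.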


Output Format

All output files are tab-separated with a header row of parameter names. The training/input file for generate_next_ratios must additionally contain two trailing columns: mean and standard deviation of the measured response.


Example Script

optimize_pathways_example.py demonstrates a full optimisation loop against a synthetic fitness landscape:

  1. Load parameter definitions from a JSON file (example-2d.json)
  2. Generate an initial 5-point Halton design of the parameter space
  3. Evaluate the synthetic fitness function (the inverted eggholder function)
  4. Run 5 sequential optimisation iterations (5 points per iteration, LP batch method)
  5. Track and print the running maximum

To run with your own data, replace the simulate_values calls with your actual measurement results.


License

CC BY-NC 4.0
