Bayesian Optimisation of Experimental Pathways

A Python library for sequential, model-guided optimisation of multi-parameter experiments. The framework fits a Gaussian Process surrogate model to observed measurements and proposes the next most informative experimental conditions to evaluate, minimising the number of experiments needed to find the optimum.

Designed for use cases where experiments are expensive (e.g., laboratory assays, flow chemistry), the library supports both single-point and batch acquisition, simplex-constrained parameters, and TSP-based sorting of measurement sequences to minimise instrument reconfiguration.

Features

Gaussian Process regression surrogate model with configurable noise handling
Three batch acquisition strategies: Kriging Believer (KB), Local Penalization (LP), and Explore/Exploit decomposition (EE)
Two single-point acquisition modes: Expected Improvement (EI) and Upper Confidence Bound (UCB)
Simplex-constrained parameter groups (sim_group) for parameters that must sum to a fixed total (e.g., solvent fractions)
Discrete parameter bins for quantised variables
Priority parameters that can be held fixed across a batch
TSP-based output sorting to minimise transitions between measurement points (relevant for lab automation)
Halton quasi-random sampling for space-filling initial designs that cover the parameter space evenly
Adjustable explore/exploit bias via a single scalar parameter

Repository Structure

.
├── optimize_pathways_library.py   # Core library: all optimisation procedures
├── optimize_pathways_example.py   # Usage examples with simulated fitness
├── example-1d.json                # Minimal 1D parameter config for maximizing the synthetic fitness function y = x*sin(x) 
├── example-2d.json                # Minimal 2D parameter config for maximizing the synthetic fitness function y = -eggholder(x1,x2) 
├── example-4d.json                # Minimal 4D parameter config for maximizing the synthetic fitness function y = -(x1-36.7)^2-(x4-9)^2-eggholder(x2,x3)/150
├── example-5d.json                # Minimal 5D parameter config for maximizing the synthetic fitness function y = KNN(sym_data_artificial)
├── artificial_fitness.py          # Synthetic fitness functions for testing
├── sym_data_artificial.txt        # Synthetic fitness values for 5D KNN-based testing
└── README.md

Requirements

numpy
scipy
scikit-learn
matplotlib
python-tsp

Install with:

pip install numpy scipy scikit-learn matplotlib python-tsp

Quick Start

0. Running the examples

Run optimize_pathways_example.py to see minimal example calls for initial parameter generation and iterative batch generation, evaluated against synthetic fitness functions.

The instructions below describe the individual steps in more detail.

1. Define your parameter space (JSON)

Each parameter is an object with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| `name` | string | Column name used in output CSV |
| `limits` | [min, max] | Search bounds |
| `bins` | list or null | Discretise to these values; `null` for continuous |
| `sim_group` | [group_id, total] or null | Couples parameters that must sum to `total` |
| `priority` | `"normal"` or `"high"` | Parameters marked `"high"` are sorted first and can be fixed across a batch |
| `sort_weight` | float | Relative weight when computing distances for TSP sorting |

Example: a single continuous parameter.

[
    {
        "name": "HRP_(%)",
        "limits": [0.01, 25],
        "bins": null,
        "sim_group": null,
        "priority": "normal",
        "sort_weight": 1
    }
]

For examples of coupled parameters that must sum to a fixed total (e.g., pump rates) and of discrete bins, see the parameter setups in example-4d.json and example-5d.json.

Notes:

Constant parameters (where limits[0] == limits[1]) are handled automatically and excluded from the feature space.
sim_group constraints are enforced via a simplex root transform of the Halton sample; upper-limit violations are removed post-sampling.
TSP sorting uses simulated annealing (python-tsp) and is seeded for reproducibility.
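The field table and notes above can be checked before a config is handed to the library. The following is a minimal sketch of such a pre-check using only the standard library; it is not the library's own `upload_content`, and the function name `check_params` and the rule that bins must lie within the limits are assumptions made for illustration.

```python
import json
import math

REQUIRED_FIELDS = ("name", "limits", "bins", "sim_group", "priority", "sort_weight")


def check_params(params):
    """Return a list of human-readable problems found in a parameter list."""
    problems = []
    group_totals = {}  # group_id -> total seen first for that group
    for i, p in enumerate(params):
        label = p.get("name", f"parameter #{i}")
        missing = [f for f in REQUIRED_FIELDS if f not in p]
        if missing:
            problems.append(f"{label}: missing fields {missing}")
            continue
        if len(p["limits"]) != 2:
            problems.append(f"{label}: limits must be [min, max]")
            continue
        lo, hi = p["limits"]
        if math.isnan(lo) or math.isnan(hi) or lo > hi:
            problems.append(f"{label}: invalid limits {p['limits']}")
        if p["priority"] not in ("normal", "high"):
            problems.append(f"{label}: unknown priority {p['priority']!r}")
        # Assumption: discrete bins are expected to lie inside the limits.
        if p["bins"] is not None and any(not lo <= b <= hi for b in p["bins"]):
            problems.append(f"{label}: bins outside limits")
        if p["sim_group"] is not None:
            group_id, total = p["sim_group"]
            if group_totals.setdefault(group_id, total) != total:
                problems.append(f"{label}: sim_group {group_id} has conflicting totals")
    return problems


config_text = """
[
  {"name": "HRP_(%)", "limits": [0.01, 25], "bins": null,
   "sim_group": null, "priority": "normal", "sort_weight": 1}
]
"""
params = json.loads(config_text)
print(check_params(params))  # prints []
```

An empty list means none of these particular checks failed; the library may apply further validation of its own.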

2. Generate the initial experimental design

from optimize_pathways_library import generate_initial_ratios

p_init = generate_initial_ratios(
    input_instructions='my_params.json',
    n_out=10,
    output_file='out_sorted.csv',
    seed=42,
    sort_output=True
)

This produces the file out_sorted.csv with TSP-sorted parameter combinations sampled via a Halton sequence.
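To illustrate what "sampled via a Halton sequence" means, here is a small pure-Python sketch of the textbook (unscrambled) Halton construction, followed by scaling to parameter limits and snapping to discrete bins. The helper names are hypothetical, and the library's own sampler may differ (for example in scrambling, skipping, or how bins are applied).

```python
PRIMES = (2, 3, 5, 7, 11, 13)  # one prime base per dimension


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, n_dims, skip=1):
    """Return n_points quasi-random points in the unit hypercube [0, 1)^n_dims."""
    if n_dims > len(PRIMES):
        raise ValueError("add more primes to PRIMES for higher dimensions")
    return [
        [radical_inverse(i, PRIMES[d]) for d in range(n_dims)]
        for i in range(skip, skip + n_points)
    ]


def to_parameter_space(unit_point, limits, bins):
    """Scale a unit-cube point to [min, max] and snap binned parameters."""
    values = []
    for u, (lo, hi), allowed in zip(unit_point, limits, bins):
        x = lo + u * (hi - lo)
        if allowed is not None:
            # Snap to the nearest permitted discrete value.
            x = min(allowed, key=lambda b: abs(b - x))
        values.append(x)
    return values


limits = [(0.01, 25.0), (0.0, 1.0)]
bins = [None, [0.0, 0.5, 1.0]]  # second parameter is discrete
for point in halton_points(5, 2):
    print(to_parameter_space(point, limits, bins))
```

Unlike pseudo-random draws, consecutive Halton points fill gaps left by earlier ones, which is why the sequence suits small initial designs.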

3. Record measurements

Append two columns to the output CSV: the mean and the standard deviation (if available) of your measured response for each row. Save the result under a new name (step 4 uses out_measured.csv); this file is the training data for each iteration.
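Appending the two measurement columns can be scripted. The sketch below assumes the design file is tab-separated with a header row, as described under Output Format; the function name `append_measurements` and the header labels `mean` and `stdv` are placeholders, since the source does not state which labels the library expects.

```python
import csv
import os
import tempfile


def append_measurements(design_path, measured_path, means, stdvs):
    """Copy a tab-separated design file and add two trailing measurement columns."""
    with open(design_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    header, body = rows[0], rows[1:]
    if not (len(body) == len(means) == len(stdvs)):
        raise ValueError("one mean and one standard deviation are needed per design row")
    with open(measured_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(header + ["mean", "stdv"])  # hypothetical column labels
        for row, m, s in zip(body, means, stdvs):
            writer.writerow(row + [m, s])


# Demonstration on a throwaway two-row design file.
with tempfile.TemporaryDirectory() as tmp:
    design = os.path.join(tmp, "out_sorted.csv")
    measured = os.path.join(tmp, "out_measured.csv")
    with open(design, "w", newline="") as f:
        f.write("HRP_(%)\n0.5\n12.0\n")
    append_measurements(design, measured, means=[1.2, 3.4], stdvs=[0.1, 0.2])
    with open(measured, newline="") as f:
        print(f.read())
```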

4. Generate the next batch

from optimize_pathways_library import generate_next_ratios

p_next = generate_next_ratios(
    input_instructions='my_params.json',
    input_file='out_measured.csv',
    n_out=5,
    output_file='out_next.csv',
    batch_method='LP',
    explore_exploit_bias=0.0,
    use_stdvs=False,
    seed=42
)

Repeat steps 3–4 iteratively until convergence.

API Reference

`upload_content(input_instructions)`

Parses and validates a JSON parameter config file. Returns (params, sim_groups, out_message). Handles NaN values, inconsistent limits, and malformed sim_group definitions with informative error messages.

`generate_initial_ratios(input_instructions, n_out, output_file, debug_flag, seed, sort_output)`

Generates an initial space-filling design using a Halton sequence. Optionally sorts the output by solving a Travelling Salesman Problem (TSP) to minimise instrument transitions. Returns an (n_out, n_params) array.
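The idea behind the sorting step can be shown with a much simpler heuristic than the one the library uses. The sketch below orders points with a greedy nearest-neighbour rule rather than simulated annealing, and it assumes that `sort_weight` scales each parameter's contribution to a Euclidean distance; both choices are illustrative assumptions, not the library's implementation.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with each coordinate difference scaled by its weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_neighbour_order(points, weights):
    """Greedy open-path ordering: start at point 0, always visit the closest unvisited point."""
    if not points:
        return []
    order = [0]
    remaining = list(range(1, len(points)))
    while remaining:
        last = points[order[-1]]
        nxt = min(remaining, key=lambda i: weighted_distance(last, points[i], weights))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def path_length(points, order, weights):
    """Total weighted distance travelled when visiting points in the given order."""
    return sum(
        weighted_distance(points[a], points[b], weights)
        for a, b in zip(order, order[1:])
    )


points = [[0.0, 0.0], [10.0, 0.0], [1.0, 0.0], [9.0, 0.0]]
weights = [1.0, 1.0]
order = nearest_neighbour_order(points, weights)
print(order)                                      # prints [0, 2, 3, 1]
print(path_length(points, [0, 1, 2, 3], weights)) # prints 27.0
print(path_length(points, order, weights))        # prints 10.0
```

A greedy ordering is not guaranteed to be optimal, which is one reason to use a proper TSP solver for larger batches.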

`generate_next_ratios(input_instructions, input_file, n_out, output_file, ...)`

Core optimisation function. Fits a Gaussian Process to past measurements, evaluates an acquisition function over a large Halton sample, and returns the n_out most promising parameter combinations.
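For reference, the two single-point acquisition functions named under Features have standard closed forms for a maximisation problem, given the GP posterior mean `mu` and standard deviation `sigma` at a candidate. The sketch below uses the textbook definitions; the trade-off parameter `kappa` and how the library maps `explore_exploit_bias` onto either function are not stated in the source and are assumptions here.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_improvement(mu, sigma, best):
    """EI for maximisation: E[max(f - best, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB: optimistic estimate; larger kappa (hypothetical default) favours exploration."""
    return mu + kappa * sigma


# (mu, sigma) predictions at three candidate points; best observed value so far is 0.8.
candidates = [(0.9, 0.05), (0.7, 0.4), (0.2, 0.1)]
best = 0.8
ei = [expected_improvement(m, s, best) for m, s in candidates]
ucb = [upper_confidence_bound(m, s) for m, s in candidates]
print("EI :", [round(v, 4) for v in ei])
print("UCB:", [round(v, 4) for v in ucb])
```

In practice the function is evaluated at every point of the candidate pool and the highest-scoring points are proposed.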

Key arguments:

| Argument | Default | Description |
| --- | --- | --- |
| `batch_method` | `"KB"` | Batch strategy: `"KB"` (Kriging Believer), `"LP"` (Local Penalization), `"EE"` (Explore/Exploit) |
| `explore_exploit_bias` | `0.0` | Continuous bias from `-1` (pure exploitation) to `+1` (pure exploration) |
| `use_stdvs` | `False` | Whether to pass measurement standard deviations as GP noise (`alpha`) |
| `fixed_priority_param` | `False` | If `True`, the `"high"`-priority parameter is fixed to the same value across all points in a batch |
| `rand_if_same` | `True` | Replace duplicates with random samples |
| `sort_output` | `True` | Apply TSP sorting to the output batch |
| `n_sample` | `10000` | Halton candidate pool size (multiplied by number of parameters internally) |

Batch Acquisition Strategies

Kriging Believer (`KB`)

Selects points sequentially. After each selection, the GP is retrained using the predicted value at the selected point as a "phantom" observation. Suitable for moderate batch sizes.

See Roux, E., Tillier, Y., Kraria, S. & Bouchard, P.-O. An efficient parallel global optimization strategy based on Kriging properties suitable for material parameters identification. Arch. Mech. Eng. 67, 169–195 (2020).
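The sequential "select, believe, refit" loop can be sketched in a few lines. The example below is a toy one-dimensional illustration, assuming a hand-rolled zero-mean GP with a unit-variance RBF kernel and an upper-confidence-bound score; the kernel, the acquisition function, and all names are assumptions chosen for brevity rather than the library's actual choices.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """Unit-variance RBF kernel matrix between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def kriging_believer(x_train, y_train, candidates, batch_size, kappa=2.0):
    """Pick a batch sequentially, adding each pick with its predicted mean as a phantom observation."""
    x, y = x_train.copy(), y_train.copy()
    batch = []
    for _ in range(batch_size):
        mean, std = gp_posterior(x, y, candidates)
        best = int(np.argmax(mean + kappa * std))  # UCB score, for brevity
        batch.append(float(candidates[best]))
        x = np.append(x, candidates[best])
        y = np.append(y, mean[best])  # "believe" the model's own prediction
    return batch


x_train = np.array([0.0, 2.0, 4.0])
y_train = np.array([0.0, 1.0, 0.0])
candidates = np.linspace(0.0, 4.0, 41)
batch = kriging_believer(x_train, y_train, candidates, batch_size=3)
print(batch)
```

Adding a phantom observation equal to the predicted mean leaves the mean surface unchanged but collapses the uncertainty around the chosen point, which pushes later picks elsewhere. Each pick requires a refit, so the cost grows with batch size.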

Local Penalization (`LP`)

Penalises regions around already-selected points by multiplying the acquisition function by a locality-aware decay term derived from the estimated Lipschitz constant. More computationally efficient than KB for larger batches.

See González, J., Dai, Z., Hennig, P. & Lawrence, N. Batch Bayesian optimization via local penalization. Proc. 19th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), PMLR 51, 648–657 (2016).
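A compact sketch of the penalisation step follows. It uses the penaliser from the cited paper for a maximisation problem, phi(x; x_j) = 0.5 * erfc(-z) with z = (L * ||x - x_j|| - M + mu(x_j)) / sqrt(2 * sigma(x_j)^2), where L is the Lipschitz estimate and M the estimated maximum. Whether the library uses exactly this form is an assumption, and the function names and toy numbers below are illustrative only.

```python
import math


def local_penalizer(distance, mu_j, sigma_j, lipschitz, best_value):
    """Probability that a point at `distance` from x_j lies outside x_j's exclusion ball."""
    if sigma_j <= 0.0:
        return 1.0 if lipschitz * distance >= best_value - mu_j else 0.0
    z = (lipschitz * distance - best_value + mu_j) / (math.sqrt(2.0) * sigma_j)
    return 0.5 * math.erfc(-z)


def select_batch_lp(candidates, acquisition, mu, sigma, lipschitz, best_value, batch_size):
    """Greedy batch selection: pick the best candidate, then damp the acquisition around it.

    `acquisition` must be non-negative for the multiplicative penalty to make sense.
    """
    penalised = list(acquisition)
    chosen = []
    for _ in range(batch_size):
        j = max(range(len(candidates)), key=lambda i: penalised[i])
        chosen.append(j)
        for i, x in enumerate(candidates):
            d = math.dist(x, candidates[j])
            penalised[i] *= local_penalizer(d, mu[j], sigma[j], lipschitz, best_value)
    return chosen


candidates = [(0.0,), (0.1,), (1.0,)]
acquisition = [1.0, 0.9, 0.5]   # unpenalised scores
mu = [0.5, 0.5, 0.2]            # GP mean at each candidate
sigma = [0.1, 0.1, 0.3]         # GP standard deviation at each candidate
print(select_batch_lp(candidates, acquisition, mu, sigma,
                      lipschitz=1.0, best_value=1.0, batch_size=2))  # prints [0, 2]
```

The second-best raw score belongs to the candidate at 0.1, but it sits inside the penalised neighbourhood of the first pick, so the distant candidate is chosen instead. No GP refit is needed between picks, which is the source of the efficiency gain over KB.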

Explore/Exploit decomposition (`EE`)

Decomposes Expected Improvement into an exploitation term and an exploration term, then selects batch points along a linear schedule from purely exploitative to purely exploratory. The schedule shape is modulated by explore_exploit_bias.

See Sóbester, A., Leary, S. J. & Keane, A. J. On the Design of Optimization Strategies Based on Global Response Surface Approximation Models. J. Glob. Optim. 33, 31–59 (2005).
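The decomposition can be written as a weighted Expected Improvement, w * (mu - best) * Phi(z) + (1 - w) * sigma * phi(z) with z = (mu - best) / sigma, where w = 1 is purely exploitative and w = 0 purely exploratory. The sketch below applies a plain linear schedule of w across the batch; how `explore_exploit_bias` reshapes that schedule in the library is not described in the source, so it is left out, and all names here are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def weighted_ei(mu, sigma, best, w):
    """Weighted EI for maximisation; w weights exploitation, (1 - w) exploration."""
    if sigma <= 0.0:
        return w * max(mu - best, 0.0)
    z = (mu - best) / sigma
    exploit = (mu - best) * normal_cdf(z)
    explore = sigma * normal_pdf(z)
    return w * exploit + (1.0 - w) * explore


def select_batch_ee(mu, sigma, best, batch_size):
    """Pick one candidate per weight, stepping linearly from w = 1 down to w = 0."""
    if batch_size > len(mu):
        raise ValueError("batch_size cannot exceed the number of candidates")
    chosen = []
    for k in range(batch_size):
        # A single-point batch uses w = 0.5, which ranks like ordinary EI.
        w = 0.5 if batch_size == 1 else 1.0 - k / (batch_size - 1)
        ranked = sorted(range(len(mu)),
                        key=lambda i: weighted_ei(mu[i], sigma[i], best, w),
                        reverse=True)
        chosen.append(next(i for i in ranked if i not in chosen))
    return chosen


mu = [1.2, 0.5, 0.9]       # GP mean at each candidate
sigma = [0.05, 0.8, 0.3]   # GP standard deviation at each candidate
print(select_batch_ee(mu, sigma, best=1.0, batch_size=2))  # prints [0, 1]
```

The first pick is the candidate with the highest predicted value, and the second is the most uncertain one, so a single batch covers both ends of the trade-off.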

Output Format

All output files are tab-separated with a header row of parameter names; this applies even though the examples above use a .csv file extension. The training/input file for generate_next_ratios must additionally contain two trailing columns: the mean and the standard deviation of the measured response.

Example Script

optimize_pathways_example.py demonstrates a full optimisation loop against a synthetic fitness landscape:

Load parameter definitions from a JSON file (example-2d.json)
Generate an initial 5-point Halton design of the parameter space
Evaluate the synthetic fitness function (the inverted eggholder function)
Run 5 sequential optimisation iterations (5 points per iteration, LP batch method)
Track and print the running maximum

To run with your own data, replace the simulate_values calls with your actual measurement results.

License

CC BY-NC 4.0
