A Python library for sequential, model-guided optimisation of multi-parameter experiments. The framework fits a Gaussian Process surrogate model to observed measurements and proposes the next most informative experimental conditions to evaluate — minimising the number of experiments needed to find the optimum.
Designed for use cases where experiments are expensive (e.g., laboratory assays, flow chemistry), the library supports both single-point and batch acquisition, simplex-constrained parameters, and TSP-based sorting of measurement sequences to minimise instrument reconfiguration.
- Gaussian Process regression surrogate model with configurable noise handling
- Three batch acquisition strategies: Kriging Believer (KB), Local Penalization (LP), and Explore/Exploit decomposition (EE)
- Two single-point acquisition modes: Expected Improvement (EI) and Upper Confidence Bound (UCB)
- Simplex-constrained parameter groups (`sim_group`) for parameters that must sum to a fixed total (e.g., solvent fractions)
- Discrete parameter bins for quantised variables
- Priority parameters that can be held fixed across a batch
- TSP-based output sorting to minimise transitions between measurement points (relevant for lab automation)
- Halton quasi-random sampling for initial designs that cover the parameter space evenly
- Adjustable explore/exploit bias via a single scalar parameter
.
├── optimize_pathways_library.py # Core library: all optimisation procedures
├── optimize_pathways_example.py # Usage examples with simulated fitness
├── example-1d.json # Minimal 1D parameter config for maximizing the synthetic fitness function y = x*sin(x)
├── example-2d.json # Minimal 2D parameter config for maximizing the synthetic fitness function y = -eggholder(x1,x2)
├── example-4d.json # Minimal 4D parameter config for maximizing the synthetic fitness function y = -(x1-36.7)^2-(x4-9)^2-eggholder(x2,x3)/150
├── example-5d.json # Minimal 5D parameter config for maximizing the synthetic fitness function y = KNN(sym_data_artificial)
├── artificial_fitness.py # Synthetic fitness functions for testing
├── sym_data_artificial.py # Synthetic fitness values for 5D KNN-based testing
└── README.md
numpy
scipy
scikit-learn
matplotlib
python-tsp
Install with:
pip install numpy scipy scikit-learn matplotlib python-tsp

Run optimize_pathways_example.py to see the simplest example calls for initial parameter generation and iterative batch generation, based on synthetically generated functions.
The instructions below describe the individual steps in more detail.
Each parameter is an object with the following fields:
| Field | Type | Description |
|---|---|---|
| `name` | string | Column name used in output CSV |
| `limits` | [min, max] | Search bounds |
| `bins` | list or null | Discretise to these values; null for continuous |
| `sim_group` | [group_id, total] or null | Couples parameters that must sum to total |
| `priority` | "normal" or "high" | Parameters marked "high" are sorted first and can be fixed across a batch |
| `sort_weight` | float | Relative weight when computing distances for TSP sorting |
Example — single continuous parameter:
[
{
"name": "HRP_(%)",
"limits": [0.01, 25],
"bins": null,
"sim_group": null,
"priority": "normal",
"sort_weight": 1
}
]

For coupled parameters that must sum to a fixed total (e.g., pump rates) and for binned parameters, see the parameter setups in example-4d.json and example-5d.json.
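As a rough illustration of the field table above, the sketch below builds a hypothetical configuration with two coupled parameters and one binned parameter, runs a few sanity checks, and serialises it to JSON. The parameter names, limits, group id, and the `check_params` helper are invented for illustration and are not part of the library; the shipped example-4d.json and example-5d.json remain the reference.

```python
import json

# Hypothetical parameter definitions; names and numbers are illustrative only.
params = [
    {   # two pump rates coupled through sim_group = [group_id, total]
        "name": "pump_A", "limits": [0, 100], "bins": None,
        "sim_group": [1, 100], "priority": "normal", "sort_weight": 1,
    },
    {
        "name": "pump_B", "limits": [0, 100], "bins": None,
        "sim_group": [1, 100], "priority": "normal", "sort_weight": 1,
    },
    {   # quantised variable restricted to the listed values
        "name": "temperature", "limits": [20, 60], "bins": [20, 30, 40, 50, 60],
        "sim_group": None, "priority": "high", "sort_weight": 2,
    },
]


def check_params(params):
    """Basic sanity checks mirroring the field table (not the library's validator)."""
    totals = {}
    for p in params:
        lo, hi = p["limits"]
        if lo > hi:
            raise ValueError(f"{p['name']}: limits are not ordered")
        if p["bins"] is not None and not all(lo <= b <= hi for b in p["bins"]):
            raise ValueError(f"{p['name']}: bin outside limits")
        if p["sim_group"] is not None:
            group_id, total = p["sim_group"]
            # every member of a group must agree on the total
            if totals.setdefault(group_id, total) != total:
                raise ValueError(f"group {group_id}: inconsistent totals")
    return totals


group_totals = check_params(params)
config_text = json.dumps(params, indent=2)  # Python None is written as JSON null
```

Writing `config_text` to a file gives a config in the same shape as the single-parameter example.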
Notes:
- Constant parameters (where `limits[0] == limits[1]`) are handled automatically and excluded from the feature space.
- `sim_group` constraints are enforced via a simplex root transform of the Halton sample; upper-limit violations are removed post-sampling.
- TSP sorting uses simulated annealing (`python-tsp`) and is seeded for reproducibility.
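The notes above do not spell out the simplex root transform. The sketch below assumes it is the standard stick-breaking construction, in which each uniform coordinate is raised to a fractional power so that uniform inputs map to points spread uniformly over the simplex. The function names and the upper limit of 80 are hypothetical, and the library's own transform may differ in detail.

```python
def simplex_root_transform(u, total=1.0):
    """Map d-1 numbers in [0, 1] to d non-negative values that sum to `total`."""
    d = len(u) + 1
    remaining = total
    out = []
    for k, u_k in enumerate(u):
        # fraction of the remaining budget passed on to the later components
        keep = u_k ** (1.0 / (d - 1 - k))
        out.append(remaining * (1.0 - keep))
        remaining *= keep
    out.append(remaining)  # the last component takes whatever is left
    return out


def within_upper_limits(point, upper):
    """True if every component respects its upper limit."""
    return all(x <= hi for x, hi in zip(point, upper))


# Three unit-square samples mapped to three fractions summing to 100.
samples = [(0.1, 0.5), (0.01, 0.5), (0.5, 0.5)]
points = [simplex_root_transform(u, total=100.0) for u in samples]

# Post-sampling filter: drop points where any component exceeds its limit.
kept = [p for p in points if within_upper_limits(p, upper=(80.0, 80.0, 80.0))]
```

The second sample puts about 90 into the first component, so it is removed by the filter, which mirrors the post-sampling removal of upper-limit violations.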
from optimize_pathways_library import generate_initial_ratios
p_init = generate_initial_ratios(
input_instructions='my_params.json',
n_out=10,
output_file='out_sorted.csv',
seed=42,
sort_output=True
)

This produces the file out_sorted.csv with TSP-sorted parameter combinations sampled via a Halton sequence.
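To give a feel for what a Halton design looks like, here is a minimal, dependency-free sketch of an unscrambled Halton sequence scaled to parameter limits. It is not the library's sampler: `radical_inverse` and `halton_design` are hypothetical names, and the seeded sampler used by `generate_initial_ratios` will generally produce different points.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_design(n_points, limits, primes=(2, 3, 5, 7, 11, 13)):
    """Return n_points Halton points scaled to the (min, max) limits of each parameter."""
    if len(limits) > len(primes):
        raise ValueError("add more primes for higher-dimensional designs")
    design = []
    for i in range(1, n_points + 1):  # skip index 0, which maps to the lower corner
        unit = [radical_inverse(i, primes[d]) for d in range(len(limits))]
        design.append([lo + u * (hi - lo) for u, (lo, hi) in zip(unit, limits)])
    return design


# Five points for two parameters; the second set of limits is made up.
design = halton_design(5, limits=[(0.01, 25.0), (0.0, 10.0)])
```

Each dimension uses a different prime base, which is what keeps the points from lining up along diagonals.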
Append two columns to the output CSV: the mean and standard deviation (if available) of your measured response for each row. The resulting file is used as training data for each iteration.
from optimize_pathways_library import generate_next_ratios
p_next = generate_next_ratios(
input_instructions='my_params.json',
input_file='out_measured.csv',
n_out=5,
output_file='out_next.csv',
batch_method='LP',
explore_exploit_bias=0.0,
use_stdvs=False,
seed=42
)

Repeat steps 3–4 iteratively until convergence.
Parses and validates a JSON parameter config file. Returns (params, sim_groups, out_message). Handles NaN values, inconsistent limits, and malformed sim_group definitions with informative error messages.
generate_initial_ratios generates an initial space-filling design using a Halton sequence. It optionally sorts the output by solving a Travelling Salesman Problem (TSP) to minimise instrument transitions, and returns an (n_out, n_params) array.
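The effect of sorting a design to reduce transitions can be illustrated with a much simpler heuristic than the simulated annealing used by the library. The sketch below is a greedy nearest-neighbour ordering, not a TSP solver, and it assumes that `sort_weight` scales each parameter's contribution to the distance; both the helper names and that weighting scheme are assumptions.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-parameter weight (assumed role of sort_weight)."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


def greedy_order(points, weights):
    """Visit points by always moving to the nearest unvisited one."""
    remaining = list(range(len(points)))
    order = [remaining.pop(0)]  # start from the first point
    while remaining:
        last = points[order[-1]]
        nxt = min(remaining, key=lambda i: weighted_distance(last, points[i], weights))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def path_length(points, order, weights):
    """Total weighted distance travelled when visiting points in `order`."""
    return sum(
        weighted_distance(points[a], points[b], weights)
        for a, b in zip(order, order[1:])
    )


points = [(0.0, 0.0), (10.0, 1.0), (1.0, 0.0), (9.0, 1.0)]
weights = (1.0, 1.0)
order = greedy_order(points, weights)
unsorted_length = path_length(points, [0, 1, 2, 3], weights)
sorted_length = path_length(points, order, weights)
```

For these four points the greedy order groups the two clusters together and shortens the path considerably. A real TSP solver can do better on larger designs.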
generate_next_ratios is the core optimisation function. It fits a Gaussian Process to past measurements, evaluates an acquisition function over a large Halton sample, and returns the n_out most promising parameter combinations.
Key arguments:
| Argument | Default | Description |
|---|---|---|
| `batch_method` | "KB" | Batch strategy: "KB" (Kriging Believer), "LP" (Local Penalization), "EE" (Explore/Exploit) |
| `explore_exploit_bias` | 0.0 | Continuous bias from -1 (pure exploitation) to +1 (pure exploration) |
| `use_stdvs` | False | Whether to pass measurement standard deviations as GP noise (alpha) |
| `fixed_priority_param` | False | If True, the "high"-priority parameter is fixed to the same value across all points in a batch |
| `rand_if_same` | True | Replace duplicates with random samples |
| `sort_output` | True | Apply TSP sorting to the output batch |
| `n_sample` | 10000 | Halton candidate pool size (multiplied by number of parameters internally) |
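The acquisition step can be sketched independently of the GP itself. The example below computes Expected Improvement and Upper Confidence Bound from a posterior mean and standard deviation at each candidate, then ranks the candidates. The posterior values are made up, `kappa` and the helper names are hypothetical, and the way `explore_exploit_bias` enters the library's acquisition function is not shown because it is not described here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_improvement(mean, std, best):
    """EI for maximisation, given the posterior mean and std at one candidate."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * _STD_NORMAL.cdf(z) + std * _STD_NORMAL.pdf(z)


def upper_confidence_bound(mean, std, kappa=2.0):
    """UCB: larger kappa favours uncertain (unexplored) candidates."""
    return mean + kappa * std


def top_candidates(candidates, scores, n_out):
    """Return the n_out candidates with the highest scores."""
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:n_out]]


# Made-up posterior for four candidates.
candidates = ["a", "b", "c", "d"]
means = [1.0, 0.8, 0.2, 1.1]
stds = [0.05, 0.5, 1.0, 0.0]
best_so_far = 1.0

ei = [expected_improvement(m, s, best_so_far) for m, s in zip(means, stds)]
ucb = [upper_confidence_bound(m, s) for m, s in zip(means, stds)]
picked = top_candidates(candidates, ei, n_out=2)
```

Taking the top scores directly tends to select near-identical points, which is the problem the batch strategies are designed to avoid.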
Kriging Believer (KB) selects points sequentially. After each selection, the GP is retrained using the predicted value at the selected point as a "phantom" observation. Suitable for moderate batch sizes.
See Roux, E., Tillier, Y., Kraria, S. & Bouchard, P.-O. An efficient parallel global optimization strategy based on Kriging properties suitable for material parameters identification. Arch. Mech. Eng. 67, 169–195 (2020).
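The Kriging Believer loop can be sketched as follows. To stay self-contained, this example uses a tiny zero-mean 1D GP with a fixed RBF kernel written in plain Python, and a UCB acquisition. The library's surrogate, kernel, hyperparameter fitting, and acquisition function are likely to differ, and all names here are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(X, y, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, y)))
    var = 1.0 - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def kriging_believer(X, y, candidates, n_batch, kappa=2.0):
    """Pick n_batch points, 'believing' the GP mean at each pick before the next."""
    X, y, pool, batch = list(X), list(y), list(candidates), []
    for _ in range(n_batch):
        preds = [gp_predict(X, y, c) for c in pool]
        scores = [m + kappa * s for m, s in preds]  # UCB acquisition
        best = max(range(len(pool)), key=scores.__getitem__)
        batch.append(pool[best])
        X.append(pool[best])       # phantom observation: predicted mean
        y.append(preds[best][0])   # is treated as if it had been measured
        pool.pop(best)
    return batch


X_obs, y_obs = [0.0, 2.0, 4.0], [0.0, 1.8, -3.0]
grid = [0.125 + 0.25 * i for i in range(24)]
batch = kriging_believer(X_obs, y_obs, grid, n_batch=3)
```

Because the phantom observation collapses the predicted uncertainty around each pick, later picks are pushed towards other regions.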
Local Penalization (LP) penalises regions around already-selected points by multiplying the acquisition function by a locality-aware decay term derived from the estimated Lipschitz constant. More computationally efficient than KB for larger batches.
See González, J., Dai, Z., Hennig, P. & Lawrence, N. Batch Bayesian optimization via local penalization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51, 648–657 (2016).
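The penalization idea can be sketched with the penalizer form used in the cited paper, written here for maximisation: each selected point defines an exclusion ball whose radius depends on the Lipschitz constant, and the penalizer is the probability that a candidate lies outside that ball. The function names are hypothetical. The library's Lipschitz estimate and exact penalizer may differ, so treat this as an assumption-laden illustration.

```python
import math
from statistics import NormalDist


def local_penalizer(x, x_j, mean_j, std_j, best, lipschitz):
    """Factor in [0, 1]: close to 0 near x_j, close to 1 far away (std_j must be > 0)."""
    distance = math.dist(x, x_j)
    z = (lipschitz * distance - best + mean_j) / std_j
    return NormalDist().cdf(z)


def penalized_acquisition(x, base_value, selected, best, lipschitz):
    """Multiply a non-negative acquisition value by one penalizer per selected point."""
    value = base_value
    for x_j, mean_j, std_j in selected:
        value *= local_penalizer(x, x_j, mean_j, std_j, best, lipschitz)
    return value


# One point already in the batch: location, posterior mean, posterior std.
selected = [((1.0, 1.0), 0.5, 0.2)]
near = penalized_acquisition((1.0, 1.1), 1.0, selected, best=1.0, lipschitz=2.0)
far = penalized_acquisition((4.0, 4.0), 1.0, selected, best=1.0, lipschitz=2.0)
```

No GP refit is needed between picks, which is where the efficiency advantage over KB comes from.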
The Explore/Exploit (EE) method decomposes Expected Improvement into an exploitation term and an exploration term, then selects batch points along a linear schedule from purely exploitative to purely exploratory. The schedule shape is modulated by explore_exploit_bias.
See Sóbester, A., Leary, S. J. & Keane, A. J. On the Design of Optimization Strategies Based on Global Response Surface Approximation Models. J. Glob. Optim. 33, 31–59 (2005).
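A minimal sketch of the decomposition, assuming the weighted form w * exploitation + (1 - w) * exploration with w stepping linearly from 1 to 0 across the batch. How `explore_exploit_bias` reshapes that schedule is not specified here, so it is left out. The helper names and the posterior values are hypothetical.

```python
from statistics import NormalDist

_N = NormalDist()


def ei_terms(mean, std, best):
    """Return the (exploitation, exploration) parts of EI for maximisation."""
    if std <= 0.0:
        return max(mean - best, 0.0), 0.0
    z = (mean - best) / std
    return (mean - best) * _N.cdf(z), std * _N.pdf(z)


def explore_exploit_batch(candidates, means, stds, best, n_batch):
    """Pick one candidate per weight along a linear exploit-to-explore schedule."""
    terms = [ei_terms(m, s, best) for m, s in zip(means, stds)]
    available = list(range(len(candidates)))
    batch = []
    for k in range(n_batch):
        # w = 1 is pure exploitation, w = 0 is pure exploration
        w = 1.0 if n_batch == 1 else 1.0 - k / (n_batch - 1)
        pick = max(available, key=lambda i: w * terms[i][0] + (1.0 - w) * terms[i][1])
        batch.append(candidates[pick])
        available.remove(pick)  # each candidate is used at most once
    return batch


# Made-up posterior: "a" looks good, "c" is very uncertain.
batch = explore_exploit_batch(
    candidates=["a", "b", "c"],
    means=[1.2, 0.9, 0.0],
    stds=[0.1, 0.3, 2.0],
    best=1.0,
    n_batch=3,
)
```

The first pick goes to the candidate with the best predicted gain, and later picks shift towards candidates with high uncertainty.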
All output files are tab-separated with a header row of parameter names. The training/input file for generate_next_ratios must additionally contain two trailing columns: mean and standard deviation of the measured response.
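Adding the two measurement columns can be done with the standard `csv` module. The sketch below works on in-memory text so that it runs on its own; the column names `mean` and `std` and the helper name are placeholders, since the names the library expects are not stated here.

```python
import csv
import io


def append_measurements(tsv_text, means, stds):
    """Return new tab-separated text with two trailing measurement columns."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    if not (len(data) == len(means) == len(stds)):
        raise ValueError("need one mean and one standard deviation per row")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + ["mean", "std"])  # placeholder column names
    for row, m, s in zip(data, means, stds):
        writer.writerow(row + [m, s])
    return out.getvalue()


# A proposed design with one parameter and two rows, as tab-separated text.
proposed = "HRP_(%)\n0.5\n12.0\n"
measured = append_measurements(proposed, means=[0.31, 0.77], stds=[0.02, 0.05])
```

In practice the text would be read from the proposed-points file and the result written to the file passed as `input_file`.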
optimize_pathways_example.py demonstrates a full optimisation loop against a synthetic fitness landscape:
- Load parameter definitions from a JSON file (example-2d.json)
- Generate an initial 5-point Halton design of the parameter space
- Evaluate the synthetic fitness function (the inverted eggholder function)
- Run 5 sequential optimisation iterations (5 points per iteration, LP batch method)
- Track and print the running maximum
To run with your own data, replace the simulate_values calls with your actual measurement results.
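The shape of that loop can be sketched without the library. In the example below, `propose` is a placeholder that returns random points where the library would return GP-guided proposals, and `fitness` is the 1D synthetic function y = x*sin(x). The limits and helper names are hypothetical; only the loop structure is the point.

```python
import math
import random


def fitness(x):
    """Synthetic response standing in for a real measurement."""
    return x * math.sin(x)


def propose(history, n_out, rng, limits=(0.0, 10.0)):
    """Placeholder proposer: uniform random points (the library would use the GP)."""
    return [rng.uniform(*limits) for _ in range(n_out)]


rng = random.Random(42)
history = []        # (parameter value, measured response) pairs
running_max = []    # best response seen after each round

for iteration in range(6):  # one initial design plus five optimisation rounds
    batch = propose(history, n_out=5, rng=rng)
    # Replace fitness(x) with real measurement results when running experiments.
    history += [(x, fitness(x)) for x in batch]
    running_max.append(max(y for _, y in history))
```

The running maximum can only stay flat or increase, which makes it a convenient quantity to monitor for convergence.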