MLPermutationSolver is a GPU-native framework for solving permutation sorting problems on Cayley graphs with beam search, learned scoring models, and puzzle-specific heuristics. It currently supports:
- LRX puzzle: permutation sorting on a Cayley graph generated by
three moves —
L(left shift),R(right shift), andX(swap of the first two elements). - Pancake puzzle: prefix-reversal pancake sorting with moves
Rk, which reverse the firstkelements.
This repo is a spinoff from CayleyPy by Alexander Chervov and collaborators (applying artificial intelligence methods to the mathematical problem of Cayley graph pathfinding). It represents an independent implementation of the Beam search solvers for testing, the results of which were reported in academic publications e.g. here.
- Random walk data generation algorithms for training ML models
- Beam search solver
BeamSolverwith ML guidance and backward modes (backward_mode="off" | "bfs" | "beam") - Multiple model architectures (XGBoost, CatBoost, MLP)
- GPU-optimized implementation: the search hot path, state hashing, meet-in-the-middle lookups, and model inference are designed to stay on GPU
- Benchmarking, profiling, and evaluation tools
CUDA-capable GPU is required; CPU execution is not supported. uv is used for environment management.
git clone https://github.com/T0chka/MLPermutationSolver.git
cd MLPermutationSolver
uv venv
uv syncTo run the full pipeline end-to-end (generate random-walk training data, train an ML model,
and then solve target permutations with BeamSolver using model-guided scoring)
use solve_puzzle.py — configurable in the script header via InputConfig:
input.state = [1,0,2,...]— solve this specific permutationinput.state = None— generate a random permutation (useinput.random_state_sizeandinput.random_seed)
uv run solve_puzzle.pyMLPermutationSolver/
├── src/
│ ├── data_gen/ # Random walk data generation, BFS distances
│ ├── models/ # ML model implementations
│ └── solvers/ # Beam search solvers
├── experiments/ # Benchmarking and analysis scripts
├── pyproject.toml # Project config and dependencies (uv)
├── uv.lock # Locked dependency versions
└── README.md # This file
The experiments/ directory contains scripts for:
- Benchmarking random walk implementations
- Optimizing hyperparameters
- Profiling model and solver performance
See experiments/README_exp.md for detailed descriptions of each script and their parameters.
Numba-optimized BFS over the full permutation Cayley graph for any puzzle (generators passed in). Computes exact distances from the identity to every reachable state. Memory-bound for LRX puzzle: max n=13 on RTX 3090 with 24 GB VRAM / 124 GB RAM (~70 GiB for n=13). Typical runtimes: n=12 ~2 min, n=13 ~35 min.
All dataset generators simulate sequences of states (permutations) starting from the identity permutation. They output (X, y), where X is a batch of states and y is a step index used as a training target / proxy for "search depth".
-
first_visit_random_walks Unconstrained random walks. Repeats are allowed. After sampling, each unique state is assigned y = the earliest step at which the state appeared in the whole sample ("first-visit time"). Therefore y is a function of state only, and (state, y) pairs collapse to unique states. This generator is very fast, but can waste a large fraction of the sampling budget on duplicates for larger state_size.
-
nbt_random_walks Per-walk self-avoiding (non-backtracking) random walks. Each walk tracks the set of states it has visited and forbids transitions to already visited states within that walk. If a walk has no valid move, it terminates early. The same state may appear at different steps across different walks, so y is not a pure function of state. This generator is significantly slower due to per-walk history checks.
-
random_walks_beam_nbt Beam-style sampling with global non-backtracking in a sliding history window. At each step, expand the current beam of states by all moves, discard states present in the recent global hash history, then randomly select a new beam of size n_walks from the remaining candidates. The target uses an "effective step" (counter of successful expansions); when the state-space saturates, effective step may stop increasing even if the loop continues. This variant typically maximizes unique-state coverage per fixed sampling budget and is much faster than per-walk self-avoidance.
The solver treats a permutation state as a feature vector (the raw sequence of integers) and learns a scalar score y used as a proxy for "how deep / hard" a state is in the search graph.
All models implement BaseModel with four methods:
- train(X, y): fit on a batch of states X and scalar targets y
- predict(X): return a 1D tensor of scores for states X
- save(path), load(path): persist and restore a trained model
-
XGBoostModel (src/models/xgboost_model.py) Gradient-boosted trees trained and inferred fully on GPU. PyTorch CUDA tensors are converted to CuPy via DLPack, XGBoost uses a GPU DMatrix for training, and inference uses inplace_predict(). Predictions are returned as a CUDA torch.Tensor via DLPack zero-copy.
-
CatBoostModel (src/models/catboost_model.py) CatBoostRegressor configured for GPU training. However, this wrapper currently moves tensors to CPU (X.cpu().numpy(), y.cpu().numpy()) for both training and prediction, then wraps predictions back into a torch tensor.
-
MLPModel (src/models/mlp_model.py) A feed-forward neural network (Linear + BatchNorm + ReLU + Dropout blocks) trained with Adam and MSE loss. This is a straightforward regression baseline that runs on the same device as X. The checkpoint includes both model and optimizer state, plus the architecture/training hyperparameters needed to reconstruct the network on load.
- Unified approximate solver
BeamSolver- Forward beam search guided by ML scores
backward_mode="off"— pure forward beambackward_mode="bfs"— tensor-only backward BFS archive on GPU, then forward beam with meet lookups in the frozen archive and path improvement when a better meeting path is foundbackward_mode="beam"— full bidirectional beam with live backward frontier and meet-in-the-middle path improvement on both sides- Solution objective (
SolutionObjective) tracks the best known solution and is designed to support future objectives beyond shortest path
- Pruning and bounds
- Lower-bound heuristic from the adapter combined with a dynamic path-length
limit (
max_stepsand current best solution length) aggressively prunes branches that provably cannot beat the current best solution
- Lower-bound heuristic from the adapter combined with a dynamic path-length
limit (
- Hash-based infrastructure
HashHistory: sliding-window hash history on GPU for global non-backtrackingMeetArchive+MeetLookup: cached, sorted hash index for fast meet-in-the-middle lookups
- Path reconstruction
PathAssembler: pure-tensor path reconstruction from parent/move archives
- Search scheduling
SearchScheduler/IndependentFrontierSchedulerlayer that chooses which frontier (forward / backward) to expand, ready for future NBS / pair-based schedulers
- Exact solver (pancake only)
PancakeExactSolverwith pluggable incumbent solver selected via solver registry
All hot-path pieces (expansion kernel, hash history, meet index, path assembly, lower bounds, beam pruning) are implemented as GPU tensor operations.
- Solver runtime profiling (
experiments/profile_solver.py)- Coarse metrics: total search time, CUDA memory, states explored, solution length
- Stage-level timings for key solver stages:
expand,unique,history_filter,lower_bound,beam_prune,meet_lookup,bfs_prebuild - Per-stage counters: number of calls and total input states processed
_SolverProfilerhelper insideBeamSolver- GPU synchronization only around total
search_time(for accurate wall-clock timings), with optional stage profiling guarded by a runtime flag - Clean context-manager API (
with profiler.section("expand", ...)) so thatsolve()and BFS prebuild code do not contain manual timer bookkeeping
- GPU synchronization only around total
- Additional solution objectives: Generalize
SolutionObjectivebeyondobjective="shortest"to support alternative criteria such as preferring longer or lexicographically-structured solutions, while keeping the same API (solution_better_than,update_best). - Faster LRX lower bound computation: The current admissible lower bound for LRX uses a max of two terms: an inversion-based bound and a cyclic displacement bound. The inversion term is still computed in O(n^2) time via temporary comparison tensors. Replace it with a more efficient exact implementation (e.g. merge-sort or Fenwick-tree based) without changing the resulting lower-bound values.
- Hybrid ML + heuristic distance estimates: unify model-based scores with analytic lower bounds (gap / inversions) into a single calibrated estimate of distance-to-go, to improve ranking quality and pruning.
- Diversity buckets for tie-break: When pruning the beam, many states can have the same score (tie). Currently ties are broken by lookahead (productive chain) and/or random. An idea: use diversity buckets to spread selection among states with the same score, so the beam keeps more varied states instead of clustering.
This work builds upon and contributes to research in AI-based approaches to Cayley graph pathfinding. Results using these methods have been reported in:
- Chervov, A., et al. (2025). "CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs." arXiv:2502.18663
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this work in your research, please cite:
@software{mlpermutationsolver,
title={MLPermutationSolver: Machine Learning Approach to Permutation Sorting},
author={Dolgorukova, Antonina},
year={2024},
url={https://github.com/T0chka/MLPermutationSolver}
}
@article{chervov2025cayleypy,
title={CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs},
author={Chervov, A. and Soibelman, A. and Lytkin, S. and others},
journal={arXiv preprint arXiv:2502.18663},
year={2025}
}For questions and collaboration:
- Email: an.dolgorukova@gmail.com
- GitHub: @T0chka