MLPermutationSolver

MLPermutationSolver is a GPU-native framework for solving permutation sorting problems on Cayley graphs with beam search, learned scoring models, and puzzle-specific heuristics. It currently supports:

LRX puzzle: permutation sorting on a Cayley graph generated by three moves — L (left shift), R (right shift), and X (swap of the first two elements).
Pancake puzzle: prefix-reversal pancake sorting with moves Rk, which reverse the first k elements.

This repo is a spinoff from CayleyPy by Alexander Chervov and collaborators (applying artificial intelligence methods to the mathematical problem of Cayley graph pathfinding). It represents an independent implementation of the Beam search solvers for testing, the results of which were reported in academic publications e.g. here.

Key Features

Random walk data generation algorithms for training ML models
Beam search solver BeamSolver with ML guidance and backward modes (backward_mode="off" | "bfs" | "beam")
Multiple model architectures (XGBoost, CatBoost, MLP)
GPU-optimized implementation: the search hot path, state hashing, meet-in-the-middle lookups, and model inference are designed to stay on GPU
Benchmarking, profiling, and evaluation tools

Installation

CUDA-capable GPU is required; CPU execution is not supported. uv is used for environment management.

Setup

git clone https://github.com/T0chka/MLPermutationSolver.git
cd MLPermutationSolver
uv venv
uv sync

Quick Start

To run the full pipeline end-to-end (generate random-walk training data, train an ML model, and then solve target permutations with BeamSolver using model-guided scoring) use solve_puzzle.py — configurable in the script header via InputConfig:

input.state = [1,0,2,...] — solve this specific permutation
input.state = None — generate a random permutation (use input.random_state_size and input.random_seed)

uv run solve_puzzle.py

Project Structure

MLPermutationSolver/
├── src/
│   ├── data_gen/       # Random walk data generation, BFS distances
│   ├── models/         # ML model implementations
│   └── solvers/        # Beam search solvers
├── experiments/        # Benchmarking and analysis scripts
├── pyproject.toml      # Project config and dependencies (uv)
├── uv.lock             # Locked dependency versions
└── README.md           # This file

Experiments

The experiments/ directory contains scripts for:

Benchmarking random walk implementations
Optimizing hyperparameters
Profiling model and solver performance

See experiments/README_exp.md for detailed descriptions of each script and their parameters.

Algorithms

BFS Distances (src/data_gen/bfs_distances.py)

Numba-optimized BFS over the full permutation Cayley graph for any puzzle (generators passed in). Computes exact distances from the identity to every reachable state. Memory-bound for LRX puzzle: max n=13 on RTX 3090 with 24 GB VRAM / 124 GB RAM (~70 GiB for n=13). Typical runtimes: n=12 ~2 min, n=13 ~35 min.

Random Walk Data Generation (src/data_gen/random_walks.py)

All dataset generators simulate sequences of states (permutations) starting from the identity permutation. They output (X, y), where X is a batch of states and y is a step index used as a training target / proxy for "search depth".

first_visit_random_walks Unconstrained random walks. Repeats are allowed. After sampling, each unique state is assigned y = the earliest step at which the state appeared in the whole sample ("first-visit time"). Therefore y is a function of state only, and (state, y) pairs collapse to unique states. This generator is very fast, but can waste a large fraction of the sampling budget on duplicates for larger state_size.
nbt_random_walks Per-walk self-avoiding (non-backtracking) random walks. Each walk tracks the set of states it has visited and forbids transitions to already visited states within that walk. If a walk has no valid move, it terminates early. The same state may appear at different steps across different walks, so y is not a pure function of state. This generator is significantly slower due to per-walk history checks.
random_walks_beam_nbt Beam-style sampling with global non-backtracking in a sliding history window. At each step, expand the current beam of states by all moves, discard states present in the recent global hash history, then randomly select a new beam of size n_walks from the remaining candidates. The target uses an "effective step" (counter of successful expansions); when the state-space saturates, effective step may stop increasing even if the loop continues. This variant typically maximizes unique-state coverage per fixed sampling budget and is much faster than per-walk self-avoidance.

ML Models (src/models/)

The solver treats a permutation state as a feature vector (the raw sequence of integers) and learns a scalar score y used as a proxy for "how deep / hard" a state is in the search graph.

All models implement BaseModel with four methods:

train(X, y): fit on a batch of states X and scalar targets y
predict(X): return a 1D tensor of scores for states X
save(path), load(path): persist and restore a trained model

XGBoostModel (src/models/xgboost_model.py) Gradient-boosted trees trained and inferred fully on GPU. PyTorch CUDA tensors are converted to CuPy via DLPack, XGBoost uses a GPU DMatrix for training, and inference uses inplace_predict(). Predictions are returned as a CUDA torch.Tensor via DLPack zero-copy.
CatBoostModel (src/models/catboost_model.py) CatBoostRegressor configured for GPU training. However, this wrapper currently moves tensors to CPU (X.cpu().numpy(), y.cpu().numpy()) for both training and prediction, then wraps predictions back into a torch tensor.
MLPModel (src/models/mlp_model.py) A feed-forward neural network (Linear + BatchNorm + ReLU + Dropout blocks) trained with Adam and MSE loss. This is a straightforward regression baseline that runs on the same device as X. The checkpoint includes both model and optimizer state, plus the architecture/training hyperparameters needed to reconstruct the network on load.

Beam Search Solvers and Exact Solver (src/solvers/)

Unified approximate solver BeamSolver
- Forward beam search guided by ML scores
- backward_mode="off" — pure forward beam
- backward_mode="bfs" — tensor-only backward BFS archive on GPU, then forward beam with meet lookups in the frozen archive and path improvement when a better meeting path is found
- backward_mode="beam" — full bidirectional beam with live backward frontier and meet-in-the-middle path improvement on both sides
- Solution objective (SolutionObjective) tracks the best known solution and is designed to support future objectives beyond shortest path
Pruning and bounds
- Lower-bound heuristic from the adapter combined with a dynamic path-length limit (max_steps and current best solution length) aggressively prunes branches that provably cannot beat the current best solution
Hash-based infrastructure
- HashHistory: sliding-window hash history on GPU for global non-backtracking
- MeetArchive + MeetLookup: cached, sorted hash index for fast meet-in-the-middle lookups
Path reconstruction
- PathAssembler: pure-tensor path reconstruction from parent/move archives
Search scheduling
- SearchScheduler / IndependentFrontierScheduler layer that chooses which frontier (forward / backward) to expand, ready for future NBS / pair-based schedulers
Exact solver (pancake only)
- PancakeExactSolver with pluggable incumbent solver selected via solver registry

All hot-path pieces (expansion kernel, hash history, meet index, path assembly, lower bounds, beam pruning) are implemented as GPU tensor operations.

Profiling and Solver Instrumentation

Solver runtime profiling (experiments/profile_solver.py)
- Coarse metrics: total search time, CUDA memory, states explored, solution length
- Stage-level timings for key solver stages: expand, unique, history_filter, lower_bound, beam_prune, meet_lookup, bfs_prebuild
- Per-stage counters: number of calls and total input states processed
_SolverProfiler helper inside BeamSolver
- GPU synchronization only around total search_time (for accurate wall-clock timings), with optional stage profiling guarded by a runtime flag
- Clean context-manager API (with profiler.section("expand", ...)) so that solve() and BFS prebuild code do not contain manual timer bookkeeping

TODO / Ideas

Additional solution objectives: Generalize SolutionObjective beyond objective="shortest" to support alternative criteria such as preferring longer or lexicographically-structured solutions, while keeping the same API (solution_better_than, update_best).
Faster LRX lower bound computation: The current admissible lower bound for LRX uses a max of two terms: an inversion-based bound and a cyclic displacement bound. The inversion term is still computed in O(n^2) time via temporary comparison tensors. Replace it with a more efficient exact implementation (e.g. merge-sort or Fenwick-tree based) without changing the resulting lower-bound values.
Hybrid ML + heuristic distance estimates: unify model-based scores with analytic lower bounds (gap / inversions) into a single calibrated estimate of distance-to-go, to improve ranking quality and pruning.
Diversity buckets for tie-break: When pruning the beam, many states can have the same score (tie). Currently ties are broken by lookahead (productive chain) and/or random. An idea: use diversity buckets to spread selection among states with the same score, so the beam keeps more varied states instead of clustering.

Research

This work builds upon and contributes to research in AI-based approaches to Cayley graph pathfinding. Results using these methods have been reported in:

Chervov, A., et al. (2025). "CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs." arXiv:2502.18663

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this work in your research, please cite:

@software{mlpermutationsolver,
  title={MLPermutationSolver: Machine Learning Approach to Permutation Sorting},
  author={Dolgorukova, Antonina},
  year={2024},
  url={https://github.com/T0chka/MLPermutationSolver}
}

@article{chervov2025cayleypy,
  title={CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs},
  author={Chervov, A. and Soibelman, A. and Lytkin, S. and others},
  journal={arXiv preprint arXiv:2502.18663},
  year={2025}
}

Contact

For questions and collaboration:

Email: an.dolgorukova@gmail.com
GitHub: @T0chka

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MLPermutationSolver

Key Features

Installation

Setup

Quick Start

Project Structure

Experiments

Algorithms

BFS Distances (src/data_gen/bfs_distances.py)

Random Walk Data Generation (src/data_gen/random_walks.py)

ML Models (src/models/)

Beam Search Solvers and Exact Solver (src/solvers/)

Profiling and Solver Instrumentation

TODO / Ideas

Research

License

Citation

Contact

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
experiments		experiments
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
solve_puzzle.py		solve_puzzle.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

MLPermutationSolver

Key Features

Installation

Setup

Quick Start

Project Structure

Experiments

Algorithms

BFS Distances (src/data_gen/bfs_distances.py)

Random Walk Data Generation (src/data_gen/random_walks.py)

ML Models (src/models/)

Beam Search Solvers and Exact Solver (src/solvers/)

Profiling and Solver Instrumentation

TODO / Ideas

Research

License

Citation

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages