StoreGrid

Accompanying source code for the paper Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights.

This repository contains the following main components:

StoreGrid Environment: A reinforcement learning (RL) gridworld environment with a Gym-like interface, built to make it easy to experiment with RL customer agents in retail stores. It is written in PyTorch and is natively vectorized.
RL Training: A PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, together with training code that interfaces with the environment. For transparency and reproducibility, training metrics are logged to TensorBoard and, optionally, to Weights and Biases.
Trajectory and Heatmap Generation: Code to generate Travelling Salesman Path (TSP) and Nearest Neighbour (probabilistic or deterministic) trajectories, to sample RL trajectories from a trained policy, and to render heatmap overlays for sampled or generated trajectories.
Visualization Tool: A mini GUI that lets users control the agent manually and inspect key quantities, such as the state observation and the reward received by the agent, at every step. Useful for debugging or for visualizing the StoreGrid environment during the design phase.

Installation

This project is packaged with uv, so the instructions below assume that uv is installed.

(Recommended, but optional) Create and activate a virtualenv. The project was tested with Python 3.12; the first command also downloads Python 3.12 automatically if it is not available on the system.

uv venv --python 3.12
source .venv/bin/activate

Install the package with uv pip install -e . or, to also install the optional dependencies required for the visualization tool, with uv pip install -e ".[all]"

Running Experiments

The general workflow for running/evaluating experiments is as follows:

Configure store layout in storegrid/layouts
Add/modify training configuration in storegrid/env_cfgs
Train an RL policy
Generate trajectories
Generate heatmaps

Remember to register new layouts and environment configs in their respective __init__.py files.

Configure store layout

A store layout determines the number of rows and columns of the store, the number, types, and placement of products, and the placement of shelves and checkout points. New store layouts can be placed in the storegrid/layouts folder. For reference, storegrid/layouts/2_actual_store.py contains the layout used in the paper.

To allow a layout to be chosen through the --env_layout flag when running the scripts/train_ppo.py script, register the layout in the storegrid/layouts/__init__.py file.
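For readers designing a new layout, the sketch below shows one way to represent the information a layout carries (grid size, shelves, checkout points, product placements) and to register it under the name passed to --env_layout. It is a hypothetical illustration only: StoreLayout, LAYOUTS, register_layout, and get_layout are invented names, and the actual structure of storegrid/layouts/2_actual_store.py and storegrid/layouts/__init__.py may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (col, row), with (0, 0) at the top-left corner


@dataclass
class StoreLayout:
    """Hypothetical layout container; the real layout files may use another format."""
    num_cols: int
    num_rows: int
    shelves: List[Cell] = field(default_factory=list)
    checkouts: List[Cell] = field(default_factory=list)
    products: Dict[str, Cell] = field(default_factory=dict)  # product type -> cell

    def validate(self) -> None:
        # Every placed object must lie inside the grid.
        cells = list(self.shelves) + list(self.checkouts) + list(self.products.values())
        for col, row in cells:
            if not (0 <= col < self.num_cols and 0 <= row < self.num_rows):
                raise ValueError(f"cell {(col, row)} is outside the {self.num_cols}x{self.num_rows} grid")
        # A checkout point and a shelf should not occupy the same cell.
        if set(self.checkouts) & set(self.shelves):
            raise ValueError("a checkout cannot share a cell with a shelf")


# Hypothetical registry keyed by the name given to --env_layout (e.g. "2-actual-store").
LAYOUTS: Dict[str, StoreLayout] = {}


def register_layout(name: str, layout: StoreLayout) -> None:
    layout.validate()  # fail fast, before any training run starts
    if name in LAYOUTS:
        raise KeyError(f"layout {name!r} is already registered")
    LAYOUTS[name] = layout


def get_layout(name: str) -> StoreLayout:
    try:
        return LAYOUTS[name]
    except KeyError:
        raise KeyError(f"unknown layout {name!r}; registered: {sorted(LAYOUTS)}") from None
```

Validating at registration time means that an out-of-bounds shelf or checkout is reported when the layout is loaded, rather than surfacing later as odd agent behaviour.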

Adding a new environment configuration

An environment config specifies the interface between the StoreGrid environment and the RL agent, including the observations and rewards the agent receives. It is deliberately kept separate from the main environment file (which contains the core gridworld logic) so that researchers can iterate on it more conveniently. For example, one can change the conditioning inputs to study the effects of time conditioning.

To allow a config to be chosen through the --env_cfg_id flag when running the scripts/train_ppo.py script, register it in the storegrid/env_cfgs/__init__.py file. For reference, storegrid/env_cfgs/1_with_time_conditioning.py contains the config for a time-conditioned environment, whereas storegrid/env_cfgs/2_wo_time_conditioning.py contains the config for an environment without time conditioning.
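To make the idea of swapping conditioning inputs concrete, here is a hypothetical sketch of how a config might toggle time conditioning in the observation. EnvCfg, build_observation, the multi-hot basket encoding, and the remaining-time feature are all assumptions for illustration; the real configs in storegrid/env_cfgs define their own observations and rewards, which likely contain more (for example, the agent's own position).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class EnvCfg:
    """Hypothetical config: which conditioning signals enter the observation."""
    time_conditioning: bool
    max_steps: int = 500  # assumed episode horizon, used only to normalize time


def build_observation(
    cfg: EnvCfg,
    basket: Sequence[int],          # assumed multi-hot: 1 if the product is still to be picked
    checkout_pos: Tuple[int, int],  # (col, row) of the checkout the agent must reach
    step: int,                      # current step within the episode
) -> List[float]:
    obs = [float(x) for x in basket]
    obs.extend(float(c) for c in checkout_pos)
    if cfg.time_conditioning:
        # Fraction of the episode remaining, clipped to [0, 1].
        obs.append(max(0.0, 1.0 - step / cfg.max_steps))
    return obs


if __name__ == "__main__":
    print(build_observation(EnvCfg(True, max_steps=100), [1, 0, 1], (4, 3), step=25))
    print(build_observation(EnvCfg(False), [1, 0, 1], (4, 3), step=25))
```

Keeping the conditioning switch in a small config object, rather than in the gridworld logic, mirrors the separation described above: the two variants differ only in what the agent observes.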

Training an RL policy

Activate the virtualenv: source .venv/bin/activate
Run scripts/train_ppo.py with the desired arguments. To list all argument options and their default values, pass the --help flag: python3 scripts/train_ppo.py --help

The following commands worked well for the store layout used in the paper (i.e., 2-actual-store for the --env_layout flag):

If conditioning on time:

python3 scripts/train_ppo.py --seed 1 --env-id "StoreGrid-v0" --env_cfg_id "1-with-time-conditioning" --env_layout "2-actual-store" --ppo.total-timesteps 7864320000 --ppo.num-envs 512 --ppo.num-steps 128 --ppo.num-minibatches 4 --ppo.gae-lambda 0.99 --ppo.gamma 1 --ppo.update-epochs 5 --ppo.ent-coef 0.12 --ppo.learning-rate 1e-3 --ppo.no-anneal-lr --ppo.clip-coef 0.2 --cuda --track

The training logs for the above command can be viewed on WandB, available here.

If not conditioning on time (i.e., only conditioned on basket and checkout position):

python3 scripts/train_ppo.py --seed 1 --env-id "StoreGrid-v0" --env_cfg_id  "2-wo-time-conditioning" --ppo.total-timesteps 5898240000 --ppo.num-envs 512 --ppo.num-steps 128 --ppo.num-minibatches 4 --ppo.gae-lambda 0.99 --ppo.gamma 1 --ppo.update-epochs 5 --ppo.ent-coef 0.12 --ppo.learning-rate 1e-3 --ppo.no-anneal-lr --ppo.clip-coef 0.2 --cuda --track

The training logs for the above command can be viewed on WandB, available here.
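Because ppo.py is adapted from CleanRL, the numeric flags in the commands above likely combine as they do in CleanRL's PPO script: the batch size is num_envs * num_steps, the minibatch size is the batch size divided by num_minibatches, and the number of PPO iterations is total_timesteps divided by the batch size. The sketch below reproduces that arithmetic; it assumes CleanRL's conventions carry over unchanged and that no KL-based early stopping cuts update epochs short. The function name ppo_schedule is hypothetical.

```python
def ppo_schedule(total_timesteps: int, num_envs: int, num_steps: int,
                 num_minibatches: int, update_epochs: int) -> dict:
    """Derive PPO batch quantities following CleanRL's conventions (assumed here)."""
    batch_size = num_envs * num_steps               # transitions collected per rollout
    minibatch_size = batch_size // num_minibatches  # transitions per gradient step
    num_iterations = total_timesteps // batch_size  # rollout + update cycles
    return {
        "batch_size": batch_size,
        "minibatch_size": minibatch_size,
        "num_iterations": num_iterations,
        # each iteration makes update_epochs passes over num_minibatches minibatches
        "gradient_steps": num_iterations * update_epochs * num_minibatches,
        # timesteps that would be dropped if total_timesteps is not a multiple of batch_size
        "leftover_timesteps": total_timesteps % batch_size,
    }


if __name__ == "__main__":
    runs = [("with time conditioning", 7_864_320_000),
            ("without time conditioning", 5_898_240_000)]
    for name, total in runs:
        print(name, ppo_schedule(total, num_envs=512, num_steps=128,
                                 num_minibatches=4, update_epochs=5))
```

With these settings, each rollout collects 512 * 128 = 65,536 transitions split into minibatches of 16,384, and the two --ppo.total-timesteps values are exact multiples of the batch size, corresponding to 120,000 and 90,000 PPO iterations respectively.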

Note that the RL policy is evaluated at several checkpoints during training (five by default); at each checkpoint, trajectories are generated for different conditioning inputs.

The evaluation metrics, training logs, and saved weights of the RL agent (in .pt format) are written to the runs/ folder.

Generating Trajectories

To generate TSP and Nearest Neighbour (NN) trajectories, or to sample RL trajectories, call the appropriately named functions in storegrid/trajectory_generation.py.

For a concrete example, refer to the following notebook: scripts/generate_trajectories.ipynb
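The idea behind the TSP and nearest-neighbour baselines can be sketched as choosing the order in which a basket's products are visited. The code below is not the implementation in storegrid/trajectory_generation.py: the function names are hypothetical, distances are Manhattan distances that ignore shelves (a faithful version would use shortest-path lengths on the store grid), and the probabilistic variant assumes a softmax over negative distances, which may not match how the repository samples.

```python
import itertools
import math
import random
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (col, row)


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbour_order(start: Cell, targets: Sequence[Cell], temperature: float = 0.0,
                            rng: Optional[random.Random] = None) -> List[Cell]:
    """Greedy visiting order.

    temperature == 0: deterministic, always go to the closest remaining target.
    temperature > 0:  probabilistic, sample the next target with probability
                      proportional to exp(-distance / temperature).
    """
    rng = rng if rng is not None else random.Random()
    remaining, order, current = list(targets), [], start
    while remaining:
        dists = [manhattan(current, t) for t in remaining]
        if temperature == 0.0:
            idx = min(range(len(remaining)), key=dists.__getitem__)
        else:
            d_min = min(dists)  # subtract the minimum for numerical stability
            weights = [math.exp(-(d - d_min) / temperature) for d in dists]
            idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        current = remaining.pop(idx)
        order.append(current)
    return order


def tsp_order(start: Cell, targets: Sequence[Cell], end: Cell) -> List[Cell]:
    """Exact shortest open path start -> all targets -> end, by brute force."""
    def path_length(perm: Sequence[Cell]) -> int:
        stops = [start, *perm, end]
        return sum(manhattan(a, b) for a, b in zip(stops, stops[1:]))
    return list(min(itertools.permutations(targets), key=path_length))


if __name__ == "__main__":
    basket = [(5, 0), (1, 0), (3, 0)]
    print(nearest_neighbour_order((0, 0), basket))
    print(nearest_neighbour_order((0, 0), basket, temperature=2.0, rng=random.Random(1)))
    print(tsp_order((0, 0), basket, end=(6, 0)))
```

The brute-force TSP search grows factorially with basket size, so it is only practical for small baskets; larger baskets would need a dynamic-programming (Held-Karp) or heuristic solver.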

Adapting Custom Trajectories

This codebase uses a unified data format for the trajectories which can be found in storegrid/env.py:

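from dataclasses import dataclass
from typing import List, Tuple
TrajectoryStatesType = List[Tuple[Tuple[int, int], int]]  # [((col_num, row_num), direction), ...]

@dataclass
class Trajectory:
    states: TrajectoryStatesType
    actions: List[int]  # values of the Actions IntEnum in storegrid/env.py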

In simple terms, a Trajectory consists of a list of states (of type TrajectoryStatesType) and a list of actions (integers corresponding to members of the IntEnum class Actions in storegrid/env.py).

TrajectoryStatesType is defined as List[Tuple[Tuple[int, int], int]], i.e., a list of tuples, each with two elements: a grid coordinate, which is itself a tuple of two integers (col_num, row_num), and an integer direction whose meaning is defined by DIR_TO_VEC in storegrid/env.py. Coordinates assume that (0, 0) is the top-left corner of the grid and (num_cols - 1, num_rows - 1) is the bottom-right corner.
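When adapting trajectories from another source (for example, customer paths recorded as sequences of grid cells), the states list could be built roughly as sketched below. This assumes that DIR_TO_VEC follows MiniGrid's ordering (right, down, left, up), that the recorded direction is the direction of the agent's most recent move, and that consecutive cells differ by at most one step; check storegrid/env.py before relying on any of this. The function name cells_to_states is hypothetical, and the matching actions list still has to be built from the Actions enum, which this sketch does not model.

```python
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: MiniGrid-style ordering; verify against DIR_TO_VEC in storegrid/env.py.
DIR_TO_VEC = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up
TrajectoryStatesType = List[Tuple[Tuple[int, int], int]]


def cells_to_states(cells: Sequence[Tuple[int, int]], num_cols: int, num_rows: int,
                    initial_dir: int = 0) -> TrajectoryStatesType:
    """Convert a path of (col, row) cells into ((col, row), direction) states.

    The direction at each state is that of the most recent move, or initial_dir
    before the first move; standing still keeps the previous direction.
    """
    states: TrajectoryStatesType = []
    direction = initial_dir
    prev: Optional[Tuple[int, int]] = None
    for col, row in cells:
        if not (0 <= col < num_cols and 0 <= row < num_rows):
            raise ValueError(f"cell {(col, row)} lies outside the {num_cols}x{num_rows} grid")
        if prev is not None and (col, row) != prev:
            delta = (col - prev[0], row - prev[1])
            if delta not in DIR_TO_VEC:
                raise ValueError(f"{prev} -> {(col, row)} is not a single-cell move")
            direction = DIR_TO_VEC.index(delta)
        states.append(((col, row), direction))
        prev = (col, row)
    return states


if __name__ == "__main__":
    print(cells_to_states([(0, 0), (1, 0), (1, 1), (1, 1)], num_cols=3, num_rows=3))
```

Rejecting diagonal or multi-cell jumps up front catches tracking gaps in external data before they produce trajectories the environment could never generate.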

Generating Heatmaps

Once the desired trajectories are acquired (which would have the datatype of List[Trajectory]), the list of trajectories can be passed into the render_heatmap_from_trajs function located in storegrid/env.py to render a heatmap.

Similar to the above, for a concrete example of generating heatmaps, users are encouraged to refer to the following notebook: scripts/generate_trajectories.ipynb
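Conceptually, a trajectory heatmap aggregates how often each grid cell is visited. The sketch below computes such a visit-frequency grid from a List[Trajectory] in pure Python; it illustrates the idea and is not the logic of render_heatmap_from_trajs, whose normalization and rendering may differ. visit_counts is a hypothetical name, and the Trajectory dataclass is redefined locally, mirroring the one in storegrid/env.py, to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Tuple

TrajectoryStatesType = List[Tuple[Tuple[int, int], int]]


@dataclass
class Trajectory:  # local mirror of the dataclass in storegrid/env.py
    states: TrajectoryStatesType
    actions: List[int]


def visit_counts(trajs: List[Trajectory], num_cols: int, num_rows: int,
                 normalize: bool = True) -> List[List[float]]:
    """Return a grid indexed [row][col] with how often each cell appears in any state."""
    grid = [[0.0] * num_cols for _ in range(num_rows)]
    for traj in trajs:
        for (col, row), _direction in traj.states:
            grid[row][col] += 1.0
    if normalize:
        # Scale by the busiest cell so values fall in [0, 1] for colouring.
        peak = max(max(r) for r in grid)
        if peak > 0:
            grid = [[v / peak for v in r] for r in grid]
    return grid


if __name__ == "__main__":
    trajs = [Trajectory([((0, 0), 0), ((1, 0), 0)], [0]),
             Trajectory([((1, 0), 0), ((1, 1), 1)], [0])]
    for row in visit_counts(trajs, num_cols=2, num_rows=2):
        print(row)
```

Every state is counted, so cells where an agent turns or waits accumulate extra weight; whether that is desirable depends on whether the heatmap is meant to show dwell time or path coverage.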

Debugging/Visualization

Note: this step requires the optional dependencies, which can be installed with: uv pip install -e ".[all]"

To inspect the environment, run scripts/keyboard_control.py. It renders the environment and displays the exact observations and rewards received by the agent at every step. By default, the agent is controlled manually via the keyboard; the controls are shown in the GUI.

The script is written to be highly hackable, so it can be tailored according to what is desired.

Other files that could be of interest

storegrid/env.py: Contains the underlying logic of StoreGrid's primitive operations (such as moving and picking up objects), as well as how the environment is rendered and how heatmaps are generated. To modify it, duplicate the file and make the desired changes, then register the modified environment as a new Gym environment in storegrid/__init__.py. The new environment can then be used by passing its id to the --env-id flag when running scripts/train_ppo.py.
storegrid/nn.py: Contains neural-network architectures for RL training. To add a new architecture, inherit from PPOBaseNetwork (located in the same file) and override its unimplemented methods.

Credits

Parts of this codebase (i.e., rendering.py and window.py) are adapted from MiniGrid, licensed under the Apache License 2.0.
The PPO implementation (located at ppo.py) is adapted from CleanRL, which is licensed under the MIT License.
