kenminglee/StoreGrid

StoreGrid

Accompanying source code for the paper Modelling Customer Trajectories with Reinforcement Learning for Practical Retail Insights.

This repository contains the following main components:

  1. StoreGrid Environment: An RL gridworld environment with a Gym-like interface, built to make it easy to experiment with RL customers in retail stores. It is written in PyTorch and is natively vectorized.

  2. RL Training: Includes a PyTorch implementation of the Proximal Policy Optimization (PPO) algorithm, alongside training code that interfaces with the environment. To aid transparency during training and reproducibility, training metrics are logged to TensorBoard and (optionally) to Weights and Biases.

  3. Trajectory and Heatmap Generation: Includes code to generate Travelling Salesman Path (TSP) and Nearest Neighbour (Probabilistic or Deterministic) trajectories. Also includes code to sample RL trajectories from a trained policy, and code to generate overlay heatmaps for sampled/generated trajectories.

  4. Visualization Tool: A mini GUI that lets users manually control the agent and visualize key metrics, such as the state observation and the reward received by the agent, at every step. Helpful for debugging or visualizing the StoreGrid environment during the design phase.

Installation

This project is packaged using uv, so the instructions below assume that uv is installed.

  1. (Recommended, but optional) Create a virtualenv: uv venv --python 3.12

     The project was tested with Python 3.12. This command also automatically downloads Python 3.12 if it is unavailable on the system.

  2. Activate the virtualenv: source .venv/bin/activate

  3. Install the project: uv pip install -e . (or uv pip install -e .[all] to also install the dependencies required for the visualization tool).

Running Experiments

The general workflow for running/evaluating experiments is as follows:

  1. Configure store layout in storegrid/layouts
  2. Add/modify training configuration in storegrid/env_cfgs
  3. Train an RL policy
  4. Generate trajectories
  5. Generate heatmaps

Remember to register new layouts and environment configs in their respective __init__.py files.

Configure store layout

A store layout determines the number of rows and columns of the store, the number/type/placements of products, and placements of shelves and checkout points. New store layouts can be placed in the storegrid/layouts folder. For reference, storegrid/layouts/2_actual_store.py contains the layout that was used in the paper.

To allow a layout to be chosen through the --env_layout flag when running the scripts/train_ppo.py script, register the layout in the storegrid/layouts/__init__.py file.

Adding a new environment configuration

An environment config determines the specifications of the interface between the StoreGrid environment and the RL agent. This includes the observations and rewards received by the RL agent. This has been deliberately isolated from the main environment file (which contains the fundamental logic of the gridworld), so that it is more convenient for researchers to iterate on. An example is changing the conditioners to study the effects of time conditioning.

To allow an environment config to be chosen through the --env_cfg_id flag when running the scripts/train_ppo.py script, register the config in the storegrid/env_cfgs/__init__.py file. For reference, storegrid/env_cfgs/1_with_time_conditioning.py contains the config for a time-conditioning environment, whereas storegrid/env_cfgs/2_wo_time_conditioning.py contains the config for an environment without time conditioning.

Training an RL policy

  1. Activate the virtualenv: source .venv/bin/activate

  2. Run train_ppo.py, passing in the desired arguments. To view all argument options and their default settings, pass the --help flag: python3 scripts/train_ppo.py --help

The following commands worked well for the store layout used in the paper (i.e., 2-actual-store for the --env_layout flag):

If conditioning on time:

python3 scripts/train_ppo.py --seed 1 --env-id "StoreGrid-v0" --env_cfg_id "1-with-time-conditioning" --env_layout "2-actual-store" --ppo.total-timesteps 7864320000 --ppo.num-envs 512 --ppo.num-steps 128 --ppo.num-minibatches 4 --ppo.gae-lambda 0.99 --ppo.gamma 1 --ppo.update-epochs 5 --ppo.ent-coef 0.12 --ppo.learning-rate 1e-3 --ppo.no-anneal-lr --ppo.clip-coef 0.2 --cuda --track

The training logs for the above command can be viewed on WandB, available here.

If not conditioning on time (i.e., only conditioned on basket and checkout position):

python3 scripts/train_ppo.py --seed 1 --env-id "StoreGrid-v0" --env_cfg_id  "2-wo-time-conditioning" --ppo.total-timesteps 5898240000 --ppo.num-envs 512 --ppo.num-steps 128 --ppo.num-minibatches 4 --ppo.gae-lambda 0.99 --ppo.gamma 1 --ppo.update-epochs 5 --ppo.ent-coef 0.12 --ppo.learning-rate 1e-3 --ppo.no-anneal-lr --ppo.clip-coef 0.2 --cuda --track

The training logs for the above command can be viewed on WandB, available here.
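
As a sanity check on the step budgets in the two commands above, the following sketch works out how many policy updates each one implies. It assumes the CleanRL convention (the PPO implementation is adapted from CleanRL) that one rollout batch holds num_envs × num_steps transitions; whether train_ppo.py derives these quantities in exactly this way is an assumption, and the function name ppo_schedule is hypothetical.

```python
def ppo_schedule(total_timesteps: int, num_envs: int, num_steps: int,
                 num_minibatches: int) -> dict:
    """Derive batch sizes and update count, assuming CleanRL-style bookkeeping."""
    batch_size = num_envs * num_steps  # transitions collected per rollout
    return {
        "batch_size": batch_size,
        "minibatch_size": batch_size // num_minibatches,
        "num_updates": total_timesteps // batch_size,
    }


# Values taken from the two commands above.
with_time = ppo_schedule(7_864_320_000, 512, 128, 4)
without_time = ppo_schedule(5_898_240_000, 512, 128, 4)

print(with_time)     # batch_size 65536, minibatch_size 16384, num_updates 120000
print(without_time)  # batch_size 65536, minibatch_size 16384, num_updates 90000
```

Under this assumption, both budgets are exact multiples of the rollout batch size, so no timesteps are left over at the end of training.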

Note that the RL policy is evaluated at multiple checkpoints (by default, 5) during training, during which trajectories are generated for different conditionals.

Evaluation metrics, training logs, and the RL agent's saved weights (in the .pt format) are written to the runs/ folder.
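
To locate the saved weights after a run, a small helper along the following lines may be useful. It is a minimal sketch that only assumes the .pt files live somewhere under runs/; the exact sub-folder layout and file names are not described here, so the search is recursive, and the helper name find_checkpoints is hypothetical.

```python
from pathlib import Path
from typing import List


def find_checkpoints(runs_dir: str = "runs") -> List[Path]:
    """Return every .pt file under runs_dir, oldest first by modification time."""
    root = Path(runs_dir)
    if not root.is_dir():
        return []  # nothing has been trained yet
    return sorted(root.rglob("*.pt"), key=lambda p: p.stat().st_mtime)


# List whatever checkpoints exist in the default runs/ folder.
for path in find_checkpoints():
    print(path)
```

Each returned path can then be opened with torch.load; what each file contains (for example, a state dict or a whole module) should be checked against train_ppo.py.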

Generating Trajectories

To generate TSP and NN trajectories, or to sample RL trajectories, call the appropriately named functions in storegrid/trajectory_generation.py.

For a concrete example, refer to the following notebook: scripts/generate_trajectories.ipynb
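
For intuition about the deterministic Nearest Neighbour baseline, the sketch below orders a set of target cells greedily by distance. It is not the code in storegrid/trajectory_generation.py: it uses Manhattan distance and ignores shelves and other obstacles, and the names manhattan and nearest_neighbour_order are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (col, row)


def manhattan(a: Cell, b: Cell) -> int:
    """Grid distance that ignores obstacles (a simplifying assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbour_order(start: Cell, targets: List[Cell]) -> List[Cell]:
    """Repeatedly visit the closest unvisited target, starting from start."""
    remaining = list(targets)
    order: List[Cell] = []
    current = start
    while remaining:
        nxt = min(remaining, key=lambda t: manhattan(current, t))
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order


order = nearest_neighbour_order((0, 0), [(5, 5), (1, 0), (1, 3)])
print(order)  # [(1, 0), (1, 3), (5, 5)]
```

The probabilistic variant mentioned above is not sketched here.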

Adapting Custom Trajectories

This codebase uses a unified data format for trajectories, defined in storegrid/env.py:

@dataclass
class Trajectory:
    states: TrajectoryStatesType
    actions: List[int]

In simple terms, a Trajectory consists of a list of states (of type TrajectoryStatesType) and a list of actions (of type int, defined in the IntEnum class Actions from storegrid/env.py).

TrajectoryStatesType is defined as List[Tuple[Tuple[int, int], int]], that is, a list of tuples where each tuple contains two elements: a grid coordinate (a tuple of two integers, (col_num, row_num)) and a direction (defined in DIR_TO_VEC at storegrid/env.py). Coordinates assume that (0,0) is at the top left corner, while (col_num-1, row_num-1) is at the bottom right corner.
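
To adapt trajectories from another source, they need to be packed into this structure. The sketch below redefines Trajectory and TrajectoryStatesType locally so that it runs on its own; in practice they would be imported from storegrid/env.py. The helper to_trajectory is hypothetical, and the direction and action integers in the example are placeholders rather than real DIR_TO_VEC or Actions values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Local stand-ins mirroring the definitions quoted above.
TrajectoryStatesType = List[Tuple[Tuple[int, int], int]]


@dataclass
class Trajectory:
    states: TrajectoryStatesType
    actions: List[int]


def to_trajectory(positions: Sequence[Tuple[int, int]],
                  directions: Sequence[int],
                  actions: Sequence[int]) -> Trajectory:
    """Pack parallel sequences from a custom data source into a Trajectory."""
    if len(positions) != len(directions):
        raise ValueError("positions and directions must have equal length")
    states = [((int(c), int(r)), int(d))
              for (c, r), d in zip(positions, directions)]
    return Trajectory(states=states, actions=[int(a) for a in actions])


# Placeholder integers: look up real values in DIR_TO_VEC and Actions.
traj = to_trajectory([(0, 0), (1, 0), (1, 1)], [0, 0, 1], [2, 1])
print(traj.states[1])  # ((1, 0), 0)
```

How the number of actions should relate to the number of states is not stated here, so the helper does not enforce it.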

Generating Heatmaps

Once the desired trajectories are acquired (which would have the datatype of List[Trajectory]), the list of trajectories can be passed into the render_heatmap_from_trajs function located in storegrid/env.py to render a heatmap.

Similar to the above, for a concrete example of generating heatmaps, users are encouraged to refer to the following notebook: scripts/generate_trajectories.ipynb
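
Conceptually, a trajectory heatmap is built on per-cell visit counts. The sketch below computes those counts directly from lists of states in the format described earlier. It illustrates the idea only and is not the implementation of render_heatmap_from_trajs, whose signature and rendering details are not shown here; the function name visit_counts is hypothetical.

```python
from typing import List, Tuple

StatesType = List[Tuple[Tuple[int, int], int]]  # ((col, row), direction)


def visit_counts(all_states: List[StatesType],
                 num_cols: int, num_rows: int) -> List[List[int]]:
    """Count how often each cell is occupied; result is indexed [row][col]."""
    grid = [[0] * num_cols for _ in range(num_rows)]
    for states in all_states:
        for (col, row), _direction in states:
            grid[row][col] += 1
    return grid


# Two toy trajectories on a 2x2 grid (direction values are placeholders).
states_a = [((0, 0), 0), ((1, 0), 0), ((1, 1), 1)]
states_b = [((0, 0), 1), ((0, 1), 1), ((1, 1), 0)]
counts = visit_counts([states_a, states_b], num_cols=2, num_rows=2)
print(counts)  # [[2, 1], [1, 2]]
```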

Debugging/Visualization

Note: this step requires the optional dependencies, which can be installed with: uv pip install -e .[all]

To inspect the environment, run scripts/keyboard_control.py to render the environment and display the exact observations and rewards received by the agent at every step. By default, the agent is controlled manually via the keyboard. The controls for the agent are shown in the GUI.

The script is written to be highly hackable, so it can be tailored according to what is desired.

Other files that could be of interest

  • storegrid/env.py: Contains the underlying logic of StoreGrid: primitive operations such as moving and picking up objects, how the environment is rendered, and how heatmaps are generated. To modify it, duplicate the file, make the desired changes, and register the modified environment as a new gym environment in storegrid/__init__.py. The new environment can then be used by passing its id with the --env-id flag when executing train_ppo.py.
  • storegrid/nn.py: Contains neural-network architectures for RL training. To add a new architecture, inherit from PPOBaseNetwork (located in the same file) and override the unimplemented functions.

Credits

  • Parts of this codebase (i.e., rendering.py and window.py) are adapted from MiniGrid, licensed under the Apache License 2.0.
  • The PPO implementation (located at ppo.py) is adapted from CleanRL, which is licensed under the MIT License.
