⚠️ This is an alpha release. The code is under active development and may contain bugs.
VOPP is a massively parallel online POMDP solver. It represents planning as a sequence of tensor operations and computes policies from tens of thousands of parallel simulations on modern GPUs, without requiring synchronization between concurrent simulations.
This repository contains the official implementation of the paper:
Vectorized Online POMDP Planning
Authors: Marcus Hoerger, Muhammad Sudrajat, and Hanna Kurniawati
If you use this repository in your research, please cite the paper as follows:
@inproceedings{hoerger2026VOPP,
title={Vectorized Online POMDP Planning},
author={Marcus Hoerger and Muhammad Sudrajat and Hanna Kurniawati},
booktitle={2026 IEEE International Conference on Robotics and Automation (ICRA)},
year={2026},
url={https://arxiv.org/abs/2510.27191}
}
This project requires:
- A CUDA-capable NVIDIA GPU with NVIDIA drivers and the CUDA Toolkit installed (see the official CUDA Installation Guide for Linux)
- Python >= 3.11
- PyTorch >= 2.7.1
- Matplotlib >= 3.10
- NumPy >= 2.3
- PyYAML >= 6.0
- SciPy >= 1.16
First, prepare a Conda environment and activate it via
conda create --name <MY_ENV_NAME> python=3.11
conda activate <MY_ENV_NAME>Next, clone this repository and install its requirements via
git clone git@github.com/RDLLab/VOPP.git <VOPP_DIR>
pip install -r <VOPP_DIR>/requirements.txtwhere <VOPP_DIR> is a directory of your choice.
To run VOPP on the provided benchmark problems, activate your Conda environment and run
python <VOPP_DIR>/run_vopp.py --config <VOPP_DIR>/configs/<PROBLEM_CONFIGURATION>.yamlFor instance, to solve the Multi-Agent RockSample problem with VOPP, use
python <VOPP_DIR>/run_vopp.py --config <VOPP_DIR>/configs/ma_rocksample.yamlNew POMDP problems can be added by implementing a generative model class that inherits from pomdp_problems/base/generative_model.py and provides the following methods detailed in Generative Model Implementation Details:
import torch
from typing import Optional
from pomdp_problems.base.generative_model import GenerativeModel
class MyPOMDPProblem(GenerativeModel):
def __init__(self, **kwargs):
super().__init__(**kwargs)
...
@property
def num_actions(self) -> int:
...
def sample_initial_belief(self, num_samples: int = 1) -> torch.Tensor:
...
def sample(
self,
state: torch.Tensor,
action: torch.Tensor,
**kwargs,
) -> dict:
...
def likelihood(
self,
observation: torch.Tensor,
prev_state: torch.Tensor,
next_state: torch.Tensor,
action: torch.Tensor,
log_likelihood: bool = False,
) -> torch.Tensor:
...
def heuristics(
self,
state: torch.Tensor,
action: torch.Tensor,
current_nodes: Optional[torch.Tensor] = None,
) -> torch.Tensor:
...After implementing the model, place the Python file in a new directory under pomdp_problems/, for example:
pomdp_problems/
└── my_pomdp_problem/
├── my_pomdp_problem.py
└── __init__.py
The __init__.py file should register the new model:
from .my_pomdp_problem import MyPOMDPProblem
REGISTERED_MODELS = {
"my_pomdp_problem": MyPOMDPProblem,
}The problem can then be referenced from a configuration file using the registered model name, here "my_pomdp_problem".
A problem configuration is specified using a YAML file, where various planner and problem-specific parameters can be configured. An example can be found in ./configs/unc_navigation.yaml.
The field pomdp_model must be set to the generative model name registered in the problem's REGISTERED_MODELS dictionary. For example, if the model was registered as:
REGISTERED_MODELS = {
"my_pomdp_problem": MyPOMDPProblem,
}then the config file should contain:
pomdp_model: my_pomdp_problemAdditional problem-specific parameters can be specified directly in the YAML file:
my_parameter_a: 0.5
my_parameter_b: TrueThese parameters can then be accessed inside the generative model, e.g.,
def __init__(self, **kwargs):
super().__init__(**kwargs)
args_cli = kwargs.get('args_cli')
self.my_parameter_a = args_cli.my_parameter_a
self.my_parameter_b = args_cli.my_parameter_bOnce the generative model and configuration file have been implemented, the problem can be solved using:
python <VOPP_DIR>/run_vopp.py --config <PATH_TO_CONFIG>.yamlOur implementation of VOPP assumes discrete actions and discrete observations. Actions and observations are represented by integer IDs.
Actions passed to the generative model always have shape:
action.shape == (B,)where each entry is an integer action ID in:
{0, ..., num_actions - 1}The semantics of each action ID are defined by the generative model.
For example, in the unc_navigation problem (pomdp_problems/unc_navigation/unc_navigation.py), action IDs correspond to motion commands:
0 -> NORTH
1 -> NORTH-EAST
2 -> EAST
...
Another example is the MultiAgent-RockSample problem (pomdp_problems/ma_rocksample/ma_rocksample.py), where each action ID represents a joint action for both rovers.
Similarly, observations returned by sample() must have shape:
observation.shape == (B,)where each entry is an integer observation ID. The semantics of each observation ID are defined by the generative model.
For example, in the MultiAgent-RockSample problem (pomdp_problems/ma_rocksample/ma_rocksample.py), observations consist of one observation per rover. These joint observations are encoded into a single observation ID before being returned by sample().
VOPP only operates on batched action IDs and observation IDs. Any problem-specific action or observation encoding and decoding must be handled inside the generative model.
The num_actions property must return the number of discrete actions in the problem.
The sample_initial_belief method should return num_samples particles sampled from the initial belief distribution:
belief_particles.shape == (num_samples, state_dim)The sample method defines the generative model. It receives a batch of states and actions:
state.shape == (B, state_dim)
action.shape == (B,)and should return a dictionary containing the sampled next states, observations, rewards, and terminal flags:
{
"next_state": next_state, # (B, state_dim)
"observation": observation, # (B,)
"reward": reward, # (B,)
"terminal": terminal, # (B,)
}The likelihood method is used during belief updates and computes the observation likelihood for a batch of candidate transitions. It receives:
observation.shape == (B,)
prev_state.shape == (B, state_dim)
next_state.shape == (B, state_dim)
action.shape == (B,)and should return the likelihood of the observation for each candidate transition:
likelihood.shape == (B,)If log_likelihood=False, the method should return likelihood values. If log_likelihood=True, it should return log-likelihood values instead.
The heuristics method provides a heuristic value estimate used by the planner. It receives:
state.shape == (B, state_dim)
action.shape == (B,)and should return:
heuristic_values.shape == (B,)For questions, issues, or suggestions, feel free to get in touch:
Marcus Hoerger
marcus.hoerger@anu.edu.au