Active Semantic Perception (Paper)
Authors: Huayi Tang, Pratik Chaudhari
We develop an approach for active semantic perception, which refers to using the semantics of the scene for tasks such as exploration. We build a compact, multi-layer scene graph that can represent large, complex indoor environments at various levels of abstraction, e.g., nodes corresponding to rooms, objects, walls, windows, etc., as well as fine-grained details of their geometry. We develop a procedure based on large language models (LLMs) to sample new plausible scene graphs of unobserved regions that are consistent with partial observations of the scene. We develop a procedure to compute the information gain of a potential waypoint upon this scene graph to enable sophisticated spatial reasoning: for example, of the two doors that lead out of the living room, one probably leads to the kitchen and the other to the bedroom. We evaluate our approach in realistic 3D indoor apartments in simulation and also on a Unitree Go 2 robot in the real world. Qualitative and quantitative analyses show that our approach can pin down high-level and low-level semantic information in the environment more quickly and more accurately than existing approaches.
git clone --branch v0.8 --depth 1 https://github.com/stevenlovegrove/Pangolin.git
# Install using CMake
cd Pangolin
mkdir build && cd build
cmake ..
make -j
sudo make install
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws
# Upgrade CMake version if it is too old (required by nvblox)
pip install cmake==3.27.9
catkin init
catkin config -DCMAKE_BUILD_TYPE=Release -DSEMANTIC_INFERENCE_USE_TRT=OFF
catkin config --skiplist khronos_eval
cd src
git clone --recursive git@github.com:grasp-lyrl/active_semantic_perception.git
vcs import ./active_semantic_perception/mapping < active_semantic_perception/install/active_semantic_perception.rosinstall
rosdep install --from-paths . --ignore-src -r -y
cd ..
catkin build
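If `catkin build` fails with a CMake version error, it can help to confirm which CMake is actually on the PATH. The sketch below is not part of the repository: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and it compares against 3.27.9 only because that is the version pinned above, not because it is known to be the minimum that nvblox accepts.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if $2 sorts first (or ties), $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
```

For example, `version_ge "$(cmake --version | head -n1 | awk '{print $3}')" 3.27.9 || echo "cmake is older than 3.27.9"` reports when the active CMake is older than the pinned release.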
We recommend using Python 3.9 and a virtual environment for isolation.
# Setup VirtualEnv
python3 -m virtualenv --system-site-packages -p /usr/bin/python3 ~/environments/semantic_perception
source ~/environments/semantic_perception/bin/activate
pip install ~/catkin_ws/src/active_semantic_perception/mapping/semantic_inference/semantic_inference[openset]
pip install --no-build-isolation -r ~/catkin_ws/src/active_semantic_perception/mapping/scene_segment_ros/src/requirements.txt
# Setup Habitat-Sim
pip install -e ~/catkin_ws/src/active_semantic_perception/mapping/spark_dsg
WITH_BULLET=1 WITH_CUDA=1 HEADLESS=0 CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5" pip install 'git+https://github.com/facebookresearch/habitat-sim.git@v0.3.3' -v
# Download Pretrained Weights for Wall Segmentation
wget https://github.com/hujiecpp/YOSO/releases/download/v0.1/yoso_res50_coco.pth -O ~/catkin_ws/src/active_semantic_perception/mapping/scene_segment_ros/include/yoso_res50_coco.pth
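Because `wget -O` creates the output file before the transfer starts, a failed download can leave an empty file at the target path. A minimal sketch of a check for that case follows; `check_weights` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds only if the given file exists and is non-empty.
check_weights() {
    local path="$1"
    if [ ! -s "$path" ]; then
        echo "missing or empty: $path" >&2
        return 1
    fi
    echo "found: $path"
}
```

It can be run as `check_weights ~/catkin_ws/src/active_semantic_perception/mapping/scene_segment_ros/include/yoso_res50_coco.pth` after the download finishes.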
git clone --recursive https://github.com/orocos/orocos_kinematics_dynamics
cd orocos_kinematics_dynamics
git checkout 1.5.2
cd orocos_kdl && mkdir build && cd build
cmake .. && make && sudo make install
cd ../../python_orocos_kdl
cmake . -DPYTHON_EXECUTABLE=$(which python3) \
-DPYTHON_INSTALL_DIR=$(python3 -c "import site; print(site.getsitepackages()[0])")
make && sudo make install
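To confirm that the Python bindings landed in the interpreter you intend to use, you can try importing them. The sketch below assumes the bindings are exposed under the module name `PyKDL`, which is the name python_orocos_kdl uses; `check_python_module` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds if the named module can be imported by the python3 on the PATH.
check_python_module() {
    if python3 -c "import $1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "cannot import: $1" >&2
        return 1
    fi
}
```

Run `check_python_module PyKDL` with the virtual environment activated, since that is the interpreter the install directory above was derived from.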
Before running the pipeline, complete the following steps:
- Update the paths for object_tasks, place_tasks, and the scene number in realsense.launch
- Change the path in pipeline_config.yaml
- Set your own Gemini API key (the command below exports it as GOOGLE_API_KEY)
# Write the Gemini API key into the environment
echo 'export GOOGLE_API_KEY="your_actual_api_key_here"' >> ~/.bashrc && source ~/.bashrc
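A missing key may only surface once the exploration script first contacts the LLM, so a quick pre-flight check can save a failed run. The sketch below is not part of the repository: `check_api_key` is a hypothetical helper, and it assumes the pipeline reads the key from `GOOGLE_API_KEY`, as the export above suggests.

```shell
# Hypothetical helper (not part of the repository).
# Succeeds only if GOOGLE_API_KEY is set and non-empty in this shell.
check_api_key() {
    if [ -z "${GOOGLE_API_KEY:-}" ]; then
        echo "GOOGLE_API_KEY is not set" >&2
        return 1
    fi
    echo "GOOGLE_API_KEY is set"
}
```

Calling `check_api_key` in each terminal before launching confirms that the shell has picked up the change to `~/.bashrc`.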
Then launch the pipeline:
# Terminal 1 — start mapping pipeline
roslaunch clio_ros realsense.launch
# Terminal 2 — start exploration
cd ~/catkin_ws/src/active_semantic_perception/exploration/scripts
python exploration_pipeline.py
If you experience high memory usage when running the pipeline, try the following:
- Set occlusion check to 'false' in clio.launch
The mapping part of our pipeline builds upon Clio and vS-Graphs, whose excellent work helped us implement our approach quickly.
@misc{tang2025activesemanticperception,
title={Active Semantic Perception},
author={Huayi Tang and Pratik Chaudhari},
year={2025},
eprint={2510.05430},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2510.05430},
}
If you have any questions, feel free to email huayit@seas.upenn.edu.
