diff --git a/README.md b/README.md
index 86e7e05..677a2a0 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,6 @@
 # Project Website

-This repository jekyll-ized the source code for the [Nerfies website](https://nerfies.github.io).
-You only need to change the content of [index.md](/index.md).
-It's possible to only write in markdown, but you can also use HTML to achieve fancier effects.
+This ist the static website for github pages. This webpage is served at [https://intuitive-robots.github.io/mops/](https://intuitive-robots.github.io/mops/)

-## Test it locally
-
-Install [Jekyll](https://jekyllrb.com/docs/installation/), and run
-```
-jekyll serve
-```
-in this directory.
-Then you can see the website at `http://127.0.0.1:4000`.
-
-## Activate Github Pages:
-Go into repository settings, Github pages and serve.
diff --git a/index.html b/index.html
index 66b8ea3..97ea468 100644
--- a/index.html
+++ b/index.html
@@ -5,7 +5,7 @@
- MOPS: Multi-Object Photoreal Simulation Dataset for Computer Vision in Robot Manipulation
+ MOPS: Multi-Objective Photoreal Simulation Dataset for Computer Vision in Robot Manipulation
-
+
+
+
@@ -66,7 +68,7 @@

- MOPS: Multi-Object Photoreal Simulation Dataset
+ MOPS: Multi-Objective Photoreal Simulation Dataset
for Computer Vision in Robot Manipulation

@@ -92,12 +94,19 @@

- Code + + + + Dataset + +

@@ -118,18 +127,23 @@

Abstract

- Datasets bridging computer vision and robotics by providing high-quality visual
- annotations in manipulation-relevant scenes remain limited.
- This work introduces the Multi-Object Photoreal Simulation (MOPS)
- dataset, which provides comprehensive ground truth annotations for photorealistic
- simulated environments. MOPS employs a zero-shot asset augmentation pipeline based on
- Large Language Models (LLM) to automatically normalize 3D object scale and generate
- part-level affordances. The dataset features pixel-level segmentations for tasks
- crucial to robotic perception, including fine-grained part segmentation and affordance
- prediction (e.g., “graspable” or “pushable”).
- By combining detailed annotations with photorealistic simulation, MOPS generates a
- vast, diverse collection of scenes to accelerate progress in robot perception and
- manipulation. We validate MOPS through vision and robot learning benchmarks.
+ Datasets providing high-quality visual annotations in manipulation-relevant scenes
+ remain scarce. We introduce MOPS, a dataset generation framework that
+ combines 3D assets from PartNet-Mobility and RoboCasa with a zero-shot LLM-based
+ augmentation pipeline to automatically normalize object scale and generate part-level
+ affordance annotations, describing how an object part can be manipulated (e.g., a mug
+ handle is “graspable,” a drawer is
+ “pullable”).
+ Built on ManiSkill3, MOPS produces photorealistic indoor scenes with pixel-perfect
+ ground truth for class, part, and instance segmentation, multi-label affordances, depth,
+ surface normals, and 6D poses, spanning 54 affordance types across 137 object
+ categories. Human verification confirms 97.3% accuracy of the zero-shot
+ affordance labels. We validate MOPS on three vision benchmarks of increasing scene
+ complexity and show that ground-truth affordance masks improve imitation learning
+ success rates on 24 RoboCasa manipulation tasks by 7.9 percentage
+ points
+ over RGB-only baselines, with predicted affordances still yielding measurable gains.
+ The dataset and framework are publicly available.

@@ -201,7 +215,8 @@

Photorealistic Simulation

🤖

LLM-Powered Annotation

Zero-shot asset augmentation using large language models for - automatic part-level labeling, scale normalization, and semantic understanding.

+ automatic part-level labeling, scale normalization, and semantic understanding + — 97.3% accurate against human verification.

@@ -218,7 +233,7 @@

Multi-Modal Ground Truth

🏠

Diverse Environments

Kitchen environments, cluttered tabletops, and isolated object - scenarios spanning 137 object categories and 56 affordance labels.

+ scenarios spanning 137 object categories and 54 affordance labels.

@@ -260,7 +275,7 @@

Results

 MOPS (Total)
- 56
+ 54
 137
 3,353
@@ -332,43 +347,31 @@

Getting Started

Alpha
- Early release — API may change. Code is split across two - repositories: + Early release — actively developed, API may change. Code and + dataset are now publicly available:
-
-

- Prerequisites: Python 3.10  ·  - CUDA-compatible GPU  ·  16 GB+ RAM -

-
conda create -n mops python=3.10
-conda activate mops
-
-pip install mani_skill
-git clone https://github.com/LiXiling/mops-data
-cd mops-data
-pip install -e .
-

- - 📖 Full Installation Guide → - -

-
+

+ + 📖 Setup & Installation Guide → + +

diff --git a/index.md b/index.md
deleted file mode 100644
index 3c8570f..0000000
--- a/index.md
+++ /dev/null
@@ -1,348 +0,0 @@
----
-layout: project_page
-permalink: /
-
-title: "MOPS: Multi-Object Photoreal Simulation Dataset for Computer Vision in Robot Manipulation"
-authors:
-  Maximilian X. Li, Paul Mattes, Nils Blank, Rudolf Lioutikov
-affiliations:
-  Intuitive Robots Lab, Karlsruhe Institute of Technology, Germany
-paper: ./static/Li2026_MOPS.pdf
-code: https://github.com/LiXiling/mops-data
-#video: https://www.youtube.com/results?search_query=turing+machine
-#data: https://huggingface.co/docs/datasets
----
-
-
-
-
-

Abstract

-
- Datasets bridging computer vision and robotics by providing high-quality visual annotations
- in manipulation-relevant scenes remain limited.
- This work introduces the Multi-Object Photoreal Simulation (MOPS) dataset, which provides
- comprehensive ground truth annotations for photorealistic simulated environments. MOPS employs
- a zero-shot asset augmentation pipeline based on Large Language Models (LLM) to automatically
- normalize 3D object scale and generate part-level affordances. The dataset features pixel-level
- segmentations for tasks crucial to robotic perception, including fine-grained part segmentation
- and affordance prediction (e.g., "graspable" or "pushable"). By combining detailed
- annotations with photorealistic simulation, MOPS generates a vast, diverse collection of scenes
- to accelerate progress in robot perception and manipulation. We validate MOPS through vision and
- robot learning benchmarks.
-
-
-
- -
- - -
-
-
- Alpha -
-
- Early Alpha Release — MOPS is under active development. The public API may change and some features are still in progress.
-
- -
- -
- - -
-

Annotation Modalities

-

MOPS provides rich, multi-modal ground truth for every scene

-
- - - -
- - -
-

Key Features

-
- -
-
-
-
🎨
-

Photorealistic Simulation

-

High-quality visual rendering via ManiSkill3 and SAPIEN, optimized for computer vision tasks in robotic manipulation.

-
-
-
-
-
🤖
-

LLM-Powered Annotation

-

Zero-shot asset augmentation pipeline using large language models for automatic part-level labeling and semantic understanding.

-
-
-
-
-
๐Ÿท๏ธ
-

Pixel-Level Segmentation

-

Detailed ground truth for fine-grained part segmentation and affordance prediction (e.g., graspable, pushable).

-
-
-
-
-
🏠
-

Diverse Environments

-

Rich indoor scenes including kitchen environments, cluttered tabletops, and isolated object scenarios at scale.

-
-
-
- -
- - -
-

Technical Overview

-
- -
-
-
- Asset Pipeline -

Normalized asset management across multiple 3D libraries with automatic part-level annotation and semantic scene understanding.

-
-
-
-
- Multi-Modal Ground Truth -

Comprehensive annotations including RGB, depth, surface normals, segmentation masks, affordance maps, and 6D pose information.

-
-
-
-
- Simulation Framework -

Built on ManiSkill3 and SAPIEN for physics-accurate simulation with photorealistic rendering and programmable scene generation.

-
-
-
- -
- - -
-

Dataset Comparison

-

MOPS provides significantly broader taxonomic coverage than existing datasets

-
- -
-
-
-| Dataset | Level | Aff. Labels | Obj. Cat. | Objects |
-|---|---|---|---|---|
-| RGB-D Part | Part | 7 | 17 | 105 |
-| 3D-AffNet | Part | 16 | 23 | 22,949 |
-| MOPS-Partnet | Part | 24 | 46 | 2,345 |
-| MOPS-Robocasa | Object | 44 | 101 | 1,008 |
-| MOPS (Total) | Mixed | 56 | 137 | 3,353 |
-

While 3D-AffNet has more instances, MOPS provides significantly higher taxonomic coverage across object categories and affordance types.

-
-
-
- -
- - -
-

Robot Manipulation Results

-

Imitation learning on 24 RoboCasa tasks, evaluated over 10 environment seeds each

-
- -
-
-
-
-
21.25%
-
Success Rate
-
RGB + MOPS Affordances
-
-
-
+7.92pp
-
Absolute Gain
-
over RGB-only baseline
-
-
-
-
-
-| Policy Inputs | Success Rate | Gain |
-|---|---|---|
-| RGB only | 13.33% | |
-| RGB + MOPS Affordances | 21.25% | +7.92 |
-

MOPS affordance annotations provide a consistent boost to imitation learning performance across 24 RoboCasa manipulation tasks.

-
-
-
- -
- - -
-

Getting Started

-
- -
-
-
-

Prerequisites: Python 3.10  ·  CUDA-compatible GPU  ·  16 GB+ RAM

-
conda create -n mops python=3.10
-conda activate mops
-
-pip install mani_skill
-git clone https://github.com/LiXiling/mops-data
-cd mops-data
-pip install -e .
-

📖 Full Installation Guide →

-
-
-
- -
- - -
-

Citation

-

If you use MOPS in your research, please cite our work

-
- -
-
-
-
@article{li2026mops,
-  title   = {Multi-Objective Photoreal Simulation (MOPS) Dataset
-             for Computer Vision in Robot Manipulation},
-  author  = {Maximilian Xiling Li and Paul Mattes and
-             Nils Blank and Rudolf Lioutikov},
-  year    = {2026}
-}
-
-
-
- -
- - -
-

This work is supported by the Intuitive Robots Lab at Karlsruhe Institute of Technology, Germany.

-
\ No newline at end of file
diff --git a/static/Li2026_MOPS.pdf b/static/Li2026_MOPS.pdf
index 7374678..3af8ff6 100644
Binary files a/static/Li2026_MOPS.pdf and b/static/Li2026_MOPS.pdf differ
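
Reviewer note: the figures this diff touches can be sanity-checked with a few lines of Python. This is a minimal sketch that uses only numbers appearing in the diff (success rates of 13.33% and 21.25%, and the per-source object counts). The variable names are illustrative, and the episode count assumes one rollout per task and seed (24 tasks x 10 seeds), which the page does not state explicitly.

```python
# Figures copied from the page content in this diff; names are illustrative.
rgb_only = 13.33       # success rate (%), RGB-only policy inputs
rgb_plus_aff = 21.25   # success rate (%), RGB + MOPS affordances

# Absolute gain in percentage points and relative gain in percent.
absolute_gain_pp = round(rgb_plus_aff - rgb_only, 2)
relative_gain_pct = round(100 * (rgb_plus_aff - rgb_only) / rgb_only, 1)

# ASSUMPTION: one episode per (task, seed) pair; not stated on the page.
episodes = 24 * 10
successes_rgb = round(rgb_only * episodes / 100)
successes_aff = round(rgb_plus_aff * episodes / 100)

# Object counts per asset source, from the dataset comparison table.
objects = {"MOPS-Partnet": 2345, "MOPS-Robocasa": 1008}
total_objects = sum(objects.values())

print(absolute_gain_pp, relative_gain_pct, successes_rgb, successes_aff, total_objects)
```

Under that episode assumption both rates correspond to whole numbers of successful episodes (32 and 51 of 240). The +7.92 pp gain from the removed table agrees with the rounded 7.9 percentage points in the new abstract, and the two object counts sum to the 3,353 reported for MOPS (Total).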