---
title: SuperMap
subtitle: A Spatio-Temporal SLAM System for Visual-Language Navigation
layout: page
show_sidebar: false
hide_footer: false
hero_height: is-large
hero_image: /img/place_holder_01.png
---

Content

Abstract
Contributions
System Architecture
Point Cloud Demo
Results
Citation

SuperMap: A Spatio-Temporal SLAM System
for Visual-Language Navigation

    <div class="is-size-5 publication-authors" style="margin-top: 1.5rem;">
      <span class="author-block">Carnegie Mellon University — AirLab</span>
    </div>

    <div class="publication-links">
      <span class="link-block">
        <a href="https://github.com/gfchen01/semantic_mapping" class="external-link button is-normal is-rounded is-dark" target="_blank">
          <span class="icon"><i class="fab fa-github"></i></span>
          <span>Code</span>
        </a>
      </span>
      <span class="link-block">
        <a href="#" class="external-link button is-normal is-rounded is-dark">
          <span class="icon"><i class="fas fa-file-pdf"></i></span>
          <span>Paper</span>
        </a>
      </span>
      <span class="link-block">
        <a href="#bibtex" class="external-link button is-normal is-rounded is-dark">
          <span class="icon"><i class="fas fa-quote-left"></i></span>
          <span>Citation</span>
        </a>
      </span>
    </div>
  </div>
</div>

Abstract

SuperMap

Robotic navigation in human environments requires a spatio-temporal semantic representation that can reconcile open-vocabulary perception with long-term environmental changes. While foundation models provide strong zero-shot recognition, their predictions are intermittent and view-dependent, and naively integrating them into mapping pipelines leads to identity drift and stale semantics over time.

We present SuperMap, a 4D spatio-temporal mapping framework for language-guided navigation that integrates high-frequency geometric SLAM with asynchronous open-vocabulary perception. Our core contribution is a consistency-driven mapping engine that combines 3D-aware instance association and re-activation with a principled existence-and-label confidence update to maintain stable object identities and prune outdated map content under occlusions and scene changes.

SuperMap produces a queryable 4D scene-graph representation that interfaces naturally with Vision-Language Models by supporting compositional queries over object semantics, relations, and history. We demonstrate SuperMap on benchmarks and on real robots, including dynamic scenes in which objects appear, disappear, or are relocated, and we provide ablations and runtime analysis. We will release the full system as open source to give the community a deployable baseline for open-vocabulary spatio-temporal mapping.

Contributions

Open-Vocabulary Spatio-Temporal SLAM

An online robotic system that builds a persistent, queryable open-vocabulary 4D scene memory suitable for downstream language-conditioned tasks — running fully onboard in real time.

Spatio-Temporal Object Tracking

An online pipeline that integrates 2D–3D association, validation, and change-aware updates to maintain instance consistency under occlusions, partial observations, label variability, and scene change.

Instance-level Scene Graph

A 4D scene graph that incorporates spatial and temporal information for each object, equipping robots with instance-level reasoning — e.g., locating moved objects, recalling past scenes.

Open-Source Framework

Full release of the change-detection benchmark, comprehensive ablations, runtime profiling, and the real-robot visual–language navigation pipeline to facilitate reproducible research.

System Architecture

Three-Layer Pipeline

Geometric Layer — Online 3D Reconstruction

SuperOdometry provides pose estimation and a colorized dense 3D model from synchronized RGB images, depth/LiDAR, and IMU streams. Geometric priors anchor all subsequent 2D–3D association and global map consistency checks.
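One way a geometric prior can gate 2D–3D association is to project each map object's 3D points into the current frame and compare the projected extent with a detection. The sketch below assumes a pinhole camera with intrinsics `K` and a world-to-camera transform `T_cw` derived from the SuperOdometry pose; the function names and the bounding-box IoU score are illustrative assumptions, since the page does not specify SuperMap's actual association criterion.

```python
import numpy as np


def project_points(points_w, T_cw, K, image_size):
    """Project world-frame points to pixels with a pinhole model.

    points_w:   (N, 3) points of one map object in the world frame.
    T_cw:       (4, 4) world-to-camera transform (assumed to come from the SLAM pose).
    K:          (3, 3) camera intrinsics.
    image_size: (width, height) in pixels.
    Returns the (M, 2) pixel coordinates of points in front of the camera and inside the image.
    """
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]
    pts_c = pts_c[pts_c[:, 2] > 1e-6]  # drop points behind the camera
    uv = (K @ pts_c.T).T
    uv = uv[:, :2] / uv[:, 2:3]  # perspective division
    w, h = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[inside]


def box_iou(a, b):
    """IoU of two axis-aligned 2D boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def projected_iou(points_w, T_cw, K, image_size, det_box):
    """Score a (map object, detection) pair by the IoU between the detection box
    and the 2D bounding box of the object's projected points."""
    uv = project_points(points_w, T_cw, K, image_size)
    if len(uv) == 0:
        return 0.0  # object not visible in this frame
    proj_box = (uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max())
    return box_iou(proj_box, det_box)


if __name__ == "__main__":
    # Four corners of a 0.2 m square, 2 m in front of an identity-pose camera.
    points = np.array([[x, y, 2.0] for x in (-0.1, 0.1) for y in (-0.1, 0.1)])
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    print(round(projected_iou(points, np.eye(4), K, (640, 480), (295, 215, 345, 265)), 3))  # 1.0
```

A full tracker would typically also test depth against the reconstructed surface to reject occluded points; that step is omitted here for brevity.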

Instance Layer — Spatio-Temporal Instance Association

Per-frame open-vocabulary detections (GroundingDINO + SAM2) are associated with existing 3D map objects via a hybrid 2D–3D tracker. A probabilistic geometric-consistency update and Bayesian semantic fusion maintain stable object identities over long time horizons, despite occlusions and scene changes.
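The existence-and-label confidence update can be illustrated with a standard log-odds existence update and a discrete Bayes filter over labels. This is a minimal sketch, not SuperMap's actual update rule: the class `ObjectBelief`, the increments `l_hit` and `l_miss`, the clamping bounds, the `1e-3` floor for unscored classes, and the pruning threshold are all hypothetical.

```python
import math


class ObjectBelief:
    """Hypothetical per-object state: existence log-odds plus a label distribution."""

    def __init__(self, labels, l_hit=0.85, l_miss=-0.4, l_min=-4.0, l_max=4.0):
        self.log_odds = 0.0  # p(exists) = 0.5 before any evidence
        self.label_probs = {c: 1.0 / len(labels) for c in labels}  # uniform label prior
        self.l_hit, self.l_miss = l_hit, l_miss
        self.l_min, self.l_max = l_min, l_max

    @property
    def p_exist(self):
        """Existence probability recovered from the log-odds."""
        return 1.0 / (1.0 + math.exp(-self.log_odds))

    def update_existence(self, observed):
        """Add l_hit when the object is matched, l_miss when it should be visible but is not."""
        self.log_odds += self.l_hit if observed else self.l_miss
        # Clamp so that a long-lived object can still be pruned after it disappears.
        self.log_odds = min(self.l_max, max(self.l_min, self.log_odds))

    def fuse_label(self, likelihood):
        """Discrete Bayes update: posterior is proportional to prior times detector likelihood."""
        post = {c: p * likelihood.get(c, 1e-3) for c, p in self.label_probs.items()}
        z = sum(post.values())
        self.label_probs = {c: p / z for c, p in post.items()}

    def best_label(self):
        return max(self.label_probs, key=self.label_probs.get)


if __name__ == "__main__":
    obj = ObjectBelief(["chair", "sofa", "table"])
    for _ in range(3):  # three consistent detections
        obj.update_existence(True)
        obj.fuse_label({"chair": 0.7, "sofa": 0.2, "table": 0.1})
    obj.fuse_label({"chair": 0.2, "sofa": 0.7, "table": 0.1})  # one conflicting label
    print(obj.best_label())  # chair
    for _ in range(20):  # object expected in view but never matched again
        obj.update_existence(False)
    print(obj.p_exist < 0.2)  # True, a candidate for pruning
```

Accumulating evidence this way means a single inconsistent detection does not flip an object's label, while repeated misses in frames where the object should be visible drive its existence probability down so it can be removed from the map.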

Topological Layer — Abstract 4D Scene Graph

The object map is abstracted into a scene graph G = (V, E_s, E_t) with spatial edges (geometric predicates: on, beside, under) and temporal edges (object trajectory history). The graph is serialized as structured text for compositional VLM queries over object semantics, spatial relations, and history.
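The abstraction into G = (V, E_s, E_t) and its serialization can be sketched as follows. Axis-aligned boxes, the rule-based predicate thresholds, and the JSON schema are assumptions for illustration; the page does not specify how SuperMap computes spatial predicates or what its structured-text format looks like.

```python
import json
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    obj_id: int
    label: str
    box_min: tuple  # axis-aligned 3D box, (x, y, z) with z up
    box_max: tuple
    history: list = field(default_factory=list)  # [(timestamp, (x, y, z) centroid), ...]


def _overlap_xy(a, b):
    """True if the footprints of boxes a and b overlap in the ground plane."""
    return (a.box_min[0] < b.box_max[0] and b.box_min[0] < a.box_max[0]
            and a.box_min[1] < b.box_max[1] and b.box_min[1] < a.box_max[1])


def spatial_predicate(a, b, touch_tol=0.05, near_dist=1.0):
    """Rule-based predicate for 'a <pred> b' (hypothetical thresholds in meters), or None."""
    if _overlap_xy(a, b):
        if abs(a.box_min[2] - b.box_max[2]) <= touch_tol:
            return "on"  # a's bottom rests on b's top
        if a.box_max[2] <= b.box_min[2]:
            return "under"
        return None
    dx = max(0.0, a.box_min[0] - b.box_max[0], b.box_min[0] - a.box_max[0])
    dy = max(0.0, a.box_min[1] - b.box_max[1], b.box_min[1] - a.box_max[1])
    return "beside" if math.hypot(dx, dy) <= near_dist else None


def build_scene_graph(nodes):
    """Return spatial edges E_s as (subject, predicate, object) and temporal edges E_t."""
    spatial = []
    for a in nodes:
        for b in nodes:
            if a is not b:
                pred = spatial_predicate(a, b)
                if pred is not None:
                    spatial.append((a.obj_id, pred, b.obj_id))
    # Temporal edges link consecutive observations in each object's history.
    temporal = [(n.obj_id, prev, nxt) for n in nodes
                for prev, nxt in zip(n.history, n.history[1:])]
    return spatial, temporal


def serialize(nodes):
    """Structured-text (JSON) view of the graph, suitable for placing in a VLM prompt."""
    spatial, temporal = build_scene_graph(nodes)
    return json.dumps({
        "objects": [{"id": n.obj_id, "label": n.label} for n in nodes],
        "spatial_edges": [{"subject": s, "predicate": p, "object": o} for s, p, o in spatial],
        "temporal_edges": [
            {"id": i, "from": {"t": t0, "pos": list(p0)}, "to": {"t": t1, "pos": list(p1)}}
            for i, (t0, p0), (t1, p1) in temporal
        ],
    }, indent=2)


if __name__ == "__main__":
    table = Node(1, "table", (0.0, 0.0, 0.0), (1.0, 1.0, 0.75))
    cup = Node(2, "cup", (0.4, 0.4, 0.75), (0.5, 0.5, 0.85))
    chair = Node(3, "chair", (1.2, 0.0, 0.0), (1.7, 0.5, 0.9),
                 history=[(0.0, (1.0, 0.2, 0.45)), (10.0, (1.45, 0.25, 0.45))])
    print(serialize([table, cup, chair]))
```

In this example the cup is "on" the table, the chair is "beside" the table, and the chair's single temporal edge records that it moved between t = 0 and t = 10, which is the kind of history a query such as "where was the chair before?" would rely on.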

Interactive 3D Map — CMU Campus Segment 01

White: LiDAR point cloud (47k pts) | Yellow: SuperMap 3D bounding boxes

Results

    <div class="section-card" style="margin-bottom: 2rem;">
      <div class="section-badge">Class-level Segmentation — ScanNet</div>
      <p style="margin: 1.5rem 0 1rem;">SuperMap achieves the highest mIoU, f-mIoU, and accuracy of the compared object-level mapping methods on ScanNet while running fully online.</p>
      <table class="results-table">
        <thead><tr><th>Method</th><th>Approach</th><th>mIoU (%)</th><th>f-mIoU (%)</th><th>Acc (%)</th></tr></thead>
        <tbody>
          <tr><td>ConceptGraphs</td><td>object-level</td><td>21.62</td><td>24.32</td><td>31.05</td></tr>
          <tr><td>HOV-SG</td><td>object-level</td><td>26.79</td><td>36.05</td><td>35.17</td></tr>
          <tr class="ours"><td>SuperMap (Ours)</td><td>object-level</td><td>27.42</td><td>43.50</td><td>55.48</td></tr>
        </tbody>
      </table>
    </div>

    <div class="section-card" style="margin-bottom: 2rem;">
      <div class="section-badge">Instance-level Segmentation — ScanNet (mAP<sub>50</sub>)</div>
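      <p style="margin: 1.5rem 0 1rem;">SuperMap achieves the highest mAP<sub>50</sub> of the compared scene-graph methods on all five reported classes, with large margins on chair, window, and refrigerator and smaller gains on sofa and door.</p>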
      <table class="results-table">
        <thead><tr><th>Method</th><th>Chair</th><th>Window</th><th>Refrigerator</th><th>Sofa</th><th>Door</th></tr></thead>
        <tbody>
          <tr><td>HOV-SG</td><td>4.58</td><td>0.00</td><td>0.00</td><td>30.00</td><td>9.70</td></tr>
          <tr><td>ConceptGraphs</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.00</td></tr>
          <tr class="ours"><td>SuperMap (Ours)</td><td>63.76</td><td>42.20</td><td>62.50</td><td>33.35</td><td>10.00</td></tr>
        </tbody>
      </table>
    </div>

    <div class="section-card">
      <div class="section-badge">Spatio-Temporal Change Detection Recall</div>
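      <p style="margin: 1.5rem 0 1rem;">SuperMap achieves non-zero recall on all six appearance and disappearance events, including perfect recall on the appeared bucket and the disappeared chair. DualMap achieves non-zero recall only on the disappeared plant (0.310), and no results are reported for Khronos.</p>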
      <table class="results-table">
        <thead>
          <tr><th>Method</th><th>Appeared (Bucket)</th><th>Appeared (Cart)</th><th>Appeared (Sign)</th><th>Disappeared (Plant)</th><th>Disappeared (Trash)</th><th>Disappeared (Chair)</th></tr>
        </thead>
        <tbody>
          <tr><td>Khronos</td><td>—</td><td>—</td><td>—</td><td>—</td><td>—</td><td>—</td></tr>
          <tr><td>DualMap</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.310</td><td>0.000</td><td>0.000</td></tr>
          <tr class="ours"><td>SuperMap (Ours)</td><td>1.000</td><td>0.262</td><td>0.583</td><td>0.755</td><td>0.434</td><td>1.000</td></tr>
        </tbody>
      </table>
    </div>

  </div>
</div>

Citation

@inproceedings{supermap2026,
  title     = {SuperMap: A Spatio-Temporal SLAM System for Visual-Language Navigation},
  author    = {Anonymous},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026}
}
