DDI-VO: Deep Direct-Indirect Visual Odometry

Built with PyTorch. License: MIT.

This is the official repository for the paper "Geometry-aware attention for robust visual odometry". DDI-VO is a hybrid 6-DoF visual odometry architecture that uses a pure Vision Transformer (ViT) backbone to integrate direct (photometric/global) and indirect (feature-based) tracking. By combining globally consistent representations with sparse feature tracking (SuperPoint + LightGlue), the model achieves strong generalization across diverse motion profiles, including autonomous driving, aerial flight, and handheld scenarios.


Installation

You can run DDI-VO either via Docker, which gives a reproducible environment, or natively in a Conda or Python virtual environment.

Option A: Docker

We provide a Dockerfile that packages the required system dependencies, CUDA libraries, and Python libraries. The --gpus all flag used in the run command requires the NVIDIA Container Toolkit on the host.

  1. Clone the repository with submodules:
git clone --recursive https://github.com/larocs/DDI-VO.git
cd DDI-VO
  2. Build the Docker image:
docker build -t ddi-vo .
  3. Run the container (mounting your local dataset folder):
docker run --gpus all -it -v /path/to/your/local/datasets:/workspace/DDI-VO/datasets ddi-vo /bin/bash

Option B: Local Setup (Conda / Venv)

  1. Clone the repository with submodules:
git clone --recursive https://github.com/larocs/DDI-VO.git
cd DDI-VO

(If you already cloned without submodules, run: git submodule update --init --recursive)

  2. Install dependencies: make sure the PyTorch and Torchvision builds you install match your CUDA version, then run:
pip install -r requirements.txt
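
After installing, it can be useful to confirm that PyTorch and Torchvision import correctly and that a CUDA device is visible. The snippet below is a minimal sketch of such a check. It is not part of the repository, and the helper name describe_torch_env is hypothetical.

```python
import importlib.util


def describe_torch_env():
    """Report installed PyTorch/Torchvision versions and CUDA visibility.

    Hypothetical helper for illustration; not shipped with DDI-VO.
    """
    info = {"torch": None, "torchvision": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    info["cuda_available"] = bool(torch.cuda.is_available())

    try:
        import torchvision

        info["torchvision"] = torchvision.__version__
    except ImportError:
        pass  # Torchvision missing or built against a different PyTorch

    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with a GPU, the installed PyTorch build probably does not match the local CUDA setup.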

Dataset Preparation

The default dataloaders (kitti.py, queenscamp.py, tartanair.py) expect your datasets to follow this hierarchy inside the datasets/ directory:

datasets/
├── kitti/
│   ├── sequences/
│   │   ├── 00/
│   │   │   ├── image_2/
│   │   │   └── calib.txt
│   │   └── 01/
│   └── poses/
│       ├── 00.txt
│       └── 01.txt
├── queenscamp/
│   ├── rgb_camera_info.txt
│   └── sequences/
│       ├── 01/
│       │   ├── images/
│       │   └── traj.txt
│       └── 02/
└── tartanair/
    ├── rgb_camera_info.txt
    └── abandonedfactory/
        ├── Easy/
        │   ├── P000/
        │   │   ├── image_left/
        │   │   └── pose_left.txt
        │   └── P001/
        └── Hard/
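
Because the dataloaders depend on this layout, a quick structural check before training can save a failed run. The following is a sketch only: find_missing is a hypothetical helper, and the required entries are read off the tree above rather than taken from the dataloader code, so the real loaders may need more (or fewer) files.

```python
from pathlib import Path

# Top-level entries shown in the tree above.
REQUIRED = [
    "kitti/sequences",
    "kitti/poses",
    "queenscamp/rgb_camera_info.txt",
    "queenscamp/sequences",
    "tartanair/rgb_camera_info.txt",
]

# Entries expected inside every sequence directory, per the tree above.
PER_SEQUENCE = {
    "kitti/sequences": ["image_2", "calib.txt"],
    "queenscamp/sequences": ["images", "traj.txt"],
}

# Entries expected inside every TartanAir trajectory directory
# (<environment>/<difficulty>/<trajectory>/).
TARTANAIR_TRAJECTORY = ["image_left", "pose_left.txt"]


def find_missing(root):
    """Return dataset-relative paths from the documented layout that are absent."""
    root = Path(root)
    missing = [rel for rel in REQUIRED if not (root / rel).exists()]

    for parent, children in PER_SEQUENCE.items():
        parent_dir = root / parent
        if not parent_dir.is_dir():
            continue  # already reported through REQUIRED
        for seq in sorted(p for p in parent_dir.iterdir() if p.is_dir()):
            for child in children:
                if not (seq / child).exists():
                    missing.append((seq / child).relative_to(root).as_posix())

    tartanair = root / "tartanair"
    if tartanair.is_dir():
        for traj in sorted(p for p in tartanair.glob("*/*/*") if p.is_dir()):
            for child in TARTANAIR_TRAJECTORY:
                if not (traj / child).exists():
                    missing.append((traj / child).relative_to(root).as_posix())

    return missing


if __name__ == "__main__":
    for path in find_missing("datasets"):
        print("missing:", path)
```

An empty result only means the directory skeleton matches the tree; it does not validate image counts or file contents.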

Model Weights

DDI-VO requires pre-trained weights to run. Download them with the provided script:

chmod +x download_weights.sh
./download_weights.sh

Usage

Training

To train or fine-tune the model on the supported datasets, configure your parameters in configs/train_example.yaml and run:

python train.py checkpoints/ddi_vo_experiment \
    --conf configs/train_example.yaml \
    --use_cuda

Testing / Inference

To run inference and generate trajectory files (traj.txt) for evaluation against ground truth, run:

python test.py \
    --dataset_config configs/ddi_vo.yaml \
    --model_config configs/ddi_vo_model.yaml \
    --model_path checkpoints/ddi_vo_experiment/best_model.tar \
    --output_path results \
    --trajectory_file traj.txt
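
As a rough illustration of evaluating an output trajectory against ground truth, the sketch below computes a translation RMSE between two pose files. It assumes both files use the KITTI odometry pose format (12 floats per line, a row-major 3x4 [R|t] matrix); the README does not specify the format that test.py writes, so adapt the parser if it differs. The function names are hypothetical, and no trajectory alignment is performed, so this is simpler than a standard ATE evaluation.

```python
import math


def read_kitti_translations(path):
    """Return [(x, y, z), ...] from a KITTI-style pose file (12 floats per line)."""
    positions = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            values = [float(v) for v in line.split()]
            if not values:
                continue  # skip blank lines
            if len(values) != 12:
                raise ValueError(
                    f"{path}:{line_number}: expected 12 values, got {len(values)}"
                )
            # The translation is the last column of the row-major 3x4 matrix.
            positions.append((values[3], values[7], values[11]))
    return positions


def translation_rmse(estimated, reference):
    """RMSE of per-frame position error over the overlapping frames (no alignment)."""
    n = min(len(estimated), len(reference))
    if n == 0:
        raise ValueError("both trajectories must contain at least one pose")
    squared = sum(
        math.dist(e, r) ** 2 for e, r in zip(estimated[:n], reference[:n])
    )
    return math.sqrt(squared / n)
```

For example, assuming the estimate was written to results/traj.txt for KITTI sequence 00, translation_rmse(read_kitti_translations("results/traj.txt"), read_kitti_translations("datasets/kitti/poses/00.txt")) would give the error in the units of the pose files.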

Citation

If you use this work, please cite our paper:

@article{Bruno2026,
  author  = {Bruno, Hudson and Cabral, Kleber and Givigi, Sidney and Colombini, Esther},
  title   = {Geometry-aware attention for robust visual odometry},
  journal = {SSRN Electronic Journal},
  doi     = {10.2139/ssrn.6732475},
  year    = {2026},
  url     = {https://ssrn.com/abstract=6732475}
}

Other publications

Check out our deep homography estimation visual transformer, which is available here.
