This is the official repository of "Geometry-aware attention for robust visual odometry". DDI-VO is a hybrid 6-DoF Visual Odometry architecture that uses a pure Vision Transformer (ViT) backbone to integrate direct (photometric/global) and indirect (feature-based) tracking paradigms. By combining globally consistent representations with robust sparse tracking (SuperPoint + LightGlue), the model achieves strong generalization across diverse motion profiles, including autonomous driving, aerial flight, and handheld scenarios.
You can run DDI-VO either natively in a Python virtual environment or via Docker for a reproducible setup.
We provide a Dockerfile that packages all required system dependencies, CUDA drivers, and Python libraries.
- Clone the repository with submodules:
git clone --recursive https://github.com/larocs/DDI-VO.git
cd DDI-VO
- Build the Docker image:
docker build -t ddi-vo .
- Run the container (mounting your local dataset folder):
docker run --gpus all -it -v /path/to/your/local/datasets:/workspace/DDI-VO/datasets ddi-vo /bin/bash
For a native installation in a Python virtual environment:
- Clone the repository with submodules:
git clone --recursive https://github.com/larocs/DDI-VO.git
cd DDI-VO
(If you already cloned without submodules, run: git submodule update --init --recursive)
- Install dependencies: Make sure the PyTorch and Torchvision builds you have installed match your CUDA version, then run:
pip install -r requirements.txt
To use the default dataloaders (kitti.py, queenscamp.py, tartanair.py), your datasets must be strictly organized in the following hierarchy inside the datasets/ directory:
datasets/
├── kitti/
│   ├── sequences/
│   │   ├── 00/
│   │   │   ├── image_2/
│   │   │   └── calib.txt
│   │   └── 01/
│   └── poses/
│       ├── 00.txt
│       └── 01.txt
├── queenscamp/
│   ├── rgb_camera_info.txt
│   └── sequences/
│       ├── 01/
│       │   ├── images/
│       │   └── traj.txt
│       └── 02/
└── tartanair/
    ├── rgb_camera_info.txt
    └── abandonedfactory/
        ├── Easy/
        │   ├── P000/
        │   │   ├── image_left/
        │   │   └── pose_left.txt
        │   └── P001/
        └── Hard/
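Before training or testing, it can help to confirm that the folders match the hierarchy above. The following is a minimal sketch, not part of the repository: the names `EXPECTED_LAYOUT` and `missing_entries` are hypothetical, and the script only checks the sample entries shown in the tree (for example sequence `00` for KITTI and `P000` for TartanAir), so adjust the lists to the sequences you actually use.

```python
from pathlib import Path

# Hypothetical table of relative paths, copied from the hierarchy above.
# Only the example entries shown there are listed; extend as needed.
EXPECTED_LAYOUT = {
    "kitti": [
        "sequences/00/image_2",
        "sequences/00/calib.txt",
        "poses/00.txt",
    ],
    "queenscamp": [
        "rgb_camera_info.txt",
        "sequences/01/images",
        "sequences/01/traj.txt",
    ],
    "tartanair": [
        "rgb_camera_info.txt",
        "abandonedfactory/Easy/P000/image_left",
        "abandonedfactory/Easy/P000/pose_left.txt",
    ],
}


def missing_entries(root, dataset):
    """Return the expected relative paths that do not exist under root/dataset."""
    base = Path(root) / dataset
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (base / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to datasets/.
    for name in EXPECTED_LAYOUT:
        missing = missing_entries("datasets", name)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

Datasets you do not plan to use will simply be reported as missing; the check does not inspect file contents, only whether each path exists.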
DDI-VO requires pre-trained weights to run, which can be obtained as follows:
chmod +x download_weights.sh
./download_weights.sh
To train or fine-tune the model on the supported datasets, configure your parameters in configs/train_example.yaml and run:
python train.py checkpoints/ddi_vo_experiment \
    --conf configs/train_example.yaml \
    --use_cuda
To run inference and generate trajectory files (traj.txt) for evaluation against ground truth, run:
python test.py \
    --dataset_config configs/ddi_vo.yaml \
    --model_config configs/ddi_vo_model.yaml \
    --model_path checkpoints/ddi_vo_experiment/best_model.tar \
    --output_path results \
    --trajectory_file traj.txt
If you use this work, please cite our paper:
@article{Bruno2026,
author = {Bruno, Hudson and Cabral, Kleber and Givigi, Sidney and Colombini, Esther},
title = {Geometry-aware attention for robust visual odometry},
journal = {SSRN Electronic Journal},
doi = {10.2139/ssrn.6732475},
year = 2026,
url = {https://ssrn.com/abstract=6732475}
}
Check out our deep homography estimation visual transformer, which is available here.