Distill3R: A Pipeline for Democratizing 3D Foundation Models on Commodity Hardware

Brandon Leblanc · Charalambos Poullis

Immersive and Creative Technologies Lab, Concordia University

Accepted at the 23rd Conference on Robots and Vision (CRV 2026), Vancouver, Canada

Easy Navigation

Summary
Model Overview
Model Comparison
Installation
Training Pipeline
Evaluation
Inference
Citation
Acknowledgments

Summary

Distill3R is a knowledge distillation framework designed to compress large-scale 3D foundation models into compact students that are fully trainable on a single workstation. While models like Fast3R (650M parameters, trained on 128 A100 GPUs for 6 days) and VGGT (1B parameters, trained on 64 A100 GPUs for 9 days) have achieved state-of-the-art results in multi-view 3D reconstruction, their computational requirements create a significant barrier to entry for most academic laboratories.

Distill3R bridges this compute divide through two primary innovations:

Offline Teacher Caching: A pipeline that decouples heavy teacher inference from the training loop by pre-computing and compressing supervision signals (point maps, confidence maps) into an efficient cache format.
Confidence-Aware Distillation Loss: A loss function that leverages the teacher's learned uncertainty to weight geometric supervision, preventing degenerate solutions and stabilizing training on commodity hardware.

Our 72M-parameter student achieves:

9x parameter reduction compared to the 650M-parameter Fast3R teacher
5x inference speedup at 128 views
Full training in under 3 days on a single workstation with 2x RTX 6000 Ada GPUs

This work is not intended to compete with state-of-the-art foundation models, but to provide an accessible research baseline for laboratories without access to large-scale compute. Additionally, this pipeline enables practitioners to train and specialize the model on their own domain-specific data at minimal cost.

Reproducibility: This repository does not distribute a pre-trained Distill3R checkpoint. The full pipeline — teacher caching, student training, and inference — is provided so the model can be reproduced from scratch on a single workstation. Follow the Training Pipeline to produce your own checkpoint.

Model Overview

Architecture:

Encoder: DUNE ViT-Small (21M parameters) with weights shared across all N views
Fusion Decoder: Compressed transformer (6 layers, 6 heads, 384 embedding dim) with cross-view self-attention
Prediction Heads: Two lightweight DPT heads for global/local 3D coordinates and confidence maps

Training Pipeline:

Raw datasets are processed into a unified manifest
Fast3R teacher generates supervision signals (point maps + confidence) for each sample
Outputs are compressed (float16 + RLE encoding) and cached to disk
Student trains on cached supervision without requiring teacher inference
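The compression step above can be illustrated with a minimal sketch. It assumes a simple run-length scheme (first value plus run lengths) for the boolean masks and a plain half-precision round trip for the geometry; the on-disk format used by the actual cache may differ, and the helper names `to_float16`, `rle_encode`, and `rle_decode` are hypothetical.

```python
import struct
from itertools import groupby


def to_float16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def rle_encode(mask):
    """Encode a flat boolean sequence as (first_value, run_lengths)."""
    mask = [bool(bit) for bit in mask]
    if not mask:
        return False, []
    runs = [len(list(group)) for _, group in groupby(mask)]
    return mask[0], runs


def rle_decode(first_value, runs):
    """Invert rle_encode: expand run lengths back into booleans."""
    decoded = []
    value = first_value
    for length in runs:
        decoded.extend([value] * length)
        value = not value  # runs alternate between False and True
    return decoded


# A mask with long constant runs compresses to a handful of integers.
example_mask = [False] * 5 + [True] * 3 + [False] * 2
first_value, runs = rle_encode(example_mask)
```

Run-length encoding pays off here because validity masks tend to contain long constant stretches, while float16 halves the storage of the point and confidence maps at the cost of roughly three decimal digits of precision.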

Model Comparison

Model	Parameters	Role
Fast3R	650M	Teacher (external dependency)
Distill3R (Ours)	72M	Student — train via the pipeline

Inference Efficiency

System efficiency comparison on an RTX 6000 Ada. All methods except Fast3R are evaluated at 378×518 resolution; Fast3R uses 384×512.

Method	N=12 Time (s)	N=12 Mem (GB)	N=32 Time (s)	N=32 Mem (GB)	N=64 Time (s)	N=64 Mem (GB)	N=96 Time (s)	N=96 Mem (GB)	N=128 Time (s)	N=128 Mem (GB)
Fast3R (Teacher)	0.32	6.86	1.14	12.11	3.26	21.11	6.35	32.36	10.11	44.36
VGGT	0.59	15.28	2.28	33.98	6.40	38.41	OOM	OOM	OOM	OOM
Distill3R (Ours)	0.13	4.05	0.41	9.97	1.02	21.80	1.78	28.69	2.69	31.90

Installation

External Dependencies

Clone the repository with submodules:

git clone --recursive https://github.com/TheFourthKaramazov/Distill3R.git
cd Distill3R

The following external repositories are included as submodules:

external/fast3r/ - Fast3R teacher model
external/dune/ - DUNE encoder

Conda Environment

Create a conda environment with all necessary packages:

conda env create -f environment.yml
conda activate distill3r

Or manually:

conda create -n distill3r python=3.10
conda activate distill3r
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install lightning timm einops omegaconf pynvml psutil tqdm pillow opencv-python matplotlib

Pre-trained Dependencies

These are the external weights required to run the pipeline. They are dependencies of the teacher and encoder, not a released Distill3R model.

DUNE Encoder (required for training):

mkdir -p pretrained_models
wget -O pretrained_models/dune_vitsmall14_448.pth \
  "https://download.europe.naverlabs.com/dune/dune_vitsmall14_448.pth"

Fast3R Teacher (required for cache generation): Automatically downloaded via torch.hub during cache generation.

Training Pipeline

Datasets

We train on six established datasets following Fast3R:

Dataset	Type	Description
CO3D-v2	Object-centric	Real-world object videos
ScanNet++	Indoor	High-fidelity indoor scans
Habitat	Indoor	Rendered indoor navigation
MegaDepth	Outdoor	Internet photos with SfM depth
BlendedMVS	Mixed	Blended real/synthetic scenes
ARKitScenes	Indoor	Mobile RGB-D captures

Note: The complete subsampled training data will shortly be made available upon request. The primary goal of this pipeline, however, is to let others train the model on their own domain-specific data; this is what Distill3R was built for. If you train on your own data, use the empty directories and config files provided for the datasets above as a guide for your own implementation.

Configure dataset paths in configs/data_paths.yaml.

Manifest Generation

Process the raw images and create the training manifest:

python utils/generate_manifest.py

Configuration (configs/data_paths.yaml):

sample_frac: Fraction of scenes to use (default: 0.4)
resize_max: Maximum image edge size (default: 960px)
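To make the two options concrete, here is a minimal sketch of what they plausibly control. It assumes `resize_max` caps the longer image edge while preserving aspect ratio and `sample_frac` selects a random fraction of scenes; the function names and the seeding are hypothetical and not taken from `utils/generate_manifest.py`.

```python
import random


def resized_dims(width, height, resize_max=960):
    """Shrink (width, height) so the longer edge is at most resize_max."""
    longest = max(width, height)
    if longest <= resize_max:
        return width, height  # small images are left untouched
    scale = resize_max / longest
    return round(width * scale), round(height * scale)


def subsample_scenes(scenes, sample_frac=0.4, seed=0):
    """Pick a reproducible random fraction of the scene list."""
    if not scenes:
        return []
    rng = random.Random(seed)
    keep = max(1, round(len(scenes) * sample_frac))
    return sorted(rng.sample(list(scenes), keep))


# A 1920x1080 frame is halved; 4 of 10 scenes survive at sample_frac=0.4.
example_dims = resized_dims(1920, 1080)
example_scenes = subsample_scenes([f"scene_{i}" for i in range(10)])
```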

Output:

Processed images: processed_data/images/{dataset}/*.png
Manifest CSV: processed_data/images/manifest.csv

Teacher Cache Generation

Generate supervision signals from the Fast3R teacher:

python distill3r/teacher/export_fast3r.py

Environment Variables:

export TEACHER_MANIFEST_PATH="processed_data/images/manifest.csv"
export TEACHER_CACHE_DIR="caches/teacher_cache"
export TEACHER_MAX_VIEWS=20
export TEACHER_MAX_SAMPLES_PER_SCENE=5
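As an illustration of how these variables could drive view sampling, the sketch below reads them with fallbacks that mirror the exports above and draws view subsets per scene. Whether the export script uses the same fallbacks, and the exact meaning of `TEACHER_MAX_SAMPLES_PER_SCENE`, are assumptions; `load_teacher_settings` and `sample_view_sets` are hypothetical helpers.

```python
import os
import random


def load_teacher_settings(env=None):
    """Read the teacher export settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "manifest_path": env.get("TEACHER_MANIFEST_PATH", "processed_data/images/manifest.csv"),
        "cache_dir": env.get("TEACHER_CACHE_DIR", "caches/teacher_cache"),
        "max_views": int(env.get("TEACHER_MAX_VIEWS", "20")),
        "max_samples_per_scene": int(env.get("TEACHER_MAX_SAMPLES_PER_SCENE", "5")),
    }


def sample_view_sets(num_frames, max_views, max_samples, seed=0):
    """Draw up to max_samples view subsets, each with at most max_views frames."""
    rng = random.Random(seed)
    views_per_sample = min(max_views, num_frames)
    return [
        sorted(rng.sample(range(num_frames), views_per_sample))
        for _ in range(max_samples)
    ]


# Assumed usage: a 50-frame scene yields 5 samples of 20 views each.
settings = load_teacher_settings({})
view_sets = sample_view_sets(50, settings["max_views"], settings["max_samples_per_scene"])
```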

Output (per sample):

caches/teacher_cache/{dataset}/{scene}_sample{N}/
├── consolidated.npz    # Stacked predictions for all N views
│   ├── xyz_global      [N, 224, 518, 3] float16
│   ├── xyz_local       [N, 224, 518, 3] float16
│   ├── conf_global     [N, 224, 518] float16
│   ├── conf_local      [N, 224, 518] float16
│   └── masks           [N] RLE-encoded boolean
└── sampled_views.json  # View indices for reproducibility
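For disk planning, the array shapes listed above give a rough upper bound on the size of one cached sample. The sketch below counts only the four float16 arrays and ignores the RLE masks, the JSON file, and any further compression applied by the `.npz` container, so actual files may well be smaller.

```python
def cache_bytes_per_sample(num_views, height=224, width=518, bytes_per_value=2):
    """Uncompressed size of the float16 arrays in one consolidated.npz."""
    channels = 3 + 3 + 1 + 1  # xyz_global, xyz_local, conf_global, conf_local
    return num_views * height * width * channels * bytes_per_value


# With TEACHER_MAX_VIEWS=20 this comes to about 35.4 MiB per sample.
sample_bytes = cache_bytes_per_sample(20)
sample_mib = sample_bytes / (1024 ** 2)
```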

Resume interrupted cache generation:

python distill3r/teacher/export_fast3r.py --resume

Student Training

Train the distilled student model:

python distill3r/train.py --config configs/distill3r.yaml

Key Configuration (configs/distill3r.yaml):

model:
  encoder_type: "dune"        

training:
  num_gpus: 2
  batch_size: 4                  # Per-GPU
  gradient_accumulation_steps: 2 # Effective batch size: 16
  max_epochs: 60

loss:
  alpha_global: 2.0              # Global geometry weight
  alpha_local: 1.0               # Local geometry weight
  gamma_conf: 0.001              # Confidence supervision weight
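The effective batch size noted in the config comment follows from multiplying the three training settings, as this small check shows (the function name is illustrative only):

```python
def effective_batch_size(num_gpus, batch_size, gradient_accumulation_steps):
    """Samples contributing to each optimizer step across all GPUs."""
    return num_gpus * batch_size * gradient_accumulation_steps


# 2 GPUs x 4 samples per GPU x 2 accumulation steps
default_effective_batch = effective_batch_size(2, 4, 2)
```

When training on a single GPU, doubling `gradient_accumulation_steps` would keep the effective batch size unchanged under this formula.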

Loss Function:

L_total = α_g · L_global + α_l · L_local + γ · L_conf

L_global: Confidence-weighted L2 on global 3D coordinates (cross-view normalized)
L_local:  Confidence-weighted L2 on local 3D coordinates (per-view normalized)
L_conf:   L1 on confidence maps
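The structure of this loss can be sketched in pure Python for readability; a real implementation would operate on tensors. Several details are assumptions rather than the repository's code: scale is removed by dividing by the mean distance to the origin (as in DUSt3R-style losses), "L2" is taken as Euclidean distance weighted by the teacher's confidence, and L_conf averages the L1 error of the global and local confidence maps.

```python
import math


def _mean_norm(points):
    """Average Euclidean distance of the points from the origin."""
    return sum(math.hypot(*p) for p in points) / len(points)


def _scale(points, factor):
    return [tuple(c / factor for c in p) for p in points]


def normalize_views(views, per_view):
    """Remove scale: one factor per view (local) or one shared factor (global)."""
    if per_view:
        return [_scale(view, _mean_norm(view)) for view in views]
    shared = _mean_norm([p for view in views for p in view])
    return [_scale(view, shared) for view in views]


def conf_weighted_l2(student, teacher, teacher_conf):
    """Teacher-confidence-weighted mean Euclidean error over all points."""
    num = den = 0.0
    for s_view, t_view, c_view in zip(student, teacher, teacher_conf):
        for s, t, c in zip(s_view, t_view, c_view):
            num += c * math.dist(s, t)
            den += c
    return num / den


def conf_l1(student_conf, teacher_conf):
    """Mean absolute error between two confidence maps."""
    diffs = [abs(s - t) for sv, tv in zip(student_conf, teacher_conf) for s, t in zip(sv, tv)]
    return sum(diffs) / len(diffs)


def total_loss(student, teacher, alpha_global=2.0, alpha_local=1.0, gamma_conf=0.001):
    """L_total = alpha_g * L_global + alpha_l * L_local + gamma * L_conf."""
    l_global = conf_weighted_l2(
        normalize_views(student["xyz_global"], per_view=False),
        normalize_views(teacher["xyz_global"], per_view=False),
        teacher["conf_global"],
    )
    l_local = conf_weighted_l2(
        normalize_views(student["xyz_local"], per_view=True),
        normalize_views(teacher["xyz_local"], per_view=True),
        teacher["conf_local"],
    )
    l_conf = 0.5 * (
        conf_l1(student["conf_global"], teacher["conf_global"])
        + conf_l1(student["conf_local"], teacher["conf_local"])
    )
    return alpha_global * l_global + alpha_local * l_local + gamma_conf * l_conf


# Toy data: 2 views with 2 points each; the student matches the teacher up to scale.
xyz = [[(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)], [(0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]]
conf = [[1.0, 2.0], [1.5, 1.0]]
teacher = {"xyz_global": xyz, "xyz_local": xyz, "conf_global": conf, "conf_local": conf}
doubled = [[tuple(2 * c for c in p) for p in view] for view in xyz]
student = {"xyz_global": doubled, "xyz_local": doubled, "conf_global": conf, "conf_local": conf}
```

Because both predictions and targets are scale-normalized in this sketch, a student that differs from the teacher only by a global scale incurs no geometric penalty, and high-confidence teacher pixels dominate the remaining error.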

Training writes the student weights to your run's output directory; that is the checkpoint you pass to the inference script below.

Monitor Training:

tensorboard --logdir logs/

Evaluation

Evaluation code coming soon.

Inference

Generate a colored 3D point cloud from a directory of images, using a checkpoint you produced with the Training Pipeline:

python utils/test_checkpoint_images.py path/to/images/ \
    --checkpoint path/to/your_trained_checkpoint.ckpt \
    --output-dir results/

Options:

--size: Image preprocessing size (default: 518)
--conf-percentile: Confidence threshold percentile (default: 10, keeps top 90% most confident points)
--device: Device to use (default: cuda)
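The effect of `--conf-percentile` can be sketched as follows. The threshold rule here (a simple rank-based cut on the sorted confidences) is an assumption; the script may compute the percentile differently, for example with interpolation, and `filter_by_confidence` is a hypothetical helper.

```python
def filter_by_confidence(points, conf, conf_percentile=10.0):
    """Drop the least confident points below the given percentile."""
    if not conf:
        return []
    ranked = sorted(conf)
    # Index of the first confidence value that is kept.
    cut = min(int(len(ranked) * conf_percentile / 100.0), len(ranked) - 1)
    threshold = ranked[cut]
    return [p for p, c in zip(points, conf) if c >= threshold]


# Ten points with distinct confidences: the default keeps the top 90%.
example_points = [(float(i), 0.0, 0.0) for i in range(10)]
example_conf = [float(i + 1) for i in range(10)]
kept_points = filter_by_confidence(example_points, example_conf)
```

Raising the percentile trades completeness for cleanliness: more low-confidence points are removed, which typically thins out noisy regions of the reconstruction.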

Example:

# Run inference on sample images
python utils/test_checkpoint_images.py samples/apple/ \
    --checkpoint path/to/your_trained_checkpoint.ckpt \
    --output-dir results/apple_reconstruction \
    --conf-percentile 15

Output:

{scene_name}_student.ply: Colored point cloud (viewable in MeshLab, CloudCompare, etc.)
{scene_name}_info.txt: Inference metadata and timing
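For reference, a colored point cloud in PLY form is simple to produce or inspect by hand. The sketch below writes the ASCII variant of the format; whether the inference script emits ASCII or binary PLY is not stated here, and `write_ascii_ply` is a hypothetical helper rather than part of the repository.

```python
import io


def write_ascii_ply(stream, points, colors):
    """Write xyz points with 8-bit RGB colors as an ASCII PLY file."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    stream.write("ply\nformat ascii 1.0\n")
    stream.write(f"element vertex {len(points)}\n")
    for axis in ("x", "y", "z"):
        stream.write(f"property float {axis}\n")
    for channel in ("red", "green", "blue"):
        stream.write(f"property uchar {channel}\n")
    stream.write("end_header\n")
    for (x, y, z), (r, g, b) in zip(points, colors):
        stream.write(f"{x} {y} {z} {r} {g} {b}\n")


# Write a two-point cloud to an in-memory buffer; a file opened with "w" works the same way.
ply_buffer = io.StringIO()
write_ascii_ply(ply_buffer, [(0.0, 0.0, 1.0), (1.0, 0.5, 2.0)], [(255, 0, 0), (0, 128, 255)])
```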

Citation

If you find this work useful, please consider citing the published version:

@inproceedings{leblanc2026distill3r,
    title     = {{Distill3R}: A Pipeline for Democratizing {3D} Foundation Models on Commodity Hardware},
    author    = {Leblanc, Brandon and Poullis, Charalambos},
    booktitle = {Proceedings of the 23rd Conference on Robots and Vision (CRV)},
    year      = {2026},
    doi       = {10.21428/d82e957c.68efb4ef},
}

Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the CGS-M scholarship.

Built on top of:

Fast3R - Teacher model
DUNE - Pre-trained encoder
DUSt3R - Point map representation

Training datasets: CO3D-v2, ScanNet++, Habitat, MegaDepth, BlendedMVS, ARKitScenes

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
