本项目用于对视频帧进行单目深度估计。该项目:
This project performs monocular depth estimation on video frames. It:
1. 包含视频抽帧模块,可自动将视频文件(.mp4, .avi, .mov等类型)转换为图像数据集;
Includes a video frame extraction module that automatically converts video files (.mp4, .avi, .mov, etc.) into image datasets;
2. 借鉴 Depth Anything 论文思路,采用教师-学生模型架构:
Follows the approach of the Depth Anything paper, using a teacher-student model architecture:
- 教师模型: 使用 Depth Anything V2,为无标签图像生成伪深度标签,解决真实深度值难以获取的问题;
Teacher model: Uses Depth Anything V2 to generate pseudo depth labels for unlabeled images, which addresses the difficulty of obtaining ground-truth depth;
- 学生模型: 包含模型主干和模型头两部分。模型主干支持4种类型: ResNet18/ResNet50/MobileNetv4/DINOv2,模型头从教师模型蒸馏;
Student model: Consists of a backbone and a head. The backbone supports 4 types: ResNet18/ResNet50/MobileNetv4/DINOv2; the head is distilled from the teacher model;
3. 支持4种损失函数:MSE/MAE/平滑L1/MiDaS风格(尺度平移不变+梯度匹配)、2种优化器:SGD/AdamW;
Supports 4 loss functions: MSE/MAE/Smooth L1/MiDaS-style (scale-and-shift-invariant + gradient matching), and 2 optimizers: SGD/AdamW;
4. 可集中配置数据路径、模型参数、训练超参数等;
Allows centralized configuration of data paths, model parameters, training hyperparameters, etc.;
5. 支持2种模型测试模式:
Supports 2 modes for model testing:
- 训练后自动测试; Automatic testing after training;
- 预训练权重评估。 Evaluation of pre-trained weights.
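Item 3 lists a MiDaS-style loss (scale-and-shift-invariant + gradient matching). To illustrate what that term computes, here is a minimal NumPy sketch of the formulation described in the MiDaS paper referenced at the end of this document: the prediction is aligned to the target with a least-squares scale `s` and shift `t`, the aligned residual is penalized with an MSE term, and its finite differences are penalized over several subsampled scales. The function names, the `alpha=0.5` / `scales=4` defaults, and the idea that `alpha` corresponds to the `lambda_val` config field are assumptions for illustration; this is not the project's `loss.py`.

```python
import numpy as np


def align_scale_shift(pred, target, mask):
    """Least-squares scale s and shift t minimizing sum((s * pred + t - target)^2) over mask."""
    p = pred[mask].astype(np.float64)
    g = target[mask].astype(np.float64)
    if p.size == 0:
        return 1.0, 0.0  # nothing to align against
    # 2x2 normal equations: [[sum p^2, sum p], [sum p, n]] @ [s, t] = [sum p*g, sum g]
    a00, a01, a11 = np.sum(p * p), np.sum(p), float(p.size)
    b0, b1 = np.sum(p * g), np.sum(g)
    det = a00 * a11 - a01 * a01
    if det <= 1e-12 * a00 * a11:
        # Constant prediction: scale is undetermined, so the best fit is the target mean.
        return 0.0, float(np.mean(g))
    s = (a11 * b0 - a01 * b1) / det
    t = (a00 * b1 - a01 * b0) / det
    return float(s), float(t)


def gradient_matching_loss(residual, mask, scales=4):
    """Sum of |x/y finite differences| of the residual over `scales` subsampled levels."""
    total = 0.0
    for k in range(scales):
        step = 2 ** k
        r, m = residual[::step, ::step], mask[::step, ::step]
        mx = m[:, 1:] & m[:, :-1]  # both horizontal neighbours valid
        my = m[1:, :] & m[:-1, :]  # both vertical neighbours valid
        total += np.abs(r[:, 1:] - r[:, :-1])[mx].sum()
        total += np.abs(r[1:, :] - r[:-1, :])[my].sum()
    # Normalized by the number of valid full-resolution pixels.
    return total / max(int(mask.sum()), 1)


def midas_style_loss(pred, target, mask=None, alpha=0.5, scales=4):
    """Scale-and-shift-invariant MSE plus alpha * multi-scale gradient matching."""
    mask = np.ones(target.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    s, t = align_scale_shift(pred, target, mask)
    residual = np.where(mask, s * pred + t - target, 0.0)
    n = max(int(mask.sum()), 1)
    ssi = np.sum(residual[mask] ** 2) / (2 * n)
    return float(ssi + alpha * gradient_matching_loss(residual, mask, scales))
```

Because of the alignment step, a prediction that differs from the pseudo label only by a global scale and offset (for example `2 * target + 3`) gets a loss of about zero, which is why this loss suits relative-depth teachers such as Depth Anything V2. In a training loop the same steps would be written with `torch` tensors so gradients flow through `s` and `t`.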
```
|——— Depth-Estimation
| |——— config
| | |——— config.json # 配置中心 Configuration center
| | |——— load_config.py # 加载配置文件 Load configuration file
| | |——— train_sets.txt # 训练集视频列表 Train set video list
| | |——— test_sets.txt # 测试集视频列表 Test set video list
| |——— dataset
| | |——— data_processing.py # 数据预处理 Data processing
| | |——— test_data_processing.py # 数据预处理验证 Data processing validation
| |——— model
| | |——— teacher
| | | |——— checkpoints
| | | | |——— *.pth # 教师模型权重 Teacher model weights
| | | |——— teacher_run.py # 运行教师模型 Run teacher model
| | | |——— ...
| | |——— backbones # 学生模型主干 Student model backbone
| | | |——— resnet18.py
| | | |——— resnet50.py
| | | |——— mobilenetv4.py
| | | |——— dinov2.py
| | | |——— ...
| | |——— model.py # 学生模型 Student model
| | |——— train.py # 模型训练 Train the model
| | |——— loss.py # 损失函数 Loss function
| | |——— utils.py # 辅助函数 Utility functions
| |——— input
| | |——— videos # 视频文件 Videos
| | |——— data
| | | |——— ...
| | | | |——— images # 视频帧 Video frames
| | | | |——— depth # 深度图 Depth map
| |——— output # 输出数据 Output data
| |——— debug # 数据预处理验证输出 Output for data processing validation
| |——— run.py # 运行整个项目 Run the entire project
| |——— test.py # 手动测试模型 Test the model manually
```
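In the layout above, video frames (`images`) and teacher depth maps (`depth`) live in sibling folders under `input/data`, and the config's `pair_json_path` / `train_pairs_path` / `test_pairs_path` fields point to image-depth pair files. A minimal sketch of how such pairs could be built, assuming frames and depth maps share a filename stem and that a pair file is a JSON list of `{"image", "depth"}` objects; `build_pairs`, `write_pairs` and the JSON schema are hypothetical and not taken from `data_processing.py`.

```python
import json
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed frame / depth-map formats


def build_pairs(images_dir, depth_dir):
    """Pair each frame with the depth map that has the same filename stem."""
    depth_by_stem = {
        p.stem: p for p in Path(depth_dir).iterdir() if p.suffix.lower() in IMAGE_EXTS
    }
    pairs = []
    for img in sorted(Path(images_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        depth = depth_by_stem.get(img.stem)
        if depth is not None:  # frames without a pseudo label are skipped
            pairs.append({"image": str(img), "depth": str(depth)})
    return pairs


def write_pairs(pairs, json_path):
    """Write the pair list as JSON, creating parent folders if needed."""
    json_path = Path(json_path)
    json_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.write_text(json.dumps(pairs, ensure_ascii=False, indent=2), encoding="utf-8")
```

Matching on the stem rather than on the full filename lets frames be saved as `.jpg` while depth maps are saved as `.png`.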
```bash
conda create -n depth-estimation python=3.10
conda activate depth-estimation
pip install -r requirements.txt
```

项目配置统一在config/config.json文件中进行。文件内可编辑的各字段含义如下,其余字段请勿修改:
Project configuration is centralized in config/config.json. The fields that MAY be edited are described below; all other fields SHOULD NOT be modified:
```json
{
  "seed": "随机种子 Random seed",
  "data_split": {
    "含义": "视频数据集划分 Dataset split",
    "train": "训练集视频列表 Train set video list",
    "test": "测试集视频列表 Test set video list",
    "train_pairs_path": "训练集图像-深度对输出路径 Train set image-depth pairs generation path",
    "test_pairs_path": "测试集图像-深度对输出路径 Test set image-depth pairs generation path"
  },
  "dataloader": {
    "含义": "数据加载器参数 DataLoader parameters"
  },
  "extract_frame": {
    "含义": "视频抽帧参数 Video frame extraction parameters",
    "video_dir": "视频目录 Video directory",
    "output_dir": "视频帧输出目录 Video frame output directory",
    "output_subdir": "视频帧输出子目录 Video frame output subdirectory",
    "delta": "抽帧间隔 Extraction interval"
  },
  "teacher_model": {
    "含义": "教师模型配置 Teacher model configuration",
    "enable": "是否运行教师模型 Whether to run teacher model",
    "weight_dir": "模型权重目录 Model weight directory",
    "name": "模型名称 Model name,支持 supports vits(默认 default)/vitb/vitl",
    "output_dir": "深度图输出目录 Depth map output directory",
    "pseudo_label_subdir": "深度图输出子目录 Depth map output subdirectory",
    "pred_only": "是否仅生成深度图 Whether to generate depth map only",
    "grayscale": "是否生成灰度图 Whether to generate grayscale image",
    "pair_json_path": "图像-深度对输出路径 Image-depth pairs generation path"
  },
  "backbone": {
    "name": "学生模型主干名称 Student model backbone name,支持 supports ResNet18(默认 default)/ResNet50/MobileNetv4/Dinov2"
  },
  "distillation": {
    "temperature": "模型蒸馏温度 Model distillation temperature"
  },
  "train": {
    "epochs": "训练轮数 Number of training epochs"
  },
  "loss_and_optimizer": {
    "含义": "损失函数与优化器 Loss function and optimizer",
    "loss": {
      "name": "损失函数类型 Loss function type,支持 supports L1(默认 default)/L2(MSE)/smooth_L1/MiDaS",
      "lambda_val": "正则项系数 Regularization coefficient"
    },
    "optimizer": {
      "name": "优化器类型 Optimizer type,支持 supports AdamW(默认 default)/SGD",
      "learning_rate": "学习率 Learning rate",
      "weight_decay": "权重衰减 Weight decay",
      "sgd_momentum": "SGD动量 SGD momentum"
    }
  },
  "test": {
    "含义": "模型测试参数 Model test parameters",
    "automatic": "是否在训练后自动测试模型 Whether to automatically test the model after training",
    "weight": "模型权重路径 Model weight path"
  },
  "debug": {
    "含义": "数据预处理验证参数 Data processing validation parameters",
    "enable": "是否启用数据预处理验证 Whether to enable data processing validation",
    "sample_num": "验证样本数量 Number of samples to validate",
    "output_dir": "验证结果输出目录 Validation result output directory"
  },
  "output": {
    "含义": "输出配置 Output configuration",
    "weights_dir": "模型权重输出目录 Model weight output directory",
    "output_train_dir": "训练集输出数据目录 Train set output data directory",
    "output_test_dir": "测试集输出数据目录 Test set output data directory",
    "pred_only": "是否仅生成深度图 Whether to generate depth map only"
  }
}
```

- 此处提供B站视频下载器:唧唧Down(安装唧唧1即可)
A Bilibili video downloader, 唧唧Down (JiJiDown), is provided here (installing 唧唧1 is sufficient).
- 下载网址 Download at: http://client.jijidown.com/
- 按照软件指引下载B站视频,并将视频放置在 input/videos 目录下。
Follow the software's instructions to download Bilibili videos, and place them in the input/videos directory.
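Once the videos are in `input/videos`, the frame extraction module turns them into images. The sketch below shows one way to do this with OpenCV (`cv2.VideoCapture`, `cv2.imwrite`), keeping every `delta`-th frame as in the `extract_frame.delta` config field. The helper names, the JPEG output format, and the one-sub-folder-per-video layout are assumptions for illustration, not the project's actual implementation.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".avi", ".mov"}


def list_videos(video_dir):
    """Video files in video_dir (case-insensitive extension match), sorted by name."""
    return sorted(p for p in Path(video_dir).iterdir() if p.suffix.lower() in VIDEO_EXTS)


def should_keep(frame_idx, delta):
    """Keep frames 0, delta, 2 * delta, ..."""
    return frame_idx % delta == 0


def extract_frames(video_path, output_dir, delta):
    """Save every delta-th frame of one video as JPEG; returns the number of frames written."""
    import cv2  # OpenCV is only needed here, so the helpers above work without it

    if delta < 1:
        raise ValueError("delta must be a positive integer")
    video_path = Path(video_path)
    out_dir = Path(output_dir) / video_path.stem  # one sub-folder per video (assumption)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")
    idx = saved = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode error
                break
            if should_keep(idx, delta):
                cv2.imwrite(str(out_dir / f"{idx:06d}.jpg"), frame)
                saved += 1
            idx += 1
    finally:
        cap.release()
    return saved
```

A possible driver would be `for v in list_videos("input/videos"): extract_frames(v, "input/data", delta)`, where the output directory and `delta` would come from `extract_frame.output_dir` and `extract_frame.delta`. Reading every frame and skipping in Python is simpler and more reliable across codecs than seeking, at the cost of decoding frames that are discarded.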
教师模型权重可在 Depth Anything V2 仓库中下载(URL见文档末尾),放置在model/teacher/checkpoints目录下。
Teacher model weights can be downloaded from the Depth Anything V2 repository (URL at the end of this document) and should be placed in the model/teacher/checkpoints directory.
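The relative-depth Depth Anything V2 models output depth values with no fixed range, so pseudo labels are usually normalized before being written as images. The sketch below assumes a per-image min-max normalization to 8 bits and shows what the `grayscale` option could mean (the single-channel map replicated to three channels). Whether `teacher_run.py` uses exactly this scheme is not stated in this document, and `depth_to_uint8` / `to_grayscale_rgb` are hypothetical names.

```python
import numpy as np


def depth_to_uint8(depth, eps=1e-8):
    """Min-max normalize a float depth map to the 0-255 range and cast to uint8."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    scaled = (depth - d_min) / max(d_max - d_min, eps) * 255.0
    return scaled.astype(np.uint8)  # casting truncates toward zero


def to_grayscale_rgb(depth_u8):
    """Replicate a single-channel (H, W) map into (H, W, 3) for 3-channel image writers."""
    return np.repeat(depth_u8[..., np.newaxis], 3, axis=-1)
```

Per-image normalization discards absolute scale, which is acceptable when the student is trained with a scale-and-shift-invariant loss; 8-bit storage also quantizes each label to 256 levels, so a 16-bit PNG or `.npy` file would keep more precision if that matters.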
```bash
python run.py
```

- 论文 Paper: https://arxiv.org/pdf/1907.01341
- 仓库 Repo: https://github.com/isl-org/MiDaS
- 论文 Paper: https://arxiv.org/pdf/2401.10891
- 仓库 Repo: https://github.com/LiheYoung/Depth-Anything