Multi-gpu training #12

Draft
qinjian623 wants to merge 6 commits into sel118:main from qinjian623:feature_multi_gpu_training

@qinjian623

Add a new file, train_culane_mp.py, which is a single-node, multi-GPU training script.

The original train_culane.py was deleted while developing the new one; we should probably restore it.

The default batch_size is still 2, which is probably too small for multi-GPU training. 64 is acceptable on a 4-V100 node.

A new variable, metric_skips=10, is introduced. It can lower how often the F1 metric is computed, since that metric runs on the CPU only.
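The PR text does not show how metric_skips is applied, so the following is only a minimal sketch of the idea, assuming the metric is computed once every metric_skips training steps. The helper name should_compute_metric and the step counter are hypothetical, not taken from train_culane_mp.py.

```python
def should_compute_metric(step, metric_skips=10):
    """Return True on every `metric_skips`-th step (steps counted from 0).

    Hypothetical helper: the real script may gate the F1 metric differently.
    """
    return step % metric_skips == 0


# Simulate 25 training steps and record where the CPU-only metric would run.
evaluated_steps = [step for step in range(25) if should_compute_metric(step)]
```

With the default of 10, the expensive CPU-side F1 computation would run on 3 of these 25 steps instead of all of them.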

Start training with:

CUDA_VISIBLE_DEVICES=4,5,6,7 python train_culane_mp.py ...

CUDA_VISIBLE_DEVICES controls which GPUs, and therefore how many, take part in training.
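As a rough illustration of what the command above selects, the sketch below parses a CUDA_VISIBLE_DEVICES-style value and counts the devices. The function visible_gpu_ids is a hypothetical helper written for this example; the training script itself presumably relies on the CUDA runtime for this. Inside the process, the listed devices are renumbered starting from 0, so physical GPUs 4,5,6,7 appear as devices 0 to 3.

```python
def visible_gpu_ids(env_value):
    """Split a CUDA_VISIBLE_DEVICES-style string into a list of device ids.

    Hypothetical helper for illustration only; it does not query CUDA.
    """
    return [part.strip() for part in env_value.split(",") if part.strip()]


physical_ids = visible_gpu_ids("4,5,6,7")
num_gpus = len(physical_ids)

# Inside the training process the devices are seen as 0..num_gpus-1.
logical_ids = list(range(num_gpus))
```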

@qinjian623
Author

qinjian623 commented May 7, 2021

Sorry, I forgot to mention this.
In ./datasets/culane.py, lines 115 to 124, the l[1:] vs. l[:] differences come from an unofficial version of the CULane dataset, in which we fixed some of the links in the list file.

So some minor changes may be needed to use the official CULane.

                if self.image_set == 'test':
                    self.img_list.append(os.path.join(self.data_dir_path,
                                                      l[1:]))  # l[1:]  get rid of the first '/' so as for os.path.join
                else:
                    self.img_list.append(os.path.join(self.data_dir_path, l))

                if self.image_set == 'test':
                    self.seg_list.append(os.path.join(self.data_dir_path, 'laneseg_label_w16_test', l[1:-3] + 'png'))
                else:
                    self.seg_list.append(os.path.join(self.data_dir_path, 'laneseg_label_w16', l[:-3] + 'png'))
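One possible way to make the snippet above independent of the list-file variant is to strip any leading '/' before joining, so that entries with or without it behave the same. This is only a sketch: join_list_entry and seg_label_path are hypothetical helpers, the example paths are made up, and posixpath is used so the result does not depend on the host OS. It also shows why the slash matters: a join discards earlier components when a later one is absolute.

```python
import posixpath


def join_list_entry(data_dir_path, entry):
    """Join a list-file entry onto the dataset root, ignoring a leading '/'."""
    return posixpath.join(data_dir_path, entry.lstrip("/"))


def seg_label_path(data_dir_path, label_dir, entry):
    """Build the label path by swapping the 3-character extension for 'png'."""
    return posixpath.join(data_dir_path, label_dir, entry.lstrip("/")[:-3] + "png")


root = "/data/CULane"  # made-up dataset root for illustration

# An absolute second argument makes join drop the root entirely.
naive = posixpath.join(root, "/driver_x/clip/00000.jpg")

# Both list-file variants resolve to the same path once the slash is stripped.
with_slash = join_list_entry(root, "/driver_x/clip/00000.jpg")
without_slash = join_list_entry(root, "driver_x/clip/00000.jpg")

label = seg_label_path(root, "laneseg_label_w16", "/driver_x/clip/00000.jpg")
```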
