Use first GPU if no free GPU is found #167
Even though I'm currently using the DGX-Spark, which has its own GPU (NVIDIA GB10), I ran into the error "Error finding free GPU: invalid literal for int() with base 10: '[N/A]'". To fix this, I added one more elif branch that checks whether at least one GPU was found. If so, the first GPU found is used instead.
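The error message suggests that a value expected to be numeric arrived as the string '[N/A]' and was passed straight to int(). The repository's get_free_gpu() is not shown in this PR, so the following is only a sketch of one way such parsing could tolerate non-numeric fields. The names parse_free_memory and pick_free_gpu are hypothetical, and the input format (one free-memory value per line, one line per GPU) is an assumption.

```python
from typing import Optional


def parse_free_memory(field: str) -> Optional[int]:
    """Return the free memory as an int, or None if the field is not numeric."""
    try:
        return int(field.strip())
    except ValueError:
        # Covers placeholder values such as '[N/A]'.
        return None


def pick_free_gpu(smi_output: str) -> Optional[int]:
    """Return the index of the GPU with the most free memory.

    Assumes one value per line, one line per GPU (hypothetical format).
    Returns None when no line carries a usable number.
    """
    best_id: Optional[int] = None
    best_free = -1
    for gpu_id, line in enumerate(smi_output.strip().splitlines()):
        free = parse_free_memory(line)
        if free is not None and free > best_free:
            best_id, best_free = gpu_id, free
    return best_id
```

With this shape, an output consisting only of '[N/A]' yields None instead of raising, so the caller can decide on a fallback such as GPU 0.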
Code Review
This pull request introduces a fallback mechanism to use the first available GPU if a specific free GPU cannot be identified. The review feedback highlights that the new conditional logic is redundant and results in unreachable code, suggesting a more maintainable approach by defaulting the GPU ID earlier in the process.
elif torch.cuda.device_count() > 0:
    # if the free gpu is not found, but there is at least one GPU, use the first GPU
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    args.use_cuda = 1
    torch.cuda.set_device(0)
    self.device = torch.device('cuda:0')
The condition elif torch.cuda.device_count() > 0: is redundant because torch.cuda.is_available() (checked at line 60) already ensures that at least one GPU is available. This addition also makes the subsequent else block (lines 73-77) unreachable dead code. Consider simplifying the logic by defaulting free_gpu_id to 0 if get_free_gpu() returns None, which would allow removing the redundant branches and improving maintainability.
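The suggested simplification could look roughly like the sketch below. It is not the repository's code: resolve_device is a hypothetical helper, and it takes plain values so the fallback logic can be read in isolation. In the PR's context, cuda_available would presumably come from torch.cuda.is_available() and free_gpu_id from get_free_gpu().

```python
from typing import Optional


def resolve_device(cuda_available: bool, free_gpu_id: Optional[int]) -> str:
    """Return a device string, falling back to GPU 0 when no free GPU was identified."""
    if not cuda_available:
        return "cpu"
    # Compare against None explicitly so that a valid GPU id of 0 is kept as-is.
    gpu_id = free_gpu_id if free_gpu_id is not None else 0
    return f"cuda:{gpu_id}"
```

The returned string can be passed to torch.device(...). Because the default is applied before the device is chosen, no separate elif branch is needed and the final else branch stays reachable for the CPU case.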
This path causes an error when it is downloaded in a Windows environment.