Use first GPU if no free GPU is found #167

Open
SNG0407 wants to merge 2 commits into modelscope:main from SNG0407:Feature-for-DGX-Spark-CUDA-GPU


SNG0407 commented May 7, 2026

I'm using a DGX Spark, which has its own GPU (NVIDIA GB10), but I still hit the error "Error finding free GPU: invalid literal for int() with base 10: '[N/A]'".

To fix this, I added an elif branch that checks whether at least one GPU is present. If so, the first GPU found is used instead.
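The error message suggests that a text field reading '[N/A]' was passed straight to int(). The helper that finds a free GPU is not shown in this PR, so the following is only a minimal sketch of tolerant parsing, assuming the helper reads per-GPU memory values as strings (for example from nvidia-smi output). The names parse_memory_mib and pick_free_gpu are hypothetical and not part of the repository.

```python
from typing import List, Optional


def parse_memory_mib(field: str) -> Optional[int]:
    """Return the integer in a memory field, or None if the field is not numeric."""
    try:
        return int(field.strip())
    except ValueError:
        # Fields such as '[N/A]' end up here instead of raising to the caller.
        return None


def pick_free_gpu(memory_used_fields: List[str]) -> Optional[int]:
    """Return the index of the GPU with the least used memory.

    Returns None when no field could be parsed, so the caller can choose
    a fallback (hypothetical behaviour, not taken from the repository).
    """
    best_index = None
    best_value = None
    for index, field in enumerate(memory_used_fields):
        value = parse_memory_mib(field)
        if value is None:
            continue  # skip GPUs that do not report a numeric value
        if best_value is None or value < best_value:
            best_index, best_value = index, value
    return best_index
```

With this shape, a device that reports only '[N/A]' yields None rather than an exception, and the calling code decides what to do next.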


gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a fallback mechanism to use the first available GPU if a specific free GPU cannot be identified. The review feedback highlights that the new conditional logic is redundant and results in unreachable code, suggesting a more maintainable approach by defaulting the GPU ID earlier in the process.

Comment on lines +67 to +72
elif torch.cuda.device_count() > 0:
    # if the free gpu is not found, but there is at least one GPU, use the first GPU
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    args.use_cuda = 1
    torch.cuda.set_device(0)
    self.device = torch.device('cuda:0')

medium

The condition elif torch.cuda.device_count() > 0: is redundant because torch.cuda.is_available() (checked at line 60) already ensures that at least one GPU is available. This addition also makes the subsequent else block (lines 73-77) unreachable dead code. Consider simplifying the logic by defaulting free_gpu_id to 0 if get_free_gpu() returns None, which would allow removing the redundant branches and improving maintainability.
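The suggested simplification could look roughly like the sketch below. It assumes that get_free_gpu() returns an integer index or None, as the review implies; resolve_gpu_id and its default_id parameter are hypothetical names used only for illustration, and the torch calls are left out so the logic stands on its own.

```python
from typing import Callable, Optional


def resolve_gpu_id(
    get_free_gpu: Callable[[], Optional[int]],
    default_id: int = 0,
) -> int:
    """Return the free GPU index, or default_id when none was identified.

    Assumes the caller has already confirmed that CUDA is available,
    so index 0 refers to an existing device.
    """
    free_gpu_id = get_free_gpu()
    if free_gpu_id is None:
        # No free GPU could be determined; fall back to the first device.
        return default_id
    return free_gpu_id


# The resolved index would then feed the existing single code path, e.g.
# building the device string "cuda:<index>".
device_name = f"cuda:{resolve_gpu_id(lambda: None)}"
```

Because the fallback happens before the branch on the GPU index, one code path handles both cases and no separate elif or else is needed.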

This path causes an error when it is downloaded in a Windows environment.