[WIP] Introduce CuVS resource manager #138
Draft
narangvivek10 wants to merge 10 commits into
/ok to test f7ba116
/ok to test 5d331d9

Until now, we have allowed the indexing threads to attempt to build indexes on the GPU as soon as they are ready. On a GPU with ample resources, and depending on the dataset size and the indexing pattern (for example, when a flush happens), this may not be a problem. Under tighter conditions, however, with relatively fewer GPU resources and a large dataset, the GPU can run out of resources, resulting in out-of-memory (OOM) errors.
I am introducing a `CuvsResourcesManager`-based approach. With a finite pool of `ManagedCuVSResources` and active monitoring of the available device memory, requesting threads submit work in a controlled fashion: each thread acquires resources when they are available and releases them when it is finished. Once this approach is also rolled out to the GPU index and search APIs, the `ThreadLocalCuVSResourcesProvider`-based approach will be retired.

Summary of changes:
- `CuvsResourcesManager`
- `CuvsResourcesManager` usage in the CAGRA->HNSW APIs with the prior `ThreadLocalCuVSResourcesProvider`-based impl.
- `cuvs-java`, instead.
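The acquire/release flow described above can be sketched roughly as follows. This is a hypothetical, minimal Java sketch, not the PR's `CuvsResourcesManager`: `GpuResourcePool`, `acquire`, `release` and the `freeDeviceMemoryBytes` probe are illustrative names, the pooled handle type `R` stands in for something like `ManagedCuVSResources`, and no cuvs-java API is called. A fixed set of handles is created up front; a thread blocks until a handle is idle and the (assumed) free-memory probe reports enough room for its estimated footprint, and it gives up with `null` after a timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical bounded, memory-aware pool; R stands in for a GPU resources handle. */
final class GpuResourcePool<R> {
    // Waiters re-check free memory periodically, because memory freed outside
    // the pool never signals the condition below.
    private static final long POLL_NANOS = TimeUnit.MILLISECONDS.toNanos(10);

    private final Deque<R> idle = new ArrayDeque<>();
    private final LongSupplier freeDeviceMemoryBytes; // assumed probe, e.g. backed by a device-memory query
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();

    GpuResourcePool(int size, Supplier<R> factory, LongSupplier freeDeviceMemoryBytes) {
        for (int i = 0; i < size; i++) {
            idle.push(factory.get()); // finite number of handles, created once
        }
        this.freeDeviceMemoryBytes = freeDeviceMemoryBytes;
    }

    /** Returns an idle handle once enough device memory is free, or null on timeout/interrupt. */
    R acquire(long requiredBytes, long timeout, TimeUnit unit) {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (idle.isEmpty() || freeDeviceMemoryBytes.getAsLong() < requiredBytes) {
                if (remainingNanos <= 0) {
                    return null; // gave up: no handle or not enough memory in time
                }
                long wait = Math.min(remainingNanos, POLL_NANOS);
                long left = changed.awaitNanos(wait); // wakes on release() or after the poll interval
                remainingNanos -= (wait - left);
            }
            return idle.pop();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return null;
        } finally {
            lock.unlock();
        }
    }

    /** Returns a handle to the pool and wakes any waiting threads. */
    void release(R handle) {
        lock.lock();
        try {
            idle.push(handle);
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

A caller would pair the two calls in `try`/`finally` so a handle is always returned, e.g. `R r = pool.acquire(bytes, 30, TimeUnit.SECONDS); if (r != null) { try { build(r); } finally { pool.release(r); } }`, where `build` is a hypothetical placeholder. The sketch only checks free memory at admission time and does not reserve it, so two threads admitted back to back could still oversubscribe the device; a real manager would likely also need to account for in-flight requests.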