[WIP] Introduce CuVS resource manager#138

Draft
narangvivek10 wants to merge 10 commits into NVIDIA:main from SearchScale:vivek/implement-managed-resources

Conversation

@narangvivek10

@narangvivek10 narangvivek10 commented Apr 28, 2026

Collaborator

Until now, we have allowed indexing threads to attempt index builds on the GPU whenever they are ready. On a GPU with ample resources, and depending on the dataset size and the indexing pattern (for example, when flushes happen), this may not appear to be a problem. Under tighter conditions, however, with fewer resources and a large dataset, the GPU can run out of memory, resulting in OOM failures.

I am introducing a CuvsResourcesManager-based approach. It keeps a finite number of ManagedCuVSResources in a pool and actively monitors the available device memory. Requesting threads acquire a resource before submitting work and release it when finished, so requests are admitted in a controlled fashion based on resource availability. Once this approach is rolled out to the Index and search on the GPU API as well, the ThreadLocalCuVSResourcesProvider-based approach will be retired.
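The acquire/release pattern described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `BoundedGpuResourcePool`, `Lease`, and every method name below are hypothetical, and device-memory monitoring is stood in for by a fixed byte budget that callers reserve against using their own peak-memory estimate. A real manager would presumably query the device for free memory instead.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a bounded, memory-aware resource pool.
// Assumption: free device memory is modelled as a fixed budget, not queried from the GPU.
final class BoundedGpuResourcePool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private final long totalBytes;
    private int freeSlots;
    private long freeBytes;

    BoundedGpuResourcePool(int slots, long memoryBudgetBytes) {
        if (slots <= 0 || memoryBudgetBytes <= 0) {
            throw new IllegalArgumentException("slots and budget must be positive");
        }
        this.totalBytes = memoryBudgetBytes;
        this.freeSlots = slots;
        this.freeBytes = memoryBudgetBytes;
    }

    // Handle returned to the caller; closing it gives the slot and memory back.
    final class Lease implements AutoCloseable {
        private final long bytes;
        private boolean closed;

        private Lease(long bytes) {
            this.bytes = bytes;
        }

        @Override
        public void close() {
            lock.lock();
            try {
                if (closed) {
                    return; // closing twice must not return the reservation twice
                }
                closed = true;
                freeSlots++;
                freeBytes += bytes;
                released.signalAll(); // wake threads waiting in acquire()
            } finally {
                lock.unlock();
            }
        }
    }

    // Blocks until a slot is free and the estimated peak memory fits in the budget.
    Lease acquire(long estimatedPeakBytes) throws InterruptedException {
        checkFits(estimatedPeakBytes);
        lock.lock();
        try {
            while (!canAdmit(estimatedPeakBytes)) {
                released.await();
            }
            return admit(estimatedPeakBytes);
        } finally {
            lock.unlock();
        }
    }

    // Non-blocking variant: returns null when the request cannot be admitted right now.
    Lease tryAcquire(long estimatedPeakBytes) {
        checkFits(estimatedPeakBytes);
        lock.lock();
        try {
            return canAdmit(estimatedPeakBytes) ? admit(estimatedPeakBytes) : null;
        } finally {
            lock.unlock();
        }
    }

    long availableBytes() {
        lock.lock();
        try {
            return freeBytes;
        } finally {
            lock.unlock();
        }
    }

    int availableSlots() {
        lock.lock();
        try {
            return freeSlots;
        } finally {
            lock.unlock();
        }
    }

    private void checkFits(long bytes) {
        if (bytes < 0 || bytes > totalBytes) {
            throw new IllegalArgumentException("request can never fit in the budget");
        }
    }

    private boolean canAdmit(long bytes) {
        return freeSlots > 0 && freeBytes >= bytes;
    }

    private Lease admit(long bytes) {
        freeSlots--;
        freeBytes -= bytes;
        return new Lease(bytes);
    }
}
```

An indexing thread would wrap its build in `try (var lease = pool.acquire(estimate)) { ... }`, so the reservation is returned even if the build throws. Rejecting requests larger than the whole budget up front avoids a thread waiting forever for memory that can never become available.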

Summary of changes:

  • Introduce CuvsResourcesManager
  • Replace the prior ThreadLocalCuVSResourcesProvider-based implementation in the CAGRA->HNSW APIs with CuvsResourcesManager.
  • Cleanup: remove the CuVSProvider classes, which are redundant because the one available in cuvs-java can be used directly.

@narangvivek10 narangvivek10 self-assigned this Apr 28, 2026
@narangvivek10 narangvivek10 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Apr 28, 2026
@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@narangvivek10

Collaborator Author

/ok to test f7ba116

@narangvivek10

Collaborator Author

/ok to test 5d331d9

@narangvivek10 narangvivek10 marked this pull request as ready for review April 30, 2026 23:24
@narangvivek10 narangvivek10 requested a review from a team as a code owner April 30, 2026 23:24
@narangvivek10 narangvivek10 changed the title Introduce CuVS resource manager [WIP] Introduce CuVS resource manager May 11, 2026
@narangvivek10 narangvivek10 marked this pull request as draft May 11, 2026 21:34
@narangvivek10

Collaborator Author

Below is a summary of a subset of test runs that compares the estimated peak memory with the peak memory actually observed, for segment sizes ranging from 100K to 20M vectors.

Screenshot from 2026-06-01 13-15-48

