Predict IVF-PQ FP16 overflow and auto-switch to FP32 #2246
Open. huuanhhuyn wants to merge 12 commits into NVIDIA:main from huuanhhuyn:ivfpq_fp16_overflow
+141 −1
12 commits
19bc18f huuanhhuyn: IVF-PQ FP16 overflow detection
01f234a huuanhhuyn: Explain the use of mapping
08302cf huuanhhuyn: Move params fallback to params init phase
1586a42 huuanhhuyn: Choose sampling equation
431d3ac huuanhhuyn: Use raft built-in kernels and remove manual one.
7fcdb1d huuanhhuyn: Add kDelay param to adjust the sampling fraction growth speed
e09cb73 huuanhhuyn: Reduce kSaturation to 20k (just sufficient to detect FP16 overflow in…
c4b344c huuanhhuyn: Explain uniform sampling decision
49ee283 huuanhhuyn: Avoid expensive random sampling over the full dataset.
e4ee618 huuanhhuyn: Edit comments
3ddd98a huuanhhuyn: Fix sampling number and remove defensive margin
d94fe86 huuanhhuyn: Refactor for explicitness of supported distance types
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| /* | ||
| * SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
| | ||
| #pragma once | ||
| | ||
| #include "../detail/ann_utils.cuh" // cuvs::spatial::knn::detail::utils::mapping | ||
| | ||
| #include <cuvs/distance/distance.hpp> | ||
| | ||
| #include <raft/core/device_mdarray.hpp> | ||
| #include <raft/core/error.hpp> | ||
| #include <raft/core/mdspan.hpp> | ||
| #include <raft/core/operators.hpp> | ||
| #include <raft/core/resource/cuda_stream.hpp> | ||
| #include <raft/core/resource/device_memory_resource.hpp> | ||
| #include <raft/core/resources.hpp> | ||
| #include <raft/linalg/map_reduce.cuh> | ||
| #include <raft/linalg/reduce.cuh> | ||
| #include <raft/util/cuda_dev_essentials.cuh> | ||
| #include <raft/util/cudart_utils.hpp> | ||
| | ||
| #include <algorithm> | ||
| #include <cstdint> | ||
| | ||
| namespace cuvs::neighbors::ivf_pq::detail { | ||
| | ||
| /** | ||
| * Estimate max_i ||x_i||^2 from the first min(n_rows, 20000) rows of the dataset. | ||
| */ | ||
| template <typename DataT, typename Accessor> | ||
| float estimate_max_squared_norm( | ||
| raft::resources const& handle, | ||
| raft::mdspan<const DataT, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset) | ||
| { | ||
| common::nvtx::range<common::nvtx::domain::cuvs> r("estimate_max_squared_norm"); | ||
| auto stream = raft::resource::get_cuda_stream(handle); | ||
| const int64_t n_rows = dataset.extent(0); | ||
| const int64_t dim = dataset.extent(1); | ||
| | ||
| int64_t n_sample = std::min<int64_t>(n_rows, 20000); | ||
| | ||
| auto mr = raft::resource::get_workspace_resource_ref(handle); | ||
| auto sample = | ||
| raft::make_device_mdarray<DataT>(handle, mr, raft::make_extents<int64_t>(n_sample, dim)); | ||
| raft::copy(sample.data_handle(), | ||
| dataset.data_handle(), | ||
| n_sample * dim, | ||
| stream); | ||
| | ||
| // Compute float-mapped squared norm | ||
| auto d_map_sq_norm = raft::make_device_vector<float, int64_t>(handle, n_sample); | ||
| raft::linalg::reduce<raft::Apply::ALONG_ROWS>( | ||
| handle, | ||
| raft::make_const_mdspan(sample.view()), | ||
| d_map_sq_norm.view(), | ||
| 0.0f, | ||
| false, | ||
| [] __device__(DataT v, auto) -> float { | ||
| float e = cuvs::spatial::knn::detail::utils::mapping<float>{}(v); | ||
| return e * e; | ||
| }, | ||
| raft::add_op(), | ||
| raft::identity_op()); | ||
| // Compute max of squared norm vector | ||
| auto d_max_sq = raft::make_device_scalar<float>(handle, 0.0f); | ||
| raft::linalg::map_reduce(handle, | ||
| raft::make_const_mdspan(d_map_sq_norm.view()), | ||
| d_max_sq.view(), | ||
| 0.0f, | ||
| raft::identity_op(), | ||
| raft::max_op()); | ||
| | ||
| float max_sq = 0.0f; | ||
| raft::update_host(&max_sq, d_max_sq.data_handle(), 1, stream); | ||
| raft::resource::sync_stream(handle); | ||
| | ||
| return max_sq; | ||
| } | ||
| | ||
| } // namespace cuvs::neighbors::ivf_pq::detail | ||
| | ||
| namespace cuvs::neighbors::ivf_pq::helpers { | ||
| | ||
| /** | ||
| * @brief Estimate whether FP16 is likely insufficient for IVF-PQ's full-magnitude distance | ||
| * computations on this dataset (i.e. `internal_distance_dtype` and `coarse_search_dtype`). | ||
| * | ||
| * We bound the largest achievable score from the dataset's vector norms. With R = max_i ||x_i|| | ||
| * (estimated from a fraction of the dataset): | ||
| * - L2Expanded: ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x,y> <= (||x|| + ||y||)^2 <= 4 * R^2 | ||
| * - InnerProduct: |<x, y>| <= ||x|| * ||y|| <= R^2 | ||
| * - CosineExpanded: data is L2-normalized, so |score| <= 1 and overflow is impossible. | ||
| */ | ||
| template <typename DataT, typename Accessor> | ||
| bool estimate_fp16_overflow( | ||
| raft::resources const& handle, | ||
| raft::mdspan<const DataT, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset, | ||
| cuvs::distance::DistanceType metric) | ||
| { | ||
| if (dataset.extent(0) == 0) { return false; } | ||
| | ||
| float dist_factor = 1.0f; | ||
| switch (metric) { | ||
| case cuvs::distance::DistanceType::L2Expanded: dist_factor = 4.0f; break; | ||
| case cuvs::distance::DistanceType::CosineExpanded: | ||
| // Cosine similarity normalizes the vectors itself, so overflow cannot occur | ||
| return false; | ||
| case cuvs::distance::DistanceType::InnerProduct: dist_factor = 1.0f; break; | ||
| default: RAFT_FAIL("Unsupported distance type for IVF-PQ search %d.", int(metric)); | ||
| } | ||
| | ||
| const float max_vector_sq_norm = | ||
| cuvs::neighbors::ivf_pq::detail::estimate_max_squared_norm(handle, dataset); | ||
| const float max_distance_sq_norm = dist_factor * max_vector_sq_norm; | ||
| | ||
| constexpr float kFp16Max = 65504.0f; | ||
| return max_distance_sq_norm > kFp16Max; | ||
| } | ||
| | ||
| } // namespace cuvs::neighbors::ivf_pq::helpers | ||
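
The sampling and reduction in `estimate_max_squared_norm` can be mirrored on the host to cross-check the device result in a unit test. The following is a minimal sketch, not part of the PR: the function name `cpu_max_squared_norm` and its `max_sample` parameter are hypothetical, and it assumes the input is already `float` (the device code first maps `DataT` to `float` via `mapping<float>`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-only reference (illustration, not from the PR).
// Returns max_i ||x_i||^2 over the first min(n_rows, max_sample) rows of a
// row-major matrix, accumulating in float as the device version does.
inline float cpu_max_squared_norm(const std::vector<float>& data,
                                  std::int64_t n_rows,
                                  std::int64_t dim,
                                  std::int64_t max_sample = 20000)
{
  assert(n_rows >= 0 && dim > 0);
  assert(static_cast<std::int64_t>(data.size()) == n_rows * dim);

  // Same prefix sample as the diff: only the leading rows are inspected.
  const std::int64_t n_sample = std::min(n_rows, max_sample);

  float max_sq = 0.0f;
  for (std::int64_t i = 0; i < n_sample; ++i) {
    float sq = 0.0f;
    for (std::int64_t j = 0; j < dim; ++j) {
      const float v = data[i * dim + j];
      sq += v * v;  // row-wise sum of squares
    }
    max_sq = std::max(max_sq, sq);  // running max over sampled rows
  }
  return max_sq;
}
```

Because only a prefix of the rows is inspected, a large-norm row that appears after the sampled range does not influence the estimate; a test dataset that is meant to trigger the fallback should place such rows within the first 20000.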
What about the other distance types? Would it make sense to generalize it to pass whatever distance type is requested by the constructed index?

This catches the overflow on IVF-PQ search only.
`build_knn_graph` by IVF_PQ only allows 3 distance types (L2Expanded, CosineExpanded, InnerProduct); see cagra_build.cuh#L1635. HNSW further excludes the use of Cosine distance; see hnsw.hpp#L123.
I have refactored the code for explicitness with a `switch` around line 104 of the same file.
Tested: overflow detected with the remaining distance types, L2Expanded and InnerProduct.
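
The per-metric thresholds implied by that `switch` can be sketched on the host as follows. This is an illustration under stated assumptions, not the PR's code: the `Metric` enum is a hypothetical stand-in for `cuvs::distance::DistanceType`, and `score_upper_bound` and `may_overflow_fp16` are names invented here. The bounds themselves are the ones in the docstring: 4 R^2 for L2Expanded, R^2 for InnerProduct, and 1 for CosineExpanded.

```cpp
#include <cassert>

// Hypothetical stand-in for cuvs::distance::DistanceType (illustration only).
enum class Metric { L2Expanded, InnerProduct, CosineExpanded };

// Largest finite value representable in IEEE 754 half precision.
constexpr float kFp16Max = 65504.0f;

// Upper bound on |score| given max_sq_norm = R^2 = max_i ||x_i||^2.
inline float score_upper_bound(Metric metric, float max_sq_norm)
{
  assert(max_sq_norm >= 0.0f);
  switch (metric) {
    case Metric::L2Expanded: return 4.0f * max_sq_norm;  // (||x|| + ||y||)^2 <= 4 R^2
    case Metric::InnerProduct: return max_sq_norm;       // |<x, y>| <= R^2
    case Metric::CosineExpanded: return 1.0f;            // normalized, |score| <= 1
  }
  return 0.0f;  // not reached for the three metrics above
}

// True when the bound exceeds the FP16 range, i.e. FP32 would be the safer choice.
inline bool may_overflow_fp16(Metric metric, float max_sq_norm)
{
  return score_upper_bound(metric, max_sq_norm) > kFp16Max;
}
```

Under these bounds, L2Expanded flags overflow once R^2 exceeds 16376 (a quarter of 65504), InnerProduct once R^2 exceeds 65504, and CosineExpanded never does. For example, `may_overflow_fp16(Metric::L2Expanded, 20000.0f)` is true while `may_overflow_fp16(Metric::InnerProduct, 20000.0f)` is false.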