Updating Fern and fixing a couple small links #2210
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
This PR pins the Fern CLI version for docs builds and reorganizes documentation: "Advanced Topics" is repurposed as a "Field Guide", page slugs and navigation are updated, redirects are added, many internal links are migrated, multiple API doc signatures were reformatted (removing …).
Changes
Documentation Structure Reorganization and Version Pinning
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
fern/build_docs.sh (1)
65-65: ⚡ Quick win
Consider validating Python3 availability.
The script now depends on Python3 to read the Fern version from config, but unlike Node.js (which has `require_node_22`), there's no explicit check for Python3 availability. While `set -euo pipefail` will cause the script to fail if Python3 is missing, the error message may not clearly indicate the requirement.
🐍 Suggested improvement to add Python3 validation
Add a validation function before line 55:
+require_python3() {
+  if ! command -v python3 >/dev/null 2>&1; then
+    echo "Fern docs build requires Python 3, but python3 was not found on PATH." >&2
+    echo "Install Python 3 before running fern/build_docs.sh." >&2
+    exit 1
+  fi
+}
+
+require_node_22
+require_python3
+
-require_node_22
-
 fern_config_version() {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fern/build_docs.sh` at line 65, The script now calls fern_config_version (used where FERn_VERSION="$(fern_config_version)") which relies on Python3 but there's no explicit check; add a validation helper (similar to require_node_22) that checks for a working python3 (or python3 -V) and prints a clear error and exits if not found, invoke that helper before calling fern_config_version (before the FERn_VERSION assignment) so the script fails with a clear message when Python3 is missing.
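For context on why the script needs Python 3 at all, here is a minimal sketch of how a helper like `fern_config_version` might read the pinned CLI version. It assumes `fern/fern.config.json` stores the version under a top-level `"version"` key; the function name `read_fern_version` and the error wording are hypothetical and not taken from the script.

```python
import json
from pathlib import Path


def read_fern_version(config_path):
    """Return the pinned Fern CLI version from a fern.config.json-style file.

    Assumption: the version lives under a top-level "version" key.
    """
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    version = data.get("version")
    # Fail loudly instead of silently building with an unpinned CLI.
    if not isinstance(version, str) or not version.strip():
        raise ValueError(f"no usable 'version' field in {config_path}")
    return version.strip()
```

A shell script could call such a helper through `python3`, which is why a missing interpreter deserves its own clear error message.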
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@fern/docs.yml`:
- Around line 252-254: The "ABI Stability" navigation entry uses a relative path
"./developer_guide/abi_stability.md" that is inconsistent with other entries;
update the path for the page "ABI Stability" (the YAML mapping with key page:
"ABI Stability") to match the repository's nav convention (e.g.
"./pages/developer_guide/abi_stability.md") so sidebar resolution and link
checks succeed.
---
Nitpick comments:
In `@fern/build_docs.sh`:
- Line 65: The script now calls fern_config_version (used where
FERn_VERSION="$(fern_config_version)") which relies on Python3 but there's no
explicit check; add a validation helper (similar to require_node_22) that checks
for a working python3 (or python3 -V) and prints a clear error and exits if not
found, invoke that helper before calling fern_config_version (before the
FERn_VERSION assignment) so the script fails with a clear message when Python3
is missing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: df543cce-e9d5-411f-9666-bb48832e4d92
📒 Files selected for processing (14)
- fern/build_docs.sh
- fern/docs.yml
- fern/fern.config.json
- fern/pages/advanced_topics.md
- fern/pages/api_guide.md
- fern/pages/c_guidelines.md
- fern/pages/cpp_guidelines.md
- fern/pages/developer_guide.md
- fern/pages/java_guidelines.md
- fern/pages/jit_compilation.md
- fern/pages/python_guidelines.md
- fern/pages/udf_usage.md
- fern/pages/user_guide.md
- fern/pages/user_guide/integration_patterns.md
- page: "ABI Stability"
  path: "./developer_guide/abi_stability.md"
- page: "Link-time Optimization"
Fix broken navigation path for ABI Stability.
Line 253 uses ./developer_guide/abi_stability.md, but navigation paths in this file consistently resolve from ./pages/.... This likely breaks the Developer Guide sidebar entry and link checks.
Suggested fix
   - section: "Advanced Topics"
     contents:
       - page: "ABI Stability"
-        path: "./developer_guide/abi_stability.md"
+        path: "./pages/developer_guide/abi_stability.md"
       - page: "Link-time Optimization"
         path: "./pages/jit_lto_guide.md"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/docs.yml` around lines 252 - 254, The "ABI Stability" navigation entry
uses a relative path "./developer_guide/abi_stability.md" that is inconsistent
with other entries; update the path for the page "ABI Stability" (the YAML
mapping with key page: "ABI Stability") to match the repository's nav convention
(e.g. "./pages/developer_guide/abi_stability.md") so sidebar resolution and link
checks succeed.
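The convention described above could be checked mechanically. The following is a rough sketch, not part of the PR: it scans `docs.yml` text line by line and reports Markdown `path:` entries that do not start with `./pages/`. The function name `find_unprefixed_paths` is hypothetical, the `./pages/` rule is taken from the review comment rather than from Fern itself, and the regex is a line-based heuristic, not a YAML parser.

```python
import re

# Matches YAML lines such as:   path: "./pages/foo.md"
PATH_LINE = re.compile(r"""^\s*-?\s*path:\s*["']?([^"'\s]+)["']?\s*$""")


def find_unprefixed_paths(docs_yml_text, prefix="./pages/"):
    """Return (line_number, path) pairs for Markdown nav paths lacking `prefix`."""
    hits = []
    for lineno, line in enumerate(docs_yml_text.splitlines(), start=1):
        match = PATH_LINE.match(line)
        if not match:
            continue
        path = match.group(1)
        # Only Markdown pages are checked; other path kinds are ignored.
        if path.endswith(".md") and not path.startswith(prefix):
            hits.append((lineno, path))
    return hits
```

Run against the snippet quoted in this comment, it would flag only the "ABI Stability" entry.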
divyegala left a comment
Few small comments, pre-approving
For implementation details on building JIT LTO kernel fragments and linking them at runtime, see [Link-time Optimization](jit_lto_guide.md).
- [cuvs::neighbors::cagra::build()](/api-reference/cpp-api-neighbors-cagra) when graph construction uses `graph_build_params::ivf_pq_params` or `graph_build_params::iterative_search_params`
- [cuvs::neighbors::cagra::extend()](/api-reference/cpp-api-neighbors-cagra) when adding nodes, because the extension path searches the existing CAGRA graph
Is the "when" clause necessary here? We should only add it if needed.
We should rename this file to field_guide.md and update all references
Co-authored-by: Divye Gala <divyegala@gmail.com>
/ok to test 729e9d2
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@fern/pages/c_api/c-api-neighbors-cagra.md`:
- Line 455: The note for the persistent_device_usage/kDeviceUsage parameter
contains the awkward phrase "alongside with the persistent kernel"; edit the
paragraph to use "alongside the persistent kernel" (or "with the persistent
kernel") instead, updating any other occurrences of "alongside with" in that
description so the sentence reads cleanly (e.g., "running any other work on GPU
alongside the persistent kernel makes the setup fragile").
In `@fern/pages/c_api/c-api-preprocessing-quantize-scalar.md`:
- Line 145: The page uses both “quantisation” and “quantization”; choose a
single variant (use “quantization”) and replace every occurrence on this page
(including the sentence "Applies quantization transform to given dataset" and
the other occurrence around line 170) so the terminology is consistent; search
for "quantis*" and "quantiz*" within
fern/pages/c_api/c-api-preprocessing-quantize-scalar.md and update all instances
to the chosen spelling.
In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md`:
- Line 628: The parameter description for `include_dataset` currently uses
file-specific wording ("write out the dataset to the file") but these overloads
serialize to std::ostream&, so update both occurrences (the entries around the
`include_dataset` parameter in this document) to stream-neutral wording such as
"Whether or not to include the dataset in the serialized output." Preserve the
parameter type (`bool`) and default (`true`) and keep phrasing consistent with
other overload descriptions.
In `@fern/pages/cpp_api/cpp-api-neighbors-vamana.md`:
- Around line 535-536: Update the boolean parameter descriptions to use concise
phrasing: change "whether or not to serialize the dataset" to "whether to
serialize the dataset" for the include_dataset parameter and change "whether
output file should be aligned to disk sectors of 4096 bytes" to "whether the
output file should be aligned to 4096‑byte disk sectors" for sector_aligned;
apply these edits for every occurrence of include_dataset and sector_aligned in
cpp-api-neighbors-vamana.md so the wording matches the concise style used
elsewhere.
In `@fern/pages/python_api/python-api-neighbors-cagra.md`:
- Line 268: The markdown for the `metric` parameter contains an unescaped
asterisk in the cosine distance formula which triggers MD037; update the cosine
formula in the `metric` description (the line referencing "cosine distance is
defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 *
\|\|b\|\|_2)") so the multiplication asterisk is escaped or wrap the entire
formula in inline code/backticks; ensure the description text for `metric`
(mentioning "sqeuclidean", "inner_product", "cosine") uses the escaped asterisk
or code formatting to prevent markdownlint failures.
In `@fern/pages/python_api/python-api-neighbors-hnsw.md`:
- Line 92: The table row for parameter `M` contains unescaped arithmetic
expressions "m * 2" and "m * 3" which can trip markdownlint MD037; update the
text for `M` (the HNSW parameter) to wrap those expressions in code formatting
(e.g., `` `m * 2` `` and `` `m * 3` ``) or escape the asterisks so `graph_degree
= m * 2` and `intermediate_graph_degree = m * 3` are rendered as code; ensure
you update the same cell that references `graph_degree` and
`intermediate_graph_degree`.
In `@fern/pages/python_api/python-api-neighbors-ivf-flat.md`:
- Line 71: The table cell for the `metric` parameter contains unescaped
asterisks in the math phrases which triggers markdownlint MD037; update the
prose for the `metric` row (the `metric` parameter description) to either escape
each asterisk (e.g. \*) or wrap the mathematical expressions in inline
code/backticks so the multiplications and norm symbols are not parsed as
emphasis—ensure occurrences like a_i * b_i and ||a||_2 are escaped or
code-formatted consistently for `sqeuclidean`, `inner_product`, and `cosine`
formula descriptions.
In `@fern/pages/python_api/python-api-neighbors-ivf-pq.md`:
- Line 201: The markdown table cell for the `metric` parameter contains
unescaped asterisks in the cosine distance formula which breaks MD parsing;
update the cosine formula in the `metric` description (the line mentioning
"cosine distance is defined as...") to either escape the multiplication
asterisks or wrap the entire formula in inline code/backticks so the "*"
characters are not treated as emphasis (keep the rest of the text and valid
metric list unchanged).
- Line 349: Fix the typo in the `lut_dtype` parameter description: change
"dimansionality" to "dimensionality" in the sentence that reads "so fast shared
memory kernels can be used even for datasets with large dimansionality." Update
the documentation string for `lut_dtype` accordingly so it reads
"dimensionality" and keep the rest of the phrasing unchanged.
In `@fern/pages/python_api/python-api-neighbors-ivf-sq.md`:
- Line 74: The table cell describing `metric` contains raw asterisks in the
inline formulas (e.g., "a_i * b_i" and "\|\|a\|\|_2 * \|\|b\|\|_2") which
triggers MD037; update the markdown in the `metric` description to escape
multiplication operators by replacing * with \* in those expressions (for
example "a_i \* b_i" and "\|\|a\|\|_2 \* \|\|b\|\|_2") so the table renders
correctly and avoids markdown lint errors.
In `@fern/pages/python_api/python-api-neighbors-tiered-index.md`:
- Line 41: The metric description contains unescaped asterisks in the formulas
(e.g., in "inner product" and "cosine" definitions) which triggers markdown lint
MD037; update the text for the `metric` parameter to either wrap mathematical
operators/expressions like a_i * b_i and norms (||a||_2) in inline code spans or
escape the `*` operators so they are not treated as Markdown emphasis —
specifically edit the `metric` description line that mentions "inner product
distance" and "cosine distance" (and the sqeuclidean formula if needed) to use
inline code for expressions such as a_i * b_i and `||a||_2` or escape the `*`
characters.
In `@fern/pages/user_guide.md`:
- Line 11: Update the broken markdown link "[API Guide](/user-guide/api-guides)"
in fern/pages/user_guide.md to point to the correct configured page name
(singular) — replace the URL segment "/user-guide/api-guides" with
"/user-guide/api-guide" (matching the configured file name api_guide.md) so the
link resolves correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e5d5f3cf-5953-434e-8e92-3e28af7f73f0
📒 Files selected for processing (58)
- fern/docs.yml
- fern/pages/c_api/c-api-cluster-kmeans.md
- fern/pages/c_api/c-api-core-c-api.md
- fern/pages/c_api/c-api-distance-pairwise-distance.md
- fern/pages/c_api/c-api-neighbors-all-neighbors.md
- fern/pages/c_api/c-api-neighbors-brute-force.md
- fern/pages/c_api/c-api-neighbors-cagra.md
- fern/pages/c_api/c-api-neighbors-hnsw.md
- fern/pages/c_api/c-api-neighbors-ivf-flat.md
- fern/pages/c_api/c-api-neighbors-ivf-pq.md
- fern/pages/c_api/c-api-neighbors-ivf-sq.md
- fern/pages/c_api/c-api-neighbors-mg-cagra.md
- fern/pages/c_api/c-api-neighbors-mg-ivf-flat.md
- fern/pages/c_api/c-api-neighbors-mg-ivf-pq.md
- fern/pages/c_api/c-api-neighbors-nn-descent.md
- fern/pages/c_api/c-api-neighbors-refine.md
- fern/pages/c_api/c-api-neighbors-tiered-index.md
- fern/pages/c_api/c-api-neighbors-vamana.md
- fern/pages/c_api/c-api-preprocessing-pca.md
- fern/pages/c_api/c-api-preprocessing-quantize-binary.md
- fern/pages/c_api/c-api-preprocessing-quantize-pq.md
- fern/pages/c_api/c-api-preprocessing-quantize-scalar.md
- fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
- fern/pages/cpp_api/cpp-api-cluster-kmeans.md
- fern/pages/cpp_api/cpp-api-distance-distance.md
- fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
- fern/pages/cpp_api/cpp-api-neighbors-brute-force.md
- fern/pages/cpp_api/cpp-api-neighbors-cagra.md
- fern/pages/cpp_api/cpp-api-neighbors-dynamic-batching.md
- fern/pages/cpp_api/cpp-api-neighbors-epsilon-neighborhood.md
- fern/pages/cpp_api/cpp-api-neighbors-hnsw.md
- fern/pages/cpp_api/cpp-api-neighbors-ivf-pq.md
- fern/pages/cpp_api/cpp-api-neighbors-ivf-sq.md
- fern/pages/cpp_api/cpp-api-neighbors-nn-descent.md
- fern/pages/cpp_api/cpp-api-neighbors-refine.md
- fern/pages/cpp_api/cpp-api-neighbors-vamana.md
- fern/pages/cpp_api/cpp-api-preprocessing-pca.md
- fern/pages/cpp_api/cpp-api-preprocessing-quantize-pq.md
- fern/pages/cpp_api/cpp-api-selection-select-k.md
- fern/pages/cpp_api/cpp-api-stats-silhouette-score.md
- fern/pages/cpp_api/cpp-api-stats-trustworthiness-score.md
- fern/pages/field_guide.md
- fern/pages/jit_compilation.md
- fern/pages/python_api/python-api-cluster-kmeans.md
- fern/pages/python_api/python-api-common.md
- fern/pages/python_api/python-api-neighbors-all-neighbors.md
- fern/pages/python_api/python-api-neighbors-cagra.md
- fern/pages/python_api/python-api-neighbors-hnsw.md
- fern/pages/python_api/python-api-neighbors-ivf-flat.md
- fern/pages/python_api/python-api-neighbors-ivf-pq.md
- fern/pages/python_api/python-api-neighbors-ivf-sq.md
- fern/pages/python_api/python-api-neighbors-mg-cagra.md
- fern/pages/python_api/python-api-neighbors-mg-ivf-flat.md
- fern/pages/python_api/python-api-neighbors-mg-ivf-pq.md
- fern/pages/python_api/python-api-neighbors-tiered-index.md
- fern/pages/python_api/python-api-preprocessing-quantize-pq.md
- fern/pages/user_guide.md
- fern/scripts/generate_api_reference.py
💤 Files with no reviewable changes (2)
- fern/pages/field_guide.md
- fern/pages/jit_compilation.md
✅ Files skipped from review due to trivial changes (24)
- fern/pages/cpp_api/cpp-api-neighbors-epsilon-neighborhood.md
- fern/pages/cpp_api/cpp-api-neighbors-all-neighbors.md
- fern/pages/python_api/python-api-neighbors-mg-ivf-pq.md
- fern/pages/python_api/python-api-cluster-kmeans.md
- fern/pages/python_api/python-api-neighbors-all-neighbors.md
- fern/pages/cpp_api/cpp-api-cluster-agglomerative.md
- fern/pages/python_api/python-api-neighbors-mg-cagra.md
- fern/pages/cpp_api/cpp-api-neighbors-ivf-pq.md
- fern/pages/python_api/python-api-common.md
- fern/pages/cpp_api/cpp-api-neighbors-dynamic-batching.md
- fern/pages/cpp_api/cpp-api-preprocessing-quantize-pq.md
- fern/pages/cpp_api/cpp-api-selection-select-k.md
- fern/pages/cpp_api/cpp-api-stats-silhouette-score.md
- fern/pages/python_api/python-api-preprocessing-quantize-pq.md
- fern/pages/python_api/python-api-neighbors-mg-ivf-flat.md
- fern/pages/cpp_api/cpp-api-neighbors-refine.md
- fern/pages/cpp_api/cpp-api-stats-trustworthiness-score.md
- fern/pages/c_api/c-api-neighbors-refine.md
- fern/pages/cpp_api/cpp-api-neighbors-nn-descent.md
- fern/pages/cpp_api/cpp-api-distance-distance.md
- fern/pages/c_api/c-api-neighbors-vamana.md
- fern/pages/cpp_api/cpp-api-cluster-kmeans.md
- fern/pages/cpp_api/cpp-api-neighbors-cagra.md
- fern/pages/c_api/c-api-neighbors-mg-cagra.md
| `persistent` | `bool` | Whether to use the persistent version of the kernel (only SINGLE_CTA is supported a.t.m.) |
| `persistent_lifetime` | `float` | Persistent kernel: time in seconds before the kernel stops if no requests received. |
-| `persistent_device_usage` | `float` | Set the fraction of maximum grid size used by persistent kernel. Value 1.0 means the kernel grid size is maximum possible for the selected device. The value must be greater than 0.0 and not greater than 1.0.<br /><br />One may need to run other kernels alongside this persistent kernel. This parameter can be used to reduce the grid size of the persistent kernel to leave a few SMs idle. Note: running any other work on GPU alongside with the persistent kernel makes the setup fragile.<br />- Running another kernel in another thread usually works, but no progress guaranteed<br />- Any CUDA allocations block the context (this issue may be obscured by using pools)<br />- Memory copies to not-pinned host memory may block the context<br /><br />Even when we know there are no other kernels working at the same time, setting kDeviceUsage to 1.0 surprisingly sometimes hurts performance. Proceed with care. If you suspect this is an issue, you can reduce this number to ~0.9 without a significant impact on the throughput. |
+| `persistent_device_usage` | `float` | Set the fraction of maximum grid size used by persistent kernel. Value 1.0 means the kernel grid size is maximum possible for the selected device. The value must be greater than 0.0 and not greater than 1.0.<br /><br />One may need to run other kernels alongside this persistent kernel. This parameter can be used to reduce the grid size of the persistent kernel to leave a few SMs idle.<br />Note: running any other work on GPU alongside with the persistent kernel makes the setup fragile.<br />- Running another kernel in another thread usually works, but no progress guaranteed<br />- Any CUDA allocations block the context (this issue may be obscured by using pools)<br />- Memory copies to not-pinned host memory may block the context<br /><br />Even when we know there are no other kernels working at the same time, setting kDeviceUsage to 1.0 surprisingly sometimes hurts performance. Proceed with care. If you suspect this is an issue, you can reduce this number to ~0.9 without a significant impact on the throughput. |
Fix awkward phrasing in the persistent-kernel note.
“alongside with” is redundant; use either “alongside” or “with” for cleaner wording.
✏️ Suggested edit
-| `persistent_device_usage` | `float` | Set the fraction of maximum grid size used by persistent kernel. Value 1.0 means the kernel grid size is maximum possible for the selected device. The value must be greater than 0.0 and not greater than 1.0.<br /><br />One may need to run other kernels alongside this persistent kernel. This parameter can be used to reduce the grid size of the persistent kernel to leave a few SMs idle.<br />Note: running any other work on GPU alongside with the persistent kernel makes the setup fragile.<br />- Running another kernel in another thread usually works, but no progress guaranteed<br />- Any CUDA allocations block the context (this issue may be obscured by using pools)<br />- Memory copies to not-pinned host memory may block the context<br /><br />Even when we know there are no other kernels working at the same time, setting kDeviceUsage to 1.0 surprisingly sometimes hurts performance. Proceed with care. If you suspect this is an issue, you can reduce this number to ~0.9 without a significant impact on the throughput. |
+| `persistent_device_usage` | `float` | Set the fraction of maximum grid size used by persistent kernel. Value 1.0 means the kernel grid size is maximum possible for the selected device. The value must be greater than 0.0 and not greater than 1.0.<br /><br />One may need to run other kernels alongside this persistent kernel. This parameter can be used to reduce the grid size of the persistent kernel to leave a few SMs idle.<br />Note: running any other work on GPU alongside the persistent kernel makes the setup fragile.<br />- Running another kernel in another thread usually works, but no progress guaranteed<br />- Any CUDA allocations block the context (this issue may be obscured by using pools)<br />- Memory copies to not-pinned host memory may block the context<br /><br />Even when we know there are no other kernels working at the same time, setting kDeviceUsage to 1.0 surprisingly sometimes hurts performance. Proceed with care. If you suspect this is an issue, you can reduce this number to ~0.9 without a significant impact on the throughput. |
🧰 Tools
🪛 LanguageTool
[style] ~455-~455: This phrase is redundant. Consider writing “alongside” or “with”.
Context: ...r />Note: running any other work on GPU alongside with the persistent kernel makes the setup f...
(ALONGSIDE_WITH)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/c_api/c-api-neighbors-cagra.md` at line 455, The note for the
persistent_device_usage/kDeviceUsage parameter contains the awkward phrase
"alongside with the persistent kernel"; edit the paragraph to use "alongside the
persistent kernel" (or "with the persistent kernel") instead, updating any other
occurrences of "alongside with" in that description so the sentence reads
cleanly (e.g., "running any other work on GPU alongside the persistent kernel
makes the setup fragile").
<a id="cuvsscalarquantizertransform"></a>
### cuvsScalarQuantizerTransform

Applies quantization transform to given dataset
Use a single spelling variant for “quantization” across this page.
This page mixes “quantisation” and “quantization”; please standardize to one variant for consistency.
Also applies to: 170-170
🧰 Tools
🪛 LanguageTool
[uncategorized] ~145-~145: Do not mix variants of the same word (‘quantization’ and ‘quantisation’) within a single text.
Context: ...# cuvsScalarQuantizerTransform Applies quantization transform to given dataset ```c cuvsEr...
(EN_WORD_COHERENCY)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/c_api/c-api-preprocessing-quantize-scalar.md` at line 145, The
page uses both “quantisation” and “quantization”; choose a single variant (use
“quantization”) and replace every occurrence on this page (including the
sentence "Applies quantization transform to given dataset" and the other
occurrence around line 170) so the terminology is consistent; search for
"quantis*" and "quantiz*" within
fern/pages/c_api/c-api-preprocessing-quantize-scalar.md and update all instances
to the chosen spelling.
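The search for `quantis*` and `quantiz*` can be sketched in a few lines. This is an illustrative helper only (the name `count_spelling_variants` is hypothetical); it counts how often each variant starts a word so a reviewer can see whether a page mixes them.

```python
import re
from collections import Counter

# \b requires a word boundary before "quanti", so occurrences embedded in
# identifiers such as cuvsScalarQuantizerTransform are not counted.
VARIANT = re.compile(r"\bquanti([sz])", re.IGNORECASE)


def count_spelling_variants(text):
    """Count words starting with 'quantis' vs 'quantiz' (case-insensitive)."""
    counts = Counter()
    for match in VARIANT.finditer(text):
        counts["quanti" + match.group(1).lower()] += 1
    return counts
```

A page is consistent when at most one of the two keys has a non-zero count.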
| `os` | in | `std::ostream&` | output stream |
| `index` | in | [`const cuvs::neighbors::brute_force::index<half, float>&`](/api-reference/cpp-api-neighbors-brute-force#neighbors-brute-force-index) | brute force index |
-| `include_dataset` | in | `bool` | Whether or not to write out the dataset to the file. Default: `true`. |
+| `include_dataset` | in | `bool` | Whether or not to write out the dataset to the file.<br />Default: `true`. |
Fix stream-overload description to avoid file-specific wording.
These overloads serialize to std::ostream&, but the description says “write out the dataset to the file.” Please update to stream-neutral wording (e.g., “serialized output”) and keep phrasing consistent with the other overloads.
Also applies to: 654-654
🧰 Tools
🪛 LanguageTool
[style] ~628-~628: Consider shortening this phrase to just ‘whether’, unless you mean ‘regardless of whether’.
Context: ...x | | include_dataset | in | bool | Whether or not to write out the dataset to the file.<b...
(WHETHER)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/cpp_api/cpp-api-neighbors-brute-force.md` at line 628, The
parameter description for `include_dataset` currently uses file-specific wording
("write out the dataset to the file") but these overloads serialize to
std::ostream&, so update both occurrences (the entries around the
`include_dataset` parameter in this document) to stream-neutral wording such as
"Whether or not to include the dataset in the serialized output." Preserve the
parameter type (`bool`) and default (`true`) and keep phrasing consistent with
other overload descriptions.
| `include_dataset` | in | `bool` | whether or not to serialize the dataset<br />Default: `true`. |
| `sector_aligned` | in | `bool` | whether output file should be aligned to disk sectors of 4096 bytes<br />Default: `false`. |
Use simpler phrasing for boolean parameter descriptions.
Consider changing “whether or not …” to “whether …” for include_dataset (and optionally sector_aligned) to match concise style used elsewhere in the API docs.
Also applies to: 563-564, 591-592
🧰 Tools
🪛 LanguageTool
[style] ~535-~535: Consider shortening this phrase to just ‘whether’, unless you mean ‘regardless of whether’.
Context: ...x | | include_dataset | in | bool | whether or not to serialize the dataset
Default: ...
(WHETHER)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/cpp_api/cpp-api-neighbors-vamana.md` around lines 535 - 536,
Update the boolean parameter descriptions to use concise phrasing: change
"whether or not to serialize the dataset" to "whether to serialize the dataset"
for the include_dataset parameter and change "whether output file should be
aligned to disk sectors of 4096 bytes" to "whether the output file should be
aligned to 4096‑byte disk sectors" for sector_aligned; apply these edits for
every occurrence of include_dataset and sector_aligned in
cpp-api-neighbors-vamana.md so the wording matches the concise style used
elsewhere.
| Name | Type | Description |
| --- | --- | --- |
-| `metric` | `str, default = "sqeuclidean"` | String denoting the metric type, valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). |
+| `metric` | `str, default = "sqeuclidean"` | String denoting the metric type,<br />valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). |
Escape the multiplication asterisk in the cosine formula to avoid MD037 lint failures.
The * in ... / ( \|\|a\|\|_2 * \|\|b\|\|_2) is being parsed as emphasis markup in this markdown context. Escaping it (or wrapping the formula in backticks) will prevent markdownlint failures.
🔧 Proposed fix
-| `metric` | `str, default = "sqeuclidean"` | String denoting the metric type,<br />valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). |
+| `metric` | `str, default = "sqeuclidean"` | String denoting the metric type,<br />valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i \* b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i \* b_i / ( \|\|a\|\|_2 \* \|\|b\|\|_2). |
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type,<br />valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | | |
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type,<br />valid values for metric are ["sqeuclidean", "inner_product", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2<br />- inner_product distance is defined as distance(a, b) = \\sum_i a_i \* b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i \* b_i / ( \|\|a\|\|_2 \* \|\|b\|\|_2). | |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 268-268: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
[warning] 268-268: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-neighbors-cagra.md` at line 268, The
markdown for the `metric` parameter contains an unescaped asterisk in the cosine
distance formula which triggers MD037; update the cosine formula in the `metric`
description (the line referencing "cosine distance is defined as distance(a, b)
= 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2)") so the multiplication
asterisk is escaped or wrap the entire formula in inline code/backticks; ensure
the description text for `metric` (mentioning "sqeuclidean", "inner_product",
"cosine") uses the escaped asterisk or code formatting to prevent markdownlint
failures.
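The escaping fix suggested above could be applied mechanically. The following is a minimal sketch, assuming the formula cells contain no intentional `*emphasis*`; `escape_asterisks` is a hypothetical helper, not part of the cuVS docs tooling, and it would also escape legitimate emphasis markers if run on ordinary prose.

```python
import re

# Hypothetical helper for illustration only; not part of the cuVS docs tooling.
# A '*' counts as "bare" when it is not directly preceded by a backslash.
_UNESCAPED_STAR = re.compile(r"(?<!\\)\*")
# Inline code spans are captured so they can be passed through unchanged.
_CODE_SPAN = re.compile(r"(`[^`]*`)")


def escape_asterisks(line: str) -> str:
    """Escape every bare '*' outside inline code spans; keep existing '\\*' as-is."""
    parts = _CODE_SPAN.split(line)
    # Because the pattern has a capturing group, odd-indexed parts are code spans.
    return "".join(
        part if i % 2 else _UNESCAPED_STAR.sub(r"\\*", part)
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(escape_asterisks("distance(a, b) = \\\\sum_i a_i * b_i."))
```

Already-escaped operators are left alone, so running the helper twice on the same line gives the same result as running it once.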
| | --- | --- | --- | | ||
| | `n_lists` | `int, default = 1024` | The number of clusters used in the coarse quantizer. | | ||
| | `metric` | `str, default="sqeuclidean"` | String denoting the metric type. Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | | ||
| | `metric` | `str, default="sqeuclidean"` | String denoting the metric type.<br />Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where:<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | |
Fix markdown emphasis parsing in the cosine formula.
Unescaped * in this table cell can trip MD037. Escaping the multiplication symbols (or wrapping formula text in code) will avoid lint failures.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 201-201: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
[warning] 201-201: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-neighbors-ivf-pq.md` at line 201, The
markdown table cell for the `metric` parameter contains unescaped asterisks in
the cosine distance formula which breaks MD parsing; update the cosine formula
in the `metric` description (the line mentioning "cosine distance is defined
as...") to either escape the multiplication asterisks or wrap the entire formula
in inline code/backticks so the "*" characters are not treated as emphasis (keep
the rest of the text and valid metric list unchanged).
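For readers checking the formulas in the `metric` description, the four definitions can be written out directly. This is an illustrative pure-Python sketch of the formulas as stated in the table, not the cuVS implementation; the function names simply mirror the metric strings.

```python
import math


def sqeuclidean(a, b):
    """Squared Euclidean distance: sum_i (a_i - b_i)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: square root of the squared Euclidean distance."""
    return math.sqrt(sqeuclidean(a, b))


def inner_product(a, b):
    """Inner product as defined in the table: sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine distance: 1 - sum_i a_i * b_i / (||a||_2 * ||b||_2)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print(sqeuclidean(a, b), euclidean(a, b), inner_product(a, b), cosine(a, b))
```

The sketch does not guard against zero-norm vectors, for which the cosine formula is undefined.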
| | `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimansionality. Note that the recall is slightly degraded when low-precision type is selected. Possible values [np.float32, np.float16, np.uint8] | | ||
| | `internal_distance_dtype` | `default = np.float32` | Storage data type for distance/similarity computation. Possible values [np.float32, np.float16] | | ||
| | `coarse_search_dtype` | `default = np.float32` | [Experimental] The data type to use as the GEMM element type when searching the clusters to probe. Possible values: [np.float32, np.float16, np.int8].<br />- Legacy default: np.float32<br />- Recommended for performance: np.float16 (half)<br />- Experimental/low-precision: np.int8 | | ||
| | `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimansionality. Note that the recall is slightly degraded when low-precision type is selected.<br />Possible values [np.float32, np.float16, np.uint8] | |
Correct typo in parameter description (dimansionality → dimensionality).
Small wording fix, but it improves docs quality and avoids grammar/lint noise.
✏️ Proposed fix
-| `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimansionality. Note that the recall is slightly degraded when low-precision type is selected.<br />Possible values [np.float32, np.float16, np.uint8] |
+| `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimensionality. Note that the recall is slightly degraded when low-precision type is selected.<br />Possible values [np.float32, np.float16, np.uint8] |
📝 Committable suggestion
| | `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimansionality. Note that the recall is slightly degraded when low-precision type is selected.<br />Possible values [np.float32, np.float16, np.uint8] | | |
| | `lut_dtype` | `default = np.float32` | Data type of look up table to be created dynamically at search time. The use of low-precision types reduces the amount of shared memory required at search time, so fast shared memory kernels can be used even for datasets with large dimensionality. Note that the recall is slightly degraded when low-precision type is selected.<br />Possible values [np.float32, np.float16, np.uint8] | |
🧰 Tools
🪛 LanguageTool
[grammar] ~349-~349: Ensure spelling is correct
Context: ...an be used even for datasets with large dimansionality. Note that the recall is slightly degra...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-neighbors-ivf-pq.md` at line 349, Fix the
typo in the `lut_dtype` parameter description: change "dimansionality" to
"dimensionality" in the sentence that reads "so fast shared memory kernels can
be used even for datasets with large dimansionality." Update the documentation
string for `lut_dtype` accordingly so it reads "dimensionality" and keep the
rest of the phrasing unchanged.
| | --- | --- | --- | | ||
| | `n_lists` | `int, default = 1024` | The number of clusters used in the coarse quantizer. | | ||
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type. Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | | ||
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type.<br />Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where<br /><br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | |
Escape formula multiplication operators to prevent MD037 in table markdown.
This metric description still uses raw * characters in prose math; markdownlint may treat them as emphasis markers.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 74-74: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
[warning] 74-74: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-neighbors-ivf-sq.md` at line 74, The table
cell describing `metric` contains raw asterisks in the inline formulas (e.g.,
"a_i * b_i" and "\|\|a\|\|_2 * \|\|b\|\|_2") which triggers MD037; update the
markdown in the `metric` description to escape multiplication operators by
replacing * with \* in those expressions (for example "a_i \* b_i" and
"\|\|a\|\|_2 \* \|\|b\|\|_2") so the table renders correctly and avoids markdown
lint errors.
| | Name | Type | Description | | ||
| | --- | --- | --- | | ||
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type. Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where<br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | | ||
| | `metric` | `str, default = "sqeuclidean"` | String denoting the metric type.<br />Valid values for metric: ["sqeuclidean", "inner_product", "euclidean", "cosine"], where<br />- sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = \\sum_i (a_i - b_i)^2,<br />- euclidean is the euclidean distance<br />- inner product distance is defined as distance(a, b) = \\sum_i a_i * b_i.<br />- cosine distance is defined as distance(a, b) = 1 - \\sum_i a_i * b_i / ( \|\|a\|\|_2 * \|\|b\|\|_2). | |
Avoid markdown emphasis conflicts in formula text.
Unescaped * in this metric formula can trigger MD037. Escaping the operator or using inline code will keep lint clean.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 41-41: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
[warning] 41-41: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/python_api/python-api-neighbors-tiered-index.md` at line 41, The
metric description contains unescaped asterisks in the formulas (e.g., in "inner
product" and "cosine" definitions) which triggers markdown lint MD037; update
the text for the `metric` parameter to either wrap mathematical
operators/expressions like a_i * b_i and norms (||a||_2) in inline code spans or
escape the `*` operators so they are not treated as Markdown emphasis —
specifically edit the `metric` description line that mentions "inner product
distance" and "cosine distance" (and the sqeuclidean formula if needed) to use
inline code for expressions such as a_i * b_i and `||a||_2` or escape the `*`
characters.
| ## References | ||
|
|
||
| - [References](references.md): cite the research papers behind cuVS vector search, preprocessing, clustering, and GPU primitives. | ||
| - [API Guide](/user-guide/api-guides): find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, common types, and supporting routines. |
Fix likely broken API Guide link target.
Line 11 uses /user-guide/api-guides, but the configured page is api_guide.md (singular “api-guide”), so this URL likely 404s.
Suggested fix
-- [API Guide](/user-guide/api-guides): find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, common types, and supporting routines.
+- [API Guide](/user-guide/api-guide): find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, common types, and supporting routines.
📝 Committable suggestion
| - [API Guide](/user-guide/api-guides): find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, common types, and supporting routines. | |
| - [API Guide](/user-guide/api-guide): find task-focused NVIDIA cuVS API examples for clustering, vector indexing, preprocessing, common types, and supporting routines. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@fern/pages/user_guide.md` at line 11, Update the broken markdown link "[API
Guide](/user-guide/api-guides)" in fern/pages/user_guide.md to point to the
correct configured page name (singular) — replace the URL segment
"/user-guide/api-guides" with "/user-guide/api-guide" (matching the configured
file name api_guide.md) so the link resolves correctly.
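A link like this one could be caught before review with a small check. The sketch below is only an illustration: `broken_links` and the `known_paths` set are hypothetical, and in practice the set of valid routes would have to come from the Fern navigation config, which this example does not read.

```python
import re

# Matches Markdown links with a site-absolute target, e.g. [text](/some/path).
# Anchors (#...) and anything after whitespace are left out of the captured path.
_LINK = re.compile(r"\[([^\]]+)\]\((/[^)\s#]*)")


def broken_links(markdown: str, known_paths: set[str]) -> list[str]:
    """Return site-absolute link targets that are not in known_paths."""
    return [
        path
        for _text, path in _LINK.findall(markdown)
        if path not in known_paths
    ]


if __name__ == "__main__":
    # Hypothetical route set, assumed for the example only.
    routes = {"/user-guide/api-guide"}
    page = "- [API Guide](/user-guide/api-guides): find task-focused examples."
    print(broken_links(page, routes))
```

Relative links such as `references.md` are deliberately ignored here, since resolving them depends on how the docs build maps files to slugs.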
|
/merge |
No description provided.