docs: sync 26.05 docs/docs with main #2179
Greptile Summary
This PR syncs
| Filename | Overview |
|---|---|
| docs/docs/extraction/audio-video.md | Removal of !!! important admonition leaves GPU-pinning note as a code block; stray ) produces SyntaxError in code sample; two near-duplicate segment_audio paragraphs with conflicting API names. |
| docs/docs/extraction/custom-metadata.md | New TOC references 6 non-existent section anchors; How metadata is stored heading renamed without updating content; variable definitions removed but still referenced in code example. |
| docs/docs/extraction/releasenotes.md | RC1 install boilerplate fully replaced with GA 26.05 release notes. |
| docs/docs/extraction/prerequisites-support-matrix.md | CUDA/driver requirements updated; OCR NIM corrected; Nemotron Parse extra documented; caption-scope note added. |
| docs/mkdocs.yml | Nav and redirect updated for notebooks/index.md; exclude_docs pattern fixed. |
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 6
docs/docs/extraction/audio-video.md:66-68
**GPU pinning note silently rendered as a code block**
After the `!!! important` admonition was removed, the paragraph at line 68 (`Pin the Parakeet workload…`) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an **indented code block**, so this critical deployment warning will render as `<pre><code>` text rather than readable prose — readers following the setup steps will miss the GPU pinning requirement entirely.
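A small lint pass can catch this class of regression before the docs are published. The sketch below is a heuristic, not a Markdown parser. It assumes that list items and `!!!`/`???` admonition openers are the only containers that may legitimately hold 4-space-indented paragraphs, and the function name `find_indented_code_paragraphs` is hypothetical.

```python
import re

# Lines that open a container whose body may be indented by 4 spaces:
# bullet items, numbered items, and MkDocs-style admonition openers (assumed set).
CONTAINER = re.compile(r"^\s*(?:[-*+]|\d+[.)]|!!!|\?\?\?)\s+")


def find_indented_code_paragraphs(markdown_text):
    """Return 1-based line numbers that likely start an accidental indented code block."""
    hits = []
    in_fence = False      # inside a fenced code block
    in_container = False  # inside a list item or admonition
    prev_blank = True     # previous line was blank (or start of file)
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```") or stripped.startswith("~~~"):
            in_fence = not in_fence
            prev_blank = False
            continue
        if in_fence:
            continue
        if not stripped:
            prev_blank = True
            continue
        indented = line.startswith("    ") or line.startswith("\t")
        if CONTAINER.match(line):
            in_container = True
        elif not indented:
            in_container = False  # a flush-left line ends the container
        elif prev_blank and not in_container:
            hits.append(number)   # indented paragraph with nothing enclosing it
        prev_blank = False
    return hits


# The paragraph after the admonition line was removed, and with it still in place.
BROKEN = "After deploy, call the pipeline from Python:\n\n    Pin the Parakeet workload.\n"
INTACT = "!!! important\n\n    Pin the Parakeet workload.\n"
```

Running the function over `BROKEN` flags the orphaned paragraph, while `INTACT` passes because the admonition line still encloses it.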
### Issue 2 of 6
docs/docs/extraction/audio-video.md:88-91
**Stray `)` produces a `SyntaxError` in the code sample**
The closing `)` at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive `)` characters — one closing `extract_audio(...)` and an extra one below `ingestor = (...)`. Anyone copying this snippet will get a `SyntaxError` immediately.
```suggestion
)
)
```
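One hedged way to guard against this kind of error is to parse each documented snippet before publishing. The sketch assumes the snippet is plain Python. The call chain is only parsed, never executed, and the file name `sample.wav` is a placeholder rather than a path from the docs.

```python
import ast


def snippet_parses(source):
    """Return True if the snippet is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Shape of the sample with balanced parentheses (arguments are placeholders).
BALANCED = (
    "ingestor = (\n"
    "    create_ingestor(run_mode=\"service\")\n"
    "    .files([\"sample.wav\"])\n"
    "    .extract_audio()\n"
    ")\n"
)

# The same sample with one extra closing parenthesis appended.
WITH_STRAY_PAREN = BALANCED + ")\n"
```

Because `ast.parse` only checks syntax, names such as `create_ingestor` do not need to be importable for the check to run.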
### Issue 3 of 6
docs/docs/extraction/audio-video.md:93-97
**Duplicate near-identical `segment_audio` paragraphs with conflicting API names**
Line 93 (unindented) says to use `extract_audio_params={"segment_audio": True}` with `.extract(...)`, while line 95 (indented continuation of step 3) says to use `asr_params=ASRParams(segment_audio=True)` with `.extract_audio(...)`. These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or it should be clarified which applies to library mode vs. the service ingestor.
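Near-duplicate paragraphs like these can be flagged mechanically. The following is a minimal sketch using the standard library's `difflib`; the function name `near_duplicates` and the 0.7 threshold are illustrative assumptions, not project conventions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(paragraphs, threshold=0.7):
    """Return index pairs of paragraphs whose similarity ratio meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(paragraphs), 2):
        # autojunk=False keeps the ratio predictable for long paragraphs.
        ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
        if ratio >= threshold:
            pairs.append((i, j))
    return pairs
```

A flagged pair still needs a human decision: the check finds the overlap but cannot tell which API spelling is current.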
### Issue 4 of 6
docs/docs/extraction/custom-metadata.md:40-42
**Undefined variables make the code example un-runnable**
The diff removes the `hostname`, `table_name`, and `lancedb_uri` variable definitions that previously preceded the `ingestor = (...)` block, but the `create_ingestor(...)` call still references all three. Copying this snippet results in a `NameError` on `hostname`. The variable definitions need to be restored.
```suggestion
hostname = "localhost"
table_name = "nemo_retriever_collection"
lancedb_uri = "./lancedb_data"
ingestor = (
create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
.files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```
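A check along the following lines could catch snippets that reference names they never define. It is a rough sketch: it ignores statement order and scoping, and the `known` parameter (for names the reader is expected to import, such as `create_ingestor`) is an assumption of this example.

```python
import ast
import builtins


def undefined_names(source, known=()):
    """Return names the snippet reads but never assigns, imports, or defines."""
    defined = set(known) | set(dir(builtins))
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return used - defined


# The snippet as it stands after the variable definitions were removed.
SNIPPET = (
    "ingestor = (\n"
    "    create_ingestor(run_mode=\"service\", base_url=f\"http://{hostname}:7670\")\n"
    "    .files([\"data/woods_frost.pdf\", \"data/multimodal_test.pdf\"])\n"
    ")\n"
)
```

Names inside f-strings are part of the parsed tree, so `hostname` is reported even though it only appears in the `base_url` string.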
### Issue 5 of 6
docs/docs/extraction/custom-metadata.md:5-14
**"On this page" TOC contains 6 broken anchor links**
The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
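Broken in-page links of this kind can be detected with a short script. The sketch below assumes headings either carry an explicit `{ #id }` attribute (as the TOC heading does) or get a simplified slug: lowercase, punctuation dropped, spaces replaced by hyphens. The real MkDocs slug rule may differ in edge cases, and the function names are hypothetical.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*(?:\{\s*#([\w-]+)\s*\})?\s*$")
LINK = re.compile(r"\]\(#([^)\s]+)\)")


def slugify(title):
    """Approximate heading slug (assumption: lowercase, no punctuation, hyphens)."""
    text = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", text.strip())


def broken_anchors(markdown_text):
    """Return in-page link targets that match no heading anchor."""
    anchors = set()
    links = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            title, explicit_id = match.groups()
            anchors.add(explicit_id or slugify(title))
        else:
            links.extend(LINK.findall(line))
    # Compare only after the whole file is read, since a TOC precedes its headings.
    return [target for target in links if target not in anchors]
```

Running a check like this over each page as part of the docs build would surface the six missing targets before publication.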
### Issue 6 of 6
docs/docs/extraction/custom-metadata.md:125-128
**Section heading "How metadata is stored" contains only cross-reference bullets**
The heading at line 125 was renamed from `## Related Content` to `## How metadata is stored`, but its body was not updated — it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (e.g., serialized into the `metadata` column, how `content_metadata` fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.
Reviews (1): Last reviewed commit: "docs: sync 26.05 docs/docs with main"
| After deploy, call the pipeline from Python: | ||
| Pin the Parakeet workload to the dedicated GPU with your Helm values or the [NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html) (for example, node selectors, resource limits, or device requests appropriate to your cluster). |
GPU pinning note silently rendered as a code block
After the !!! important admonition was removed, the paragraph at line 68 (Pin the Parakeet workload…) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an indented code block, so this critical deployment warning will render as <pre><code> text rather than readable prose — readers following the setup steps will miss the GPU pinning requirement entirely.
| ) | ||
| ) | ||
| ``` | ||
| ) | ||
| ``` |
Stray `)` produces a `SyntaxError` in the code sample
The closing ) at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive ) characters — one closing extract_audio(...) and an extra one below ingestor = (...). Anyone copying this snippet will get a SyntaxError immediately.
| ) | |
| ) | |
| ``` | |
| ) | |
| ``` | |
| ) | |
| ) |
| To generate one extracted element for each sentence-like ASR segment, include `extract_audio_params={"segment_audio": True}` when calling `.extract(...)`. This option applies when audio extraction runs with a self-hosted Parakeet NIM or using build.nvidia.com hosted inference, but has no effect when using the local Hugging Face Parakeet model. | ||
| To generate one extracted element for each sentence-like ASR segment, pass `asr_params=ASRParams(segment_audio=True)` to `.extract_audio(...)`. This option applies when audio extraction runs with a self-hosted Parakeet NIM or using build.nvidia.com hosted inference, but has no effect when using the local Hugging Face Parakeet model. | ||
| !!! tip | ||
| For more Python examples, refer to [Python Quick Start Guide](https://github.com/NVIDIA/NeMo-Retriever/blob/main/client/client_examples/examples/python_client_usage.ipynb). | ||
| For more Python examples, refer to [Python Quick Start Guide](https://github.com/NVIDIA/NeMo-Retriever/blob/main/client/client_examples/examples/python_client_usage.ipynb). |
Duplicate near-identical `segment_audio` paragraphs with conflicting API names
Line 93 (unindented) says to use extract_audio_params={"segment_audio": True} with .extract(...), while line 95 (indented continuation of step 3) says to use asr_params=ASRParams(segment_audio=True) with .extract_audio(...). These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or it should be clarified which applies to library mode vs. the service ingestor.
| ingestor = ( | ||
| create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670") | ||
| .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"]) |
Undefined variables make the code example un-runnable
The diff removes the hostname, table_name, and lancedb_uri variable definitions that previously preceded the ingestor = (...) block, but the create_ingestor(...) call still references all three. Copying this snippet results in a NameError on hostname. The variable definitions need to be restored.
| ingestor = ( | |
| create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670") | |
| .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"]) | |
| hostname = "localhost" | |
| table_name = "nemo_retriever_collection" | |
| lancedb_uri = "./lancedb_data" | |
| ingestor = ( | |
| create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670") | |
| .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"]) |
| ## On this page { #on-this-page } | ||
| - [Attach metadata at ingestion](#attach-metadata-at-ingestion) | ||
| - [How metadata is stored](#how-metadata-is-stored) | ||
| - [Filter results at query time](#filter-results-at-query-time) | ||
| - [Writing `where` predicates](#writing-where-predicates) | ||
| - [Server-side vs client-side filters](#server-side-vs-client-side-filters) | ||
| - [Inspect hit metadata](#inspect-hit-metadata) | ||
| - [Limitations](#limitations) | ||
| - [Related content](#related-content) |
"On this page" TOC contains 6 broken anchor links
The table of contents added in this PR references #filter-results-at-query-time, #writing-where-predicates, #server-side-vs-client-side-filters, #inspect-hit-metadata, #limitations, and #related-content. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (## Best Practices, ## Use Custom Metadata to Filter Results During Retrieval, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
| ## How metadata is stored { #how-metadata-is-stored } | ||
| - [Vector databases](vdbs.md) — canonical LanceDB upload and retrieval guide | ||
| - [metadata_and_filtered_search.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/metadata_and_filtered_search.ipynb) — CLI and graph ingest with sidecar metadata |
Section heading "How metadata is stored" contains only cross-reference bullets
The heading at line 125 was renamed from ## Related Content to ## How metadata is stored, but its body was not updated — it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (e.g., serialized into the metadata column, how content_metadata fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.
Summary

- `docs/docs/` on `main` and `26.05` differ in 13 extraction pages plus `docs/mkdocs.yml` nav/redirects.
- `main` is authoritative: it has the GA 26.05 release notes, updated support matrix (CUDA 13.0 / driver 580, Nemotron Parse extra), caption-scope FAQ, and `open_clip` troubleshooting.
- Sync the `26.05` branch so `docs/docs/` matches `main` exactly (`git diff upstream/main -- docs/docs/` is empty on this branch).

Notable content restored on 26.05

- `releasenotes.md`: Full GA 26.05 highlights (upgrade notes, pipeline, CLI, service, models, multimodal, RAG, VDB, evaluation, packaging, Helm, documentation) instead of RC1 install boilerplate
- `prerequisites-support-matrix.md`: Current CUDA/driver requirements and Nemotron Parse dependency note
- `faq.md` / `troubleshoot.md`: Caption scope FAQ and `open_clip` install guidance
- `custom-metadata.md`: Restructured filtering doc from main
- `notebooks/index.md`: Restored main nav path (with matching `mkdocs.yml` redirect)

Test plan

- `git diff upstream/main -- docs/docs/` is empty on this branch
- `26.05` succeeds with updated nav
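The first test-plan item can be scripted. The sketch below wraps the same `git diff` invocation with `--quiet`, which makes Git exit with status 0 when there are no differences and 1 when there are. The helper names are hypothetical, and the `upstream` remote is assumed to be configured as in the command above.

```python
import subprocess


def docs_diff_command(ref="upstream/main", path="docs/docs/"):
    """Build the git command that compares the working branch against ref."""
    return ["git", "diff", "--quiet", ref, "--", path]


def docs_match(ref="upstream/main", path="docs/docs/", cwd=None):
    """Return True when the docs tree is identical to ref, False when it differs."""
    result = subprocess.run(docs_diff_command(ref, path), cwd=cwd)
    if result.returncode not in (0, 1):
        # Any other status means git itself failed (for example, an unknown ref).
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 0
```

Calling `docs_match()` from the repository root on the `26.05` branch should return `True` once the sync is complete.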