docs: sync 26.05 docs/docs with main #2179

Open: kheiss-uwzoo wants to merge 1 commit into NVIDIA:26.05 from kheiss-uwzoo:docs/sync-26.05-docs-with-main

kheiss-uwzoo (Collaborator) commented May 30, 2026

Summary

  • Audit result: `docs/docs/` on `main` and `26.05` differ in 13 extraction pages plus the `docs/mkdocs.yml` nav and redirects. `main` is authoritative: it has the GA 26.05 release notes, the updated support matrix (CUDA 13.0 / driver 580, Nemotron Parse extra), the caption-scope FAQ, and open_clip troubleshooting.
  • This PR updates the 26.05 branch so `docs/docs/` matches `main` exactly (`git diff upstream/main -- docs/docs/` is empty on this branch).

Notable content restored on 26.05

  • releasenotes.md: Full GA 26.05 highlights (upgrade notes, pipeline, CLI, service, models, multimodal, RAG, VDB, evaluation, packaging, Helm, documentation) instead of RC1 install boilerplate
  • prerequisites-support-matrix.md: Current CUDA/driver requirements and Nemotron Parse dependency note
  • faq.md / troubleshoot.md: Caption scope FAQ and open_clip install guidance
  • custom-metadata.md: Restructured filtering doc from main
  • notebooks/index.md: Restored main nav path (with matching mkdocs.yml redirect)

Test plan

  • `git diff upstream/main -- docs/docs/` is empty on this branch
  • MkDocs build on 26.05 succeeds with the updated nav
  • Release notes page shows GA content, not RC1 install-only text

kheiss-uwzoo requested review from a team as code owners May 30, 2026 13:45
kheiss-uwzoo requested review from jioffe502 and removed request for a team May 30, 2026 13:45
kheiss-uwzoo added the doc (Improvements or additions to documentation) label May 30, 2026
greptile-apps (Bot, Contributor) commented May 30, 2026

Greptile Summary

This PR syncs docs/docs/ on the 26.05 branch with main, replacing RC1 install boilerplate with GA release notes and carrying over content updates, including updated CUDA/driver requirements (12.2/535 → 13.0/580), OCR NIM clarifications, Nemotron Parse dependency docs, a new chart-captioning FAQ, and open_clip troubleshooting. Most of the 14 files are clean, but two files, audio-video.md and custom-metadata.md, have defects introduced during the sync that will break the published documentation.

  • audio-video.md: Removing an `!!! important` admonition left a critical GPU-pinning note indented by 4 spaces outside a list, so it renders as a code block. The code-fence restructuring also inserted a stray `)` that makes the copyable Python example raise a `SyntaxError`, and two near-duplicate `segment_audio` paragraphs appeared.
  • custom-metadata.md: The new "On this page" TOC references 6 section anchors that don't exist in the document body; the `## How metadata is stored` heading was renamed from "Related Content" without updating its content; and the variable definitions (`hostname`, `lancedb_uri`, `table_name`) were removed but are still referenced in the ingestor code example, causing a `NameError`.

Confidence Score: 3/5

Not safe to merge as-is: two files have doc defects that will ship broken code examples and broken navigation to 26.05 users.

The majority of files in this sync are clean and accurate, but audio-video.md ships a Python `SyntaxError` in a copyable code block and renders a critical GPU-pinning deployment note as a code block. custom-metadata.md ships an ingestor snippet that raises `NameError` on first run and an "On this page" TOC with six dead anchor links. These are visible, immediately reproducible defects in the published documentation that will affect users following the 26.05 setup guides.

docs/docs/extraction/audio-video.md and docs/docs/extraction/custom-metadata.md both need fixes before merge; all other files look correct.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/docs/extraction/audio-video.md | Removal of `!!! important` admonition leaves GPU-pinning note as a code block; stray `)` produces `SyntaxError` in code sample; two near-duplicate `segment_audio` paragraphs with conflicting API names. |
| docs/docs/extraction/custom-metadata.md | New TOC references 6 non-existent section anchors; `How metadata is stored` heading renamed without updating content; variable definitions removed but still referenced in code example. |
| docs/docs/extraction/releasenotes.md | RC1 install boilerplate fully replaced with GA 26.05 release notes. |
| docs/docs/extraction/prerequisites-support-matrix.md | CUDA/driver requirements updated; OCR NIM corrected; Nemotron Parse extra documented; caption-scope note added. |
| docs/mkdocs.yml | Nav and redirect updated for notebooks/index.md; `exclude_docs` pattern fixed. |
Prompt To Fix All With AI
Fix the following 6 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 6
docs/docs/extraction/audio-video.md:66-68
**GPU pinning note silently rendered as a code block**

After the `!!! important` admonition was removed, the paragraph at line 68 (`Pin the Parakeet workload…`) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an **indented code block**, so this critical deployment warning will render as `<pre><code>` text rather than readable prose — readers following the setup steps will miss the GPU pinning requirement entirely.

### Issue 2 of 6
docs/docs/extraction/audio-video.md:88-91
**Stray `)` produces a `SyntaxError` in the code sample**

The closing `)` at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive `)` characters — one closing `extract_audio(...)` and an extra one below `ingestor = (...)`. Anyone copying this snippet will get a `SyntaxError` immediately.

```suggestion
        )
    )
```

### Issue 3 of 6
docs/docs/extraction/audio-video.md:93-97
**Duplicate near-identical `segment_audio` paragraphs with conflicting API names**

Line 93 (unindented) says to use `extract_audio_params={"segment_audio": True}` with `.extract(...)`, while line 95 (indented continuation of step 3) says to use `asr_params=ASRParams(segment_audio=True)` with `.extract_audio(...)`. These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or it should be clarified which applies to library mode vs. the service ingestor.

### Issue 4 of 6
docs/docs/extraction/custom-metadata.md:40-42
**Undefined variables make the code example un-runnable**

The diff removes the `hostname`, `table_name`, and `lancedb_uri` variable definitions that previously preceded the `ingestor = (...)` block, but the `create_ingestor(...)` call still references all three. Copying this snippet results in a `NameError` on `hostname`. The variable definitions need to be restored.

```suggestion
hostname = "localhost"
table_name = "nemo_retriever_collection"
lancedb_uri = "./lancedb_data"

ingestor = (
    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
        .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```

### Issue 5 of 6
docs/docs/extraction/custom-metadata.md:5-14
**"On this page" TOC contains 6 broken anchor links**

The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.

### Issue 6 of 6
docs/docs/extraction/custom-metadata.md:125-128
**Section heading "How metadata is stored" contains only cross-reference bullets**

The heading at line 125 was renamed from `## Related Content` to `## How metadata is stored`, but its body was not updated — it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (e.g., serialized into the `metadata` column, how `content_metadata` fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.

Reviews (1): Last reviewed commit: "docs: sync 26.05 docs/docs with main"

Comment on docs/docs/extraction/audio-video.md, lines +66 to 68

```markdown
After deploy, call the pipeline from Python:

    Pin the Parakeet workload to the dedicated GPU with your Helm values or the [NIM Operator](https://docs.nvidia.com/nim-operator/latest/index.html) (for example, node selectors, resource limits, or device requests appropriate to your cluster).
```

P1 GPU pinning note silently rendered as a code block

After the `!!! important` admonition was removed, the paragraph at line 68 (`Pin the Parakeet workload…`) is now indented by 4 spaces with no enclosing list item. In Markdown (including MkDocs Material), a 4-space-indented paragraph outside a list context is treated as an **indented code block**, so this critical deployment warning will render as `<pre><code>` text rather than readable prose. Readers following the setup steps will miss the GPU pinning requirement entirely.
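A rough way to catch this before merge is to scan changed pages for text that is indented by 4 or more spaces, starts a new block after a blank line, and has no enclosing list item or admonition. The helper below is a hypothetical sketch, not part of the NeMo Retriever repository or of MkDocs. It is a heuristic that ignores tabs, nested lists, lazy continuation lines, and other CommonMark edge cases, so treat its output as lines to eyeball rather than a verdict.

```python
import re

# A top-level list item ("- ", "* ", "+ ", "1. ") or an admonition ("!!! note", "??? tip")
# opens a container whose body may legitimately be indented by 4 spaces.
LIST_ITEM_RE = re.compile(r"^(?:[-*+]|\d+\.)\s+")
ADMONITION_RE = re.compile(r"^(?:!!!|\?\?\?)")


def orphan_indented_lines(markdown: str) -> list[int]:
    """Return 1-based line numbers of indented blocks likely to render as code."""
    hits = []
    container_open = False  # last top-level line opened a list item or admonition
    prev_blank = True
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.strip():
            prev_blank = True
            continue
        indent = len(line) - len(line.lstrip(" "))
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # skip fenced code entirely
            if indent == 0:
                container_open = False  # a top-level fence ends any open list
        elif in_fence:
            pass
        elif indent == 0:
            container_open = bool(LIST_ITEM_RE.match(line) or ADMONITION_RE.match(line))
        elif indent >= 4 and prev_blank and not container_open:
            hits.append(number)
        prev_blank = False
    return hits
```

Wrapping the note back in an admonition (or a list item) puts it inside a container, which is why those cases are not reported.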


Comment on docs/docs/extraction/audio-video.md, lines 88 to +91

````markdown
)
)
```
)
```
````

P1 Stray ) produces a SyntaxError in the code sample

The closing `)` at line 90 is placed outside the code fence, making it part of the rendered code content. The code block therefore ends with two consecutive `)` characters: one closing `extract_audio(...)` and an extra one below `ingestor = (...)`. Anyone copying this snippet will get a `SyntaxError` immediately.

Suggested change

````diff
-)
-)
-```
-)
-```
+        )
+    )
````
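A cheap guard against this class of regression is to compile every Python fence on a changed page without running it. The helper below is a hypothetical sketch, not part of the NeMo Retriever repository. It assumes examples are written as `python` fences (optionally indented inside list items) and only catches syntax errors such as the stray `)`; it will not flag run-time failures like the `NameError` reported for custom-metadata.md.

```python
import re
import textwrap

# A python fence opener (optionally indented, optionally with attributes),
# then everything up to a closing fence at the same indentation.
FENCE_RE = re.compile(r"^([ \t]*)```python[^\n]*\n(.*?)^\1```[ \t]*$", re.DOTALL | re.MULTILINE)


def python_block_errors(markdown: str) -> list[str]:
    """Compile (without executing) every python fence and report syntax errors."""
    errors = []
    for index, match in enumerate(FENCE_RE.finditer(markdown), start=1):
        # Strip the list-item indentation so the code compiles as top-level Python.
        source = textwrap.dedent(match.group(2))
        try:
            compile(source, f"<python block {index}>", "exec")
        except SyntaxError as exc:
            errors.append(f"block {index}, line {exc.lineno}: {exc.msg}")
    return errors
```

Because `compile()` never executes the code, undefined names such as `create_ingestor` in an isolated snippet do not cause false positives here.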


Comment on docs/docs/extraction/audio-video.md, lines +93 to +97

```markdown
To generate one extracted element for each sentence-like ASR segment, include `extract_audio_params={"segment_audio": True}` when calling `.extract(...)`. This option applies when audio extraction runs with a self-hosted Parakeet NIM or using build.nvidia.com hosted inference, but has no effect when using the local Hugging Face Parakeet model.

To generate one extracted element for each sentence-like ASR segment, pass `asr_params=ASRParams(segment_audio=True)` to `.extract_audio(...)`. This option applies when audio extraction runs with a self-hosted Parakeet NIM or using build.nvidia.com hosted inference, but has no effect when using the local Hugging Face Parakeet model.

!!! tip

For more Python examples, refer to [Python Quick Start Guide](https://github.com/NVIDIA/NeMo-Retriever/blob/main/client/client_examples/examples/python_client_usage.ipynb).
```

P2 Duplicate near-identical segment_audio paragraphs with conflicting API names

Line 93 (unindented) says to use `extract_audio_params={"segment_audio": True}` with `.extract(...)`, while line 95 (an indented continuation of step 3) says to use `asr_params=ASRParams(segment_audio=True)` with `.extract_audio(...)`. These look like two different API call styles that both appeared after the admonition block was removed. One of them should be removed, or the text should clarify which applies to library mode and which to the service ingestor.


Comment on docs/docs/extraction/custom-metadata.md, lines 40 to 42

```python
ingestor = (
    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
        .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```

P1 Undefined variables make the code example un-runnable

The diff removes the `hostname`, `table_name`, and `lancedb_uri` variable definitions that previously preceded the `ingestor = (...)` block, but the `create_ingestor(...)` call still references all three. Copying this snippet results in a `NameError` on `hostname`. The variable definitions need to be restored.

Suggested change

```diff
-ingestor = (
-    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
-        .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
+hostname = "localhost"
+table_name = "nemo_retriever_collection"
+lancedb_uri = "./lancedb_data"
+
+ingestor = (
+    create_ingestor(run_mode="service", base_url=f"http://{hostname}:7670")
+        .files(["data/woods_frost.pdf", "data/multimodal_test.pdf"])
```
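A `NameError` only surfaces at run time, so a syntax-only check does not catch it. The sketch below is hypothetical and deliberately crude: it parses a snippet with the standard `ast` module and reports names that are read but never assigned, imported, defined, or built in. It ignores scope and statement order, and it assumes the checked text is the whole example, so any name the page imports elsewhere (for instance, wherever `create_ingestor` comes from) is reported unless that import is included.

```python
import ast
import builtins


def undefined_names(source: str) -> set[str]:
    """Names that are read somewhere in `source` but never bound anywhere in it.

    Scope and statement order are ignored, so a name assigned after its first
    use is not reported. Intended for short, flat documentation examples.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))  # print, len, ... are always available
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Load means the name is read; Store/Del means this statement binds it.
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                bound.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y"
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # function and lambda parameters
    return loaded - bound
```

Names referenced inside f-strings, such as `hostname` in the `base_url`, are visible to `ast.walk`, so they are checked too.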

Comment on docs/docs/extraction/custom-metadata.md, lines +5 to +14

```markdown
## On this page { #on-this-page }

- [Attach metadata at ingestion](#attach-metadata-at-ingestion)
- [How metadata is stored](#how-metadata-is-stored)
- [Filter results at query time](#filter-results-at-query-time)
- [Writing `where` predicates](#writing-where-predicates)
- [Server-side vs client-side filters](#server-side-vs-client-side-filters)
- [Inspect hit metadata](#inspect-hit-metadata)
- [Limitations](#limitations)
- [Related content](#related-content)
```

P1 "On this page" TOC contains 6 broken anchor links

The table of contents added in this PR references `#filter-results-at-query-time`, `#writing-where-predicates`, `#server-side-vs-client-side-filters`, `#inspect-hit-metadata`, `#limitations`, and `#related-content`. None of these section headings exist in the current file body (128 lines). The body still contains the old 26.05 structure (`## Best Practices`, `## Use Custom Metadata to Filter Results During Retrieval`, etc.) rather than the restructured sections the TOC was written for. Clicking any of these six links in the published docs will silently scroll to the top of the page.
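This mismatch is mechanical to detect: collect every `](#anchor)` target on the page and compare it with the anchors the page defines. The sketch below is hypothetical, not part of the repository or of MkDocs. It assumes anchors come from explicit `{ #id }` attribute lists, as this page uses, or otherwise from a simplified slug of the heading text that only approximates the default slugify of the Python-Markdown `toc` extension MkDocs uses (for example, it does not deduplicate repeated headings).

```python
import re

# "## Heading text { #explicit-id }" -> groups: level, text, optional explicit id
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*(?:\{\s*#([\w-]+)\s*\})?\s*$")
# In-page links such as "[Limitations](#limitations)"
LINK_RE = re.compile(r"\]\(#([\w-]+)\)")


def slugify(text: str) -> str:
    """Simplified slug: lowercase, drop punctuation (including backticks), spaces -> hyphens."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"\s+", "-", text.strip())


def defined_anchors(markdown: str) -> set[str]:
    anchors = set()
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # "# comments" inside code are not headings
            continue
        if not in_fence:
            match = HEADING_RE.match(line)
            if match:
                anchors.add(match.group(3) or slugify(match.group(2)))
    return anchors


def broken_anchor_links(markdown: str) -> list[str]:
    """Link targets on the page that no heading defines, in document order."""
    anchors = defined_anchors(markdown)
    return [target for target in LINK_RE.findall(markdown) if target not in anchors]
```

Adding explicit `{ #id }` attributes to the restructured headings, as the TOC's first two entries already assume, keeps the check independent of slugify details.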


Comment on docs/docs/extraction/custom-metadata.md, lines +125 to 128

```markdown
## How metadata is stored { #how-metadata-is-stored }

- [Vector databases](vdbs.md) — canonical LanceDB upload and retrieval guide
- [metadata_and_filtered_search.ipynb](https://github.com/NVIDIA/NeMo-Retriever/blob/main/examples/metadata_and_filtered_search.ipynb) — CLI and graph ingest with sidecar metadata
```

P1 Section heading "How metadata is stored" contains only cross-reference bullets

The heading at line 125 was renamed from `## Related Content` to `## How metadata is stored`, but its body was not updated: it still contains just two reference links. Readers navigating to this section via the TOC will find no explanation of how metadata is persisted (for example, how it is serialized into the `metadata` column or how `content_metadata` fields are mapped). Either restore the "Related content" heading or replace the bullets with the intended storage explanation.

