Pass merge levels to remote OCR #2050

Open
charlesbluca wants to merge 4 commits into NVIDIA:main from charlesbluca:codex/remote-ocr-merge-levels

@charlesbluca (Collaborator) commented May 18, 2026

Description

Remote HTTP OCR now passes explicit merge_levels into NIM image-inference requests so it matches the local OCR path:

  • Full-image/video OCR repeats the actor's configured merge level for every valid image in the batch.
  • Page-element OCR sends "word" for table crops and "paragraph" for chart, infographic, and text/title/header-footer crops.
  • Graphic-elements chart OCR sends "word", matching its local OCR behavior.
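A minimal sketch of how the full-image/video and graphic-elements paths could build their per-image merge_levels lists. The helper names (full_image_merge_levels, graphic_elements_merge_levels, check_aligned) are hypothetical illustrations, not the nemo_retriever API; the only assumption carried over from the PR is that the remote call expects one merge level per image, in the same order as the images.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; not the actual nemo_retriever code.


def full_image_merge_levels(valid_b64: Sequence[str], merge_level: str) -> List[str]:
    """Repeat the actor's configured merge level once per valid image."""
    return [merge_level] * len(valid_b64)


def graphic_elements_merge_levels(flat_crop_b64s: Sequence[str]) -> List[str]:
    """Chart graphic-element crops always use word-level merging."""
    return ["word"] * len(flat_crop_b64s)


def check_aligned(images: Sequence[str], merge_levels: Sequence[str]) -> None:
    """Per-image merge levels must line up one-to-one with the images sent."""
    if len(images) != len(merge_levels):
        raise ValueError(
            f"got {len(merge_levels)} merge levels for {len(images)} images"
        )


frames = ["b64-frame-0", "b64-frame-1", "b64-frame-2"]
levels = full_image_merge_levels(frames, "paragraph")
check_aligned(frames, levels)
print(levels)  # ['paragraph', 'paragraph', 'paragraph']
```

Building the list alongside the image list (rather than relying on a single request-level setting) is what keeps each path's behavior explicit instead of falling back to the endpoint default.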

Root cause: NIMClient already supported per-image merge_levels, but several graph paths built remote OCR requests without populating that field, so the endpoint's default merge level was applied instead of the modality-specific level that the local OCR path uses.

Validation:

  • uv run --extra dev pytest tests/test_video_frame_ocr_actor.py tests/test_table_structure.py tests/test_chart_graphic_elements.py
  • git diff --check

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • Not applicable: no docker-compose.yaml environment variables changed.

@charlesbluca changed the title from "[codex] Pass merge levels to remote OCR" to "Pass merge levels to remote OCR" on May 18, 2026
@greptile-apps (bot, Contributor) commented May 18, 2026

Greptile Summary

This PR fixes a behavioral gap where remote NIM OCR calls in several graph paths were not sending explicit merge_levels, causing the endpoint to apply its default instead of the modality-specific levels used by the local OCR path.

  • ocr/shared.py: Introduces _MERGE_LEVEL_BY_LABEL (a module-level mapping from label name to merge level) and a helper _merge_level_for_ocr_label() that raises ValueError on unknown labels. Both the ocr_b64_to_text (full-image/video path) and ocr_page_elements (page-element path) remote code paths are updated to populate merge_levels per-image before calling invoke_image_inference_batches.
  • chart/shared.py: The graphic_elements_ocr_page_elements remote path is updated to send the "word" merge level for every chart crop, matching its local OCR counterpart.
  • Tests (test_video_frame_ocr_actor.py, test_table_structure.py, test_chart_graphic_elements.py): New unit tests assert that remote OCR calls carry the correct merge_levels values for each modality, and that the mapping rejects unknown labels.
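A minimal sketch of the label-to-merge-level mapping and its strict lookup helper, modeled on the described _MERGE_LEVEL_BY_LABEL and _merge_level_for_ocr_label(). The names MERGE_LEVEL_BY_LABEL, merge_level_for_ocr_label, and page_element_merge_levels are hypothetical, and the label strings are assumed from the mapping in the flowchart; the real module may differ in detail.

```python
from typing import Dict, List, Sequence

# Assumed mapping reconstructed from the PR summary; illustrative only.
MERGE_LEVEL_BY_LABEL: Dict[str, str] = {
    "table": "word",
    "chart": "paragraph",
    "infographic": "paragraph",
    "text": "paragraph",
    "title": "paragraph",
    "header_footer": "paragraph",
}


def merge_level_for_ocr_label(label: str) -> str:
    """Return the merge level for a page-element label, rejecting unknown labels."""
    try:
        return MERGE_LEVEL_BY_LABEL[label]
    except KeyError:
        # Fail loudly instead of silently using some default merge level.
        raise ValueError(f"no OCR merge level configured for label {label!r}") from None


def page_element_merge_levels(crop_labels: Sequence[str]) -> List[str]:
    """Build one merge level per crop, in the same order as the crops."""
    return [merge_level_for_ocr_label(label) for label in crop_labels]


print(page_element_merge_levels(["table", "chart", "title"]))
# ['word', 'paragraph', 'paragraph']
```

Raising on an unknown label means a new page-element class cannot reach the remote endpoint without someone deciding its merge level first.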

Confidence Score: 5/5

Safe to merge: the changes are narrowly scoped to adding explicit merge_level forwarding on remote paths that previously relied on the endpoint default.

All three remote OCR paths now populate merge_levels in a way that is consistent with their local counterparts, and the new _merge_level_for_ocr_label helper raises an explicit ValueError for unrecognized labels rather than silently falling through. The label set passed to the helper is fully covered by _MERGE_LEVEL_BY_LABEL under the current call sites. New tests cover the happy path and the label-rejection case, and the existing test suite has been extended to assert on the new merge_levels argument.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/ocr/shared.py | Adds _MERGE_LEVEL_BY_LABEL and a helper to consistently derive per-label merge levels; updates the remote paths in ocr_b64_to_text and ocr_page_elements to pass explicit merge_levels; refactors the local-path local_jobs initialization to derive its keys dynamically from the same mapping. |
| nemo_retriever/src/nemo_retriever/chart/shared.py | One-line fix to pass merge_levels=["word"] * len(flat_crop_b64s) in the graphic-elements remote OCR call, matching the local path. |
| nemo_retriever/tests/test_table_structure.py | New tests verify the _merge_level_for_ocr_label mapping and its rejection of unknown labels, that the local path respects a monkeypatched mapping, and that the remote path sends per-modality merge levels for mixed-label pages. |
| nemo_retriever/tests/test_chart_graphic_elements.py | Adds a test asserting that the remote graphic-elements OCR call passes merge_levels=["word"]. |
| nemo_retriever/tests/test_video_frame_ocr_actor.py | Extends the existing batched-call test to assert on merge_levels and adds a new test verifying that a non-default merge_level configured on the actor is forwarded to the remote call. |
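The tests above follow a common pattern: swap the remote client for a mock and assert on the merge_levels argument it receives. A hedged pytest-style sketch of that pattern; remote_graphic_elements_ocr and the exact call shape of invoke_image_inference_batches (images positional, merge_levels as a keyword) are assumptions, not the actual chart/shared.py code.

```python
from typing import Any, Sequence
from unittest.mock import MagicMock


def remote_graphic_elements_ocr(client: Any, flat_crop_b64s: Sequence[str]) -> Any:
    """Hypothetical remote call; argument names are assumptions."""
    return client.invoke_image_inference_batches(
        list(flat_crop_b64s),
        merge_levels=["word"] * len(flat_crop_b64s),
    )


def test_remote_graphic_elements_sends_word_merge_level() -> None:
    client = MagicMock()  # stands in for the remote NIM client
    crops = ["crop-a", "crop-b"]
    remote_graphic_elements_ocr(client, crops)
    client.invoke_image_inference_batches.assert_called_once()
    # .kwargs on call_args requires Python 3.8+.
    sent = client.invoke_image_inference_batches.call_args.kwargs["merge_levels"]
    assert sent == ["word", "word"]


test_remote_graphic_elements_sends_word_merge_level()
```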

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[OCR Request] --> B{Remote or Local?}
    B -- Remote --> C{Which path?}
    B -- Local --> D{Which path?}

    C -- "ocr_b64_to_text\n(video/full-image)" --> E["merge_levels = [merge_level] * len(valid_b64)"]
    C -- "ocr_page_elements\n(page elements)" --> F["merge_levels = [_merge_level_for_ocr_label(label)\nfor each crop]"]
    C -- "graphic_elements_ocr_page_elements\n(chart GE crops)" --> G["merge_levels = ['word'] * len(flat_crop_b64s)"]

    E --> H[invoke_image_inference_batches]
    F --> H
    G --> H

    D -- "ocr_page_elements\n(local jobs)" --> I["local_jobs = {ml: [] for ml in _MERGE_LEVEL_BY_LABEL.values()}"]
    I --> J["_merge_level_for_ocr_label(label_name) for each crop"]
    J --> K[model.invoke per merge-level batch]

    subgraph Mapping ["_MERGE_LEVEL_BY_LABEL"]
        M1["table → 'word'"]
        M2["chart → 'paragraph'"]
        M3["infographic → 'paragraph'"]
        M4["text / title / header_footer → 'paragraph'"]
    end

    F -. uses .-> Mapping
    J -. uses .-> Mapping
```
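The local branch of the flowchart groups crops into one batch per merge level, with the batch keys derived from the mapping's values. A minimal sketch of that grouping under stated assumptions: MERGE_LEVEL_BY_LABEL is a hypothetical stand-in for _MERGE_LEVEL_BY_LABEL, crops are assumed to arrive as (label, image_b64) pairs, and run_local_batches only records what one model.invoke call per batch would receive rather than calling a real model.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for _MERGE_LEVEL_BY_LABEL, following the flowchart mapping.
MERGE_LEVEL_BY_LABEL: Dict[str, str] = {
    "table": "word",
    "chart": "paragraph",
    "infographic": "paragraph",
    "text": "paragraph",
    "title": "paragraph",
    "header_footer": "paragraph",
}


def group_local_jobs(crops: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (label, image_b64) crops into one batch per merge level."""
    # Keys come from the mapping's values (duplicates collapse to one key),
    # so a label with a new merge level automatically gets its own batch.
    local_jobs: Dict[str, List[str]] = {ml: [] for ml in MERGE_LEVEL_BY_LABEL.values()}
    for label, image_b64 in crops:
        if label not in MERGE_LEVEL_BY_LABEL:
            raise ValueError(f"no OCR merge level configured for label {label!r}")
        local_jobs[MERGE_LEVEL_BY_LABEL[label]].append(image_b64)
    return local_jobs


def run_local_batches(local_jobs: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Record (merge_level, batch_size) for each non-empty batch, standing in for model.invoke."""
    calls: List[Tuple[str, int]] = []
    for merge_level, batch in local_jobs.items():
        if batch:  # skip merge levels with no crops on this page
            calls.append((merge_level, len(batch)))
    return calls


jobs = group_local_jobs([("table", "t0"), ("chart", "c0"), ("title", "h0")])
print(jobs)  # {'word': ['t0'], 'paragraph': ['c0', 'h0']}
print(run_local_batches(jobs))  # [('word', 1), ('paragraph', 2)]
```

Deriving the keys from the mapping, rather than hard-coding "word" and "paragraph", keeps the local and remote paths reading from a single source of truth.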

Reviews (5): Last reviewed commit: "Merge branch 'main' into codex/remote-oc..."

Review comment thread on nemo_retriever/src/nemo_retriever/ocr/shared.py (outdated)
@charlesbluca force-pushed the codex/remote-ocr-merge-levels branch from 18f2477 to e8404ba on May 18, 2026 14:25
@charlesbluca force-pushed the codex/remote-ocr-merge-levels branch from e8404ba to 6d80fd9 on May 18, 2026 14:40
@charlesbluca marked this pull request as ready for review on May 18, 2026 14:49
@charlesbluca requested review from a team as code owners on May 18, 2026 14:49
@charlesbluca requested a review from jdye64 on May 18, 2026 14:49