Pass merge levels to remote OCR #2050
Greptile Summary
This PR fixes a behavioral gap where remote NIM OCR calls in several graph paths were not sending explicit `merge_levels`.
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/ocr/shared.py | Adds _MERGE_LEVEL_BY_LABEL + helper to consistently derive per-label merge levels; updates remote paths in ocr_b64_to_text and ocr_page_elements to pass explicit merge_levels; refactors local-path local_jobs init to derive keys dynamically from the same mapping. |
| nemo_retriever/src/nemo_retriever/chart/shared.py | One-line fix to pass merge_levels=["word"] * len(flat_crop_b64s) in the graphic-elements remote OCR call, matching the local path. |
| nemo_retriever/tests/test_table_structure.py | New tests verify: _merge_level_for_ocr_label mapping correctness + rejection of unknown labels; local path respects monkeypatched mapping; remote path sends per-modality merge levels for mixed-label pages. |
| nemo_retriever/tests/test_chart_graphic_elements.py | Adds a test asserting the remote graphic-elements OCR call passes merge_levels=["word"]. |
| nemo_retriever/tests/test_video_frame_ocr_actor.py | Extends existing batched-call test to assert merge_levels and adds a new test verifying that a non-default merge_level configured on the actor is forwarded to the remote call. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[OCR Request] --> B{Remote or Local?}
B -- Remote --> C{Which path?}
B -- Local --> D{Which path?}
C -- "ocr_b64_to_text\n(video/full-image)" --> E["merge_levels = [merge_level] * len(valid_b64)"]
C -- "ocr_page_elements\n(page elements)" --> F["merge_levels = [_merge_level_for_ocr_label(label)\nfor each crop]"]
C -- "graphic_elements_ocr_page_elements\n(chart GE crops)" --> G["merge_levels = ['word'] * len(flat_crop_b64s)"]
E --> H[invoke_image_inference_batches]
F --> H
G --> H
D -- "ocr_page_elements\n(local jobs)" --> I["local_jobs = {ml: [] for ml in _MERGE_LEVEL_BY_LABEL.values()}"]
I --> J["_merge_level_for_ocr_label(label_name) for each crop"]
J --> K[model.invoke per merge-level batch]
subgraph Mapping ["_MERGE_LEVEL_BY_LABEL"]
M1["table → 'word'"]
M2["chart → 'paragraph'"]
M3["infographic → 'paragraph'"]
M4["text / title / header_footer → 'paragraph'"]
end
F -. uses .-> Mapping
J -. uses .-> Mapping
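The three remote branches in the flowchart differ only in how the per-image `merge_levels` list is built. A minimal sketch of that construction, under stated assumptions: the function names and `build_remote_request` are hypothetical stand-ins (the real call goes through `invoke_image_inference_batches`, whose signature is not shown here), and the mapping is copied from the Mapping subgraph.

```python
from typing import List, Sequence, Tuple

# Assumed mapping, copied from the Mapping subgraph of the flowchart.
MERGE_LEVEL_BY_LABEL = {
    "table": "word",
    "chart": "paragraph",
    "infographic": "paragraph",
    "text": "paragraph",
    "title": "paragraph",
    "header_footer": "paragraph",
}


def levels_for_full_images(valid_b64: Sequence[str], merge_level: str) -> List[str]:
    # Video / full-image path: one configured level repeated per image.
    return [merge_level] * len(valid_b64)


def levels_for_page_elements(labels: Sequence[str]) -> List[str]:
    # Page-element path: the level depends on each crop's label.
    return [MERGE_LEVEL_BY_LABEL[label] for label in labels]


def levels_for_graphic_elements(flat_crop_b64s: Sequence[str]) -> List[str]:
    # Chart graphic-element crops always use word-level merging.
    return ["word"] * len(flat_crop_b64s)


def build_remote_request(
    image_b64_list: Sequence[str], merge_levels: Sequence[str]
) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the remote call: pair each image with its level."""
    if len(image_b64_list) != len(merge_levels):
        raise ValueError("merge_levels must have one entry per image")
    return list(zip(image_b64_list, merge_levels))
```

For a mixed-label page, `levels_for_page_elements(["table", "chart", "text"])` yields one level per crop, in crop order, which is the property the new remote-path test checks.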
Reviews (5): Last reviewed commit: "Merge branch 'main' into codex/remote-oc..."
Description
Remote HTTP OCR now passes explicit `merge_levels` into NIM image-inference requests so it matches the local OCR path: `word` for table crops and `paragraph` for charts, infographics, and text/title/header-footer crops. Remote graphic-elements OCR on chart crops passes `word`, matching its local OCR behavior.

Root cause:
`NIMClient` already supported per-image `merge_levels`, but several graph paths built remote OCR requests without populating that field, so the endpoint default was used instead of the modality-specific local behavior.

Validation:
- `uv run --extra dev pytest tests/test_video_frame_ocr_actor.py tests/test_table_structure.py tests/test_chart_graphic_elements.py`
- `git diff --check`

Checklist