Skip to content

[bug fix] Fail loudly on local embedding failures#2114

Open
jioffe502 wants to merge 3 commits into
mainfrom
codex/local-embed-fail-loud
Open

[bug fix] Fail loudly on local embedding failures#2114
jioffe502 wants to merge 3 commits into
mainfrom
codex/local-embed-fail-loud

Conversation

@jioffe502
Copy link
Copy Markdown
Collaborator

@jioffe502 jioffe502 commented May 26, 2026

Summary

Fixes the SDK local embedding contract so local backend exceptions cannot be returned as successful-looking ingested rows.

Problem

Before this change, local SDK embedding could fail during model/vLLM initialization or inference, but the runtime would catch the exception and still return rows like:

  • text_embeddings_1b_v2 = {"embedding": [], "error": "..."}
  • text_embeddings_1b_v2_dim = 0
  • text_embeddings_1b_v2_has_embedding = False

That made .ingest() appear to succeed even though the local embedding backend had failed. Downstream retrieval/vector-store workflows could then receive rows that looked ingested but were not queryable.

Change

For local embedding only:

  • backend exceptions are re-raised instead of converted into empty embedding rows

Remote/NIM behavior is preserved because remote paths may intentionally use row-level error payloads.

This PR intentionally does not add post-success per-row validation. If a local embed call completes and returns zero-dimension rows, those rows remain non-fatal and continue to carry *_dim = 0 / *_has_embedding = False. That avoids aborting long runs where only some records fail or have no vectors.

Expected Contract

After this change:

  • local backend failure => SDK raises
  • remote/NIM backend failure => row-level error payload behavior is unchanged
  • successful local return with zero-dim rows => SDK reports *_has_embedding = False without aborting the full batch

Notes

This PR does not claim to fix H100/vLLM/DeepGEMM setup. Pristine upstream/main validation showed the current H100 repro was caused by missing Python 3.12 development headers during vLLM/Triton JIT compilation. Once matching Python headers and sane CUDA env were provided, unmodified upstream/main produced a 2048-d embedding.

This PR fixes the SDK contract hole exposed by local backend exceptions.

Validation

  • uv run --project nemo_retriever --extra dev pytest nemo_retriever/tests/test_text_embed_runtime.py
  • uv run --project nemo_retriever --extra dev pytest nemo_retriever/tests/test_store_pipeline_stages.py nemo_retriever/tests/test_actor_operators.py::TestBatchEmbedActor nemo_retriever/tests/test_operator_flags_and_cpu_actors.py::TestBatchEmbedCPUActor
  • git diff --check HEAD~1..HEAD

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@jioffe502 jioffe502 requested review from a team as code owners May 26, 2026 20:08
@jioffe502 jioffe502 requested a review from drobison00 May 26, 2026 20:08
@jioffe502 jioffe502 changed the title Fail loudly on local embedding failures [bug fix] Fail loudly on local embedding failures May 26, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Greptile Summary

This PR fixes a contract hole in the local embedding path where backend exceptions were silently converted into empty-embedding rows, making .ingest() appear to succeed while producing un-queryable output. The fix is a two-line addition that re-raises inside the existing except block when the backend is local, while leaving the remote error-payload path unchanged.

  • runtime.py: Adds _is_local_embed(endpoint, model) helper and calls raise inside the except Exception block for local backends; GPU cache flush, logging, and report_error are still executed before the re-raise, so observability is preserved.
  • test_text_embed_runtime.py: New test file covering local failure (re-raise), local zero-dim success (non-fatal), and remote failure (error-payload preservation), with an autouse fixture that isolates the error reporter between tests.

Confidence Score: 5/5

Safe to merge — the change is minimal, confined to the exception handler, and all three key behavioral contracts are covered by tests.

The diff touches only the error branch of embed_text_main_text_embed; the success path is untouched. GPU cleanup, logging, and error reporting all run before the conditional re-raise, so observability is not degraded. The _is_local_embed predicate is simple and the conditions it checks are already used upstream in _embed_group. Tests confirm the intended split between local (raise) and remote (error payload) behavior.

No files require special attention.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/text_embed/runtime.py Adds _is_local_embed helper and uses it in the except Exception block to re-raise for local backend failures while preserving remote error-payload rows; minimal, well-scoped change with no side effects on the success path.
nemo_retriever/tests/test_text_embed_runtime.py New test file with correct SPDX header; covers local failure (re-raise), local zero-dim success (non-fatal), and remote failure (error-payload preservation). Autouse fixture drains the error reporter around each test for isolation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[embed_text_main_text_embed] --> B{endpoint or model?}
    B -- neither --> C[raise ValueError]
    B -- provided --> D[_embed_group]
    D -- success --> E[compute dim + has_embedding]
    E --> F[return out_df]
    D -- raises Exception --> G[empty_cache + log + report_error]
    G --> H{_is_local_embed?}
    H -- Yes --> J[re-raise original exception]
    H -- No --> K[build error-payload rows]
    K --> L[return error out_df]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into codex/local-emb..." | Re-trigger Greptile

Comment thread nemo_retriever/src/nemo_retriever/text_embed/runtime.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/text_embed/runtime.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/text_embed/runtime.py Outdated
Comment thread nemo_retriever/tests/test_text_embed_runtime.py
Copy link
Copy Markdown
Collaborator

@jperez999 jperez999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure to fix those greptile comments or resolve them if they dont need to be fixed.

Comment thread nemo_retriever/src/nemo_retriever/text_embed/runtime.py Outdated
Comment thread nemo_retriever/src/nemo_retriever/text_embed/runtime.py Outdated
@jioffe502
Copy link
Copy Markdown
Collaborator Author

Review update: narrowed this PR to the proven SDK contract fix only.

Current branch behavior:

  • local embedding backend exceptions re-raise instead of returning successful-looking empty rows
  • remote/NIM row-level error payload behavior is unchanged
  • successful local calls that produce zero-dim rows remain non-fatal and are reported through *_dim = 0 / *_has_embedding = False

This removes the earlier post-success validation scope and resolves the Greptile concerns around private helper imports, validation error wording, GPU cleanup on validation failure, and modality-specific validation test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants