(skill) update filename match fast path#2162
Conversation
…est into edwardk/skill-single-source
|
/nvskills-ci |
Greptile SummaryThis PR tightens the filename fast path in the
|
| Filename | Overview |
|---|---|
| skills/nemo-retriever/scripts/filename_fast_path.py | Tightens filename matching to require the .pdf extension, fixes the duplicate sidecar candidate path, adds per-file error handling for extraction failures, and redirects subprocess stdout to DEVNULL to fix stdout protocol pollution — all previously-reviewed issues are addressed. |
| skills/nemo-retriever/references/query.md | Updated fast-path trigger condition to document that the .pdf extension is now required, aligning documentation with the code change in filename_fast_path.py. |
| skills/nemo-retriever/SKILL.md | Removes the instruction to write output.json and updates query-turn guidance to synthesize answers in-place; cosmetic improvements to hard-limit descriptions. |
| skills/nemo-retriever/references/pitfalls.md | Trimmed failure-mode list by moving cold-start and low-relevance entries to the CLI reference docs, reducing duplication that Tier 2 had flagged. |
| skills/nemo-retriever/BENCHMARK.md | Reflects updated NVSkills-Eval results: overall verdict flipped to PASS, Tier 2 duplicate finding resolved (0 findings), finding count updated to 21. |
| skills/nemo-retriever/skill-card.md | Updated skill metadata: version bumped, output format description extended, evaluation summary updated to match new BENCHMARK.md, reference list reordered. |
| skills/nemo-retriever/skill.oms.sig | Signature bundle refreshed to cover the updated file set; no code change. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([filename_fast_path.py called with query]) --> B{list ./pdfs/}
B -- FileNotFoundError/PermissionError --> C[print ERROR to stderr\nexit 1]
B -- success --> D[find_matches: scan basenames\nrequire name.pdf in query_lower]
D -- no matches --> E[print NO_MATCH\nexit 0]
D -- matches found --> F[extract_pages: run retriever\nstdout=DEVNULL per file]
F -- CalledProcessError --> G[WARN to stderr, skip file\ncontinue loop]
F -- success --> H[write sidecar JSON\nto /tmp/pdf_text/]
G & H --> I[rank_pages: load sidecars\nscore by token freq]
I -- json.JSONDecodeError --> J[ERROR to stderr\nreturn empty list]
I -- no sidecar found --> K[skip match]
J & K & I --> L{any scored pages?}
L -- no --> M[print NO_TEXT\nexit 0]
L -- yes --> N[print JSON ranking\n+ top page text\nexit 0]
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
skills/nemo-retriever/scripts/filename_fast_path.py:51-63
**Missing unit tests for changed matching logic**
The `find_matches` function has a material behaviour change — stems are no longer matched without the `.pdf` extension — but there are no unit tests covering the happy path, the extension-required path, or the `MIN_STEM_LEN` guard. The `test-coverage-new-code` rule requires tests for changed business logic. A query like `"Tell me about report_q3"` would have matched before and silently returns `NO_MATCH` now, with no automated safety net to catch regressions or future accidental reversions.
Reviews (6): Last reviewed commit: "lint" | Re-trigger Greptile
|
/nvskills-ci |
|
/nvskills-ci |
Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
Description
Checklist