Feat/quick review v2#16
| stripped = content.strip() | ||
|
|
||
| # Try markdown code block with json language tag | ||
| match = re.search(r"```(?:json)?\s*\n(.*?)\n```", stripped, re.DOTALL) |
🔦🐛
Regex-based block extraction is brittle; consider supporting more variants (extra spaces, language hints, nested blocks) or a more robust extraction strategy.
| return match.group(1).strip() | ||
|
|
||
| # Try generic code block | ||
| match = re.search(r"```\s*\n(.*?)\n```", stripped, re.DOTALL) |
🔦🐛
Second code-block extraction also relies on a strict pattern; consider consolidating extraction logic or adding tests for edge cases (e.g., multiple fences, spacing).
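One way to consolidate the two extraction steps is a single helper that collects every fenced block and prefers the one tagged `json`. This is only a sketch: `extract_fenced_block` and `FENCE` are hypothetical names, not code from this PR, and it does not attempt to handle nested fences.

```python
import re
from typing import Optional

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence, optional language tag with optional surrounding spaces,
# then a lazy capture up to the next closing fence.
_FENCE_RE = re.compile(
    FENCE + r"[ \t]*([A-Za-z0-9_+-]*)[ \t]*\r?\n(.*?)\r?\n?[ \t]*" + FENCE,
    re.DOTALL,
)


def extract_fenced_block(content: str, preferred_lang: str = "json") -> Optional[str]:
    """Return the first block tagged preferred_lang, else the first block, else None."""
    blocks = [(m.group(1).lower(), m.group(2).strip()) for m in _FENCE_RE.finditer(content)]
    for lang, body in blocks:
        if lang == preferred_lang:
            return body
    return blocks[0][1] if blocks else None
```

With this shape, the "json tag" and "generic block" cases share one pattern, and inputs with several fences or trailing spaces after the language tag can be covered by a handful of unit tests.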
| return None, False | ||
|
|
||
| # Workaround: some models insert newlines before closing quotes | ||
| normalized = extracted.replace('\n"', '"') |
🔦🐛
Normalization step (replacing \n") is a hack. Prefer robust JSON parsing with error handling and optional normalization, to avoid edge cases with escaping.
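A possible shape for "optional normalization" is to attempt a strict parse first and apply the newline workaround only when that fails. The sketch below uses the standard-library `json` module as a stand-in for the `PydanticOutputParser` call in the diff; `loads_with_fallback` is a hypothetical name.

```python
import json
from typing import Any, Optional, Tuple


def loads_with_fallback(text: str) -> Tuple[Optional[Any], bool]:
    """Try strict JSON first; apply the newline workaround only if that fails."""
    try:
        return json.loads(text), True
    except json.JSONDecodeError:
        pass
    # Same replacement as the workaround in the diff, now used only as a fallback.
    normalized = text.replace('\n"', '"')
    try:
        return json.loads(normalized), True
    except json.JSONDecodeError:
        return None, False
```

Well-formed output never goes through the workaround, and the fallback branch is an obvious place to add a log line so it is visible how often the workaround is actually needed.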
| parser = PydanticOutputParser(output_cls=ValidationAgentResponseModel) | ||
| parsed = parser.parse(normalized) | ||
| return parsed, True | ||
| except Exception: |
🔦🐛
On failure you return (None, False) without logging; consider logging the failure and/or returning a clearer error signal.
| @@ -0,0 +1,54 @@ | |||
| """Prompt for hallucination filter — mutes comments that ask the user to investigate instead of stating verified bugs.""" | |||
|
|
|||
| HALLUCINATION_FILTER_SYSTEM_PROMPT = """ | |||
🔦🐛
System prompt content could be split into smaller, testable templates or loaded from config for easier maintenance.
|
|
||
| For each issue whose comment is an "investigation request" (see below), call mute_issue with its issue_id and reason "investigation_request". Do NOT mute comments that state a verified bug. Prefer issuing all mute_issue calls in a single response. | ||
|
|
||
| # Mute These (investigation requests) |
🔦🐛
Hard-coded lists of investigation-phrases; consider moving to a config/module constant for easier updates and localization.
| **Files Changed:** | ||
| {files_changed} | ||
|
|
||
| **Issues to Review (with IDs for muting):** |
🔦🐛
Issues to Review uses placeholders; ensure formatting is robust if data is missing (e.g., missing issues_with_ids).
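If missing fields are a real possibility, one option is to render the template through `str.format_map` with a mapping that substitutes a visible marker instead of raising `KeyError`. The template below is a shortened, hypothetical stand-in for the prompt in the diff, and the `issues_with_ids` placeholder name is assumed from the comment above.

```python
class _MissingAsPlaceholder(dict):
    """dict that yields a visible marker instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return f"_(no {key} provided)_"


# Hypothetical, shortened stand-in for the user prompt template in the diff.
USER_PROMPT_TEMPLATE = (
    "**Files Changed:**\n{files_changed}\n\n"
    "**Issues to Review (with IDs for muting):**\n{issues_with_ids}\n"
)


def render_user_prompt(**fields: str) -> str:
    # format_map looks keys up in the mapping directly, so __missing__ is honored.
    return USER_PROMPT_TEMPLATE.format_map(_MissingAsPlaceholder(fields))
```

Whether a placeholder or a hard failure is preferable depends on the workflow; failing fast may be the better choice if a missing field always indicates a bug upstream.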
| self.verbose = verbose | ||
| self.logger = logging.getLogger(name=LAMPE_LOGGER_NAME) | ||
| self.llm = llm or LiteLLM( | ||
| model=MODELS.GPT_5_NANO_2025_08_07, |
🔦🐛
Model choice (GPT_5_NANO_2025_08_07) should be verified for availability/licensing and whether a fallback is needed for environments without that model.
|
|
||
| # Skip if no findings to filter | ||
| issues_with_ids = _build_issues_with_ids(ev.agent_reviews) | ||
| if "_No issues to review._" in issues_with_ids: |
🔦🐛
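The sentinel check for "_No issues to review._" depends on the exact string produced by `_build_issues_with_ids`; this is brittle, and it would also match a review comment that happens to quote that text. Consider returning an explicit boolean or result object from the aggregation step.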
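As a rough illustration of an explicit result, the aggregation step could return the rendered text together with a count, so the caller never has to inspect the string. `IssuesBlock` and `build_issues_block` are hypothetical names, and the input is simplified to a list of comment strings rather than the PR's `agent_reviews` structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IssuesBlock:
    text: str   # markdown shown to the model
    count: int  # number of issues rendered into text

    @property
    def is_empty(self) -> bool:
        return self.count == 0


def build_issues_block(comments: List[str]) -> IssuesBlock:
    """Render comments with 1-based IDs and report how many were rendered."""
    if not comments:
        return IssuesBlock(text="_No issues to review._", count=0)
    lines = [f"- [{i}] {c}" for i, c in enumerate(comments, start=1)]
    return IssuesBlock(text="\n".join(lines), count=len(comments))
```

The skip condition then becomes `if block.is_empty:`, which stays correct even if the placeholder wording changes or a comment contains the same phrase.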
| ) | ||
|
|
||
| try: | ||
| agent_ctx = WorkflowContext(self._agent) |
🔦🐛
Ensure the context/agent lifecycle is correct when creating and using agent_ctx; confirm that storing/fetching 'muted_reasons' is safe across runs.
| start_event=MuteIssueStart(user_prompt=user_prompt), | ||
| ctx=agent_ctx, | ||
| ) | ||
| muted_reasons = await agent_ctx.store.get("muted_reasons", default={}) |
🔦🐛
Getting 'muted_reasons' from the store uses a default of {}; ensure the store API actually returns a dict and not None.
| if self.verbose and muted_reasons: | ||
| self.logger.debug(f"Hallucination filter muted {len(muted_reasons)} issues") | ||
|
|
||
| except Exception as e: |
🔦🐛
Broad exception handling (except Exception) can mask real issues; prefer catching specific known exceptions from the LLM/agent layer.
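A narrower handler might look like the sketch below. The exception types that the LLM client and agent layer actually raise are not visible in this diff, so `asyncio.TimeoutError` and `ConnectionError` are stand-ins, and `run_filter` is a hypothetical wrapper.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("hallucination_filter_example")  # hypothetical logger name


async def run_filter(step: Callable[[], Awaitable[Any]], fallback: Any) -> Any:
    """Run the filter step; degrade gracefully only on expected transient failures."""
    try:
        return await step()
    except (asyncio.TimeoutError, ConnectionError) as exc:
        # Stand-in exception types: replace with what the client really raises.
        logger.warning("Hallucination filter skipped: %s", exc)
        return fallback
```

Anything outside the listed types, such as a `ValueError` caused by a programming mistake, propagates to the caller instead of being silently treated as "nothing muted".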
| def test_extract_json_from_plain_content(): | ||
| """When no markdown block, return stripped content.""" | ||
| content = ' {"no_issue": true, "findings": []} ' | ||
| assert extract_json_from_llm_content(content) == '{"no_issue": true, "findings": []}' |
🔦🐛
Plain content extraction test looks fine.
|
packages/lampe-review/tests/unit/workflows/agentic_review/test_response_parse.py (Line 23): Markdown JSON block extraction is tested; ensure isolated JSON is returned (not the surrounding text).
| def test_parse_validation_response_valid_json(): | ||
| """Valid JSON returns parsed model and success.""" | ||
| content = '{"no_issue": true, "findings": []}' | ||
| parsed, success = parse_validation_response(content) |
🔦🐛
Valid JSON parse test; assumes parsed model exposes attributes (no_issue, findings). Confirm the model type aligns with tests.
| {"file_path": "src/a.py", "line_number": 42, "action": "fix", | ||
| "problem_summary": "Missing validation", "severity": "high", "category": "security"} | ||
| ]}""" | ||
| parsed, success = parse_validation_response(content) |
🔦🐛
Test with findings checks nested fields; ensure tests align with actual model structure (dicts vs objects).
|
|
||
| def test_parse_validation_response_garbage_no_exception(): | ||
| """Arbitrary garbage returns (None, False) without raising.""" | ||
| for garbage in ["not json at all", "null", "[]", '{"x"}', "}{"]: |
🔦🐛
Garbage inputs loop; ensures non-crashing behavior across varied inputs.
| """QuickReviewAgent._parse_response returns empty findings on malformed input (no traceback).""" | ||
| from lampe.review.workflows.quick_review.quick_review_agent import QuickReviewAgent | ||
|
|
||
| agent = QuickReviewAgent() |
🔦🐛
Graceful fallback for QuickReviewAgent test; ensure compatibility with actual agent implementations.
🔦 description
What change is being made?
Add a new list_directory_at_commit tool, integrate a hallucination-filter step into quick PR reviews, and replace brittle JSON parsing with a robust response parser across agentic and quick-review workflows, including tests and updated prompts.
Why are these changes being made?
These changes improve PR review reliability by giving the reviewer a way to orient itself in the directory tree, reducing noise from investigation requests, and handling LLM outputs more gracefully. They introduce new tools, prompts, and tests, which add maintenance cost and potential edge-case risks that should be monitored.