fix: harden phase completion diagnostics#64
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughIntroduces ChangesPhase 2 Quality Validation and Integration
Budget Finish Reason Handling and Phase 3 No-Op Fast-Path
Transcript Filename Uniqueness
Sequence Diagram(s)sequenceDiagram
participant HarnessLoop as run_phase_mode
participant FinishReason as FinishReason classifier
participant GracefulCheck as check_phase_graceful_completion
participant Quality as validate_phase2_finding_quality
participant ResumePath as build_phase_resume_prompt
HarnessLoop->>HarnessLoop: reset finish_warning = None
HarnessLoop->>FinishReason: evaluate last_finish_reason
FinishReason-->>HarnessLoop: _FINISH_BUDGET
HarnessLoop->>HarnessLoop: set finish_warning (budget exhaustion)
HarnessLoop->>GracefulCheck: check_phase_graceful_completion(phase=2)
GracefulCheck->>Quality: validate_phase2_finding_quality(finding_path)
Quality-->>GracefulCheck: error list
alt quality errors present
GracefulCheck-->>HarnessLoop: (False, ["Invalid: ..."])
HarnessLoop->>HarnessLoop: set returncode = 2
HarnessLoop->>ResumePath: build_phase_resume_prompt(failure_details=errors)
ResumePath-->>HarnessLoop: resume prompt with "Fix every listed item."
else no errors
GracefulCheck-->>HarnessLoop: (True, [])
HarnessLoop->>HarnessLoop: clear finish_warning, break loop
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage Report
Generated by pytest-cov on |
Greptile SummaryThis PR hardens Phase 2 completion diagnostics by splitting budget-exhaustion finish reasons (
Confidence Score: 5/5Safe to merge — all changed paths are well-tested and the core logic is sound. The finish-reason reclassification keeps _FINISH_FAILURE as an exact union alias so no existing call site changes behaviour. The new Phase 2 quality gate, no-op Phase 3 path, and gate-failure printing are each covered by targeted regression tests. No data-loss or incorrect-state paths were identified. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[run_phase_mode / _run_subphase] --> B{finish_reason}
B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
B -->|step_finish, no reason| G[finish_warning: ambiguous]
C -->|yes| H[break — success]
C -->|no| I[returncode=2\nauto-resume if retries remain]
D --> J{phase_ok?}
J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
J -->|no| L[returncode=2\nauto-resume if retries remain]
E --> M{phase_ok?}
M -->|yes| N[finish_warning=None\ncontinue normally]
M -->|no| O[returncode=2\nauto-resume if retries remain]
L --> P{retries left?}
O --> P
I --> P
P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
P -->|no| R[print gate failures\nreturn 2]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[run_phase_mode / _run_subphase] --> B{finish_reason}
B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
B -->|step_finish, no reason| G[finish_warning: ambiguous]
C -->|yes| H[break — success]
C -->|no| I[returncode=2\nauto-resume if retries remain]
D --> J{phase_ok?}
J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
J -->|no| L[returncode=2\nauto-resume if retries remain]
E --> M{phase_ok?}
M -->|yes| N[finish_warning=None\ncontinue normally]
M -->|no| O[returncode=2\nauto-resume if retries remain]
L --> P{retries left?}
O --> P
I --> P
P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
P -->|no| R[print gate failures\nreturn 2]
Reviews (4): Last reviewed commit: "fix: differentiate budget-exhaustion fro..." | Re-trigger Greptile |
There was a problem hiding this comment.
Pull request overview
This PR hardens CodeCome’s phase completion diagnostics and artifact gating by preserving full failure details, improving auto-resume prompts, enforcing stricter Phase 2 finding quality requirements, and allowing Phase 3 to succeed as a no-op when there are no PENDING findings.
Changes:
- Phase 2 completion gating now validates “explicit no findings” summaries and rejects template/stub PENDING findings via new quality checks.
- Phase harness diagnostics now print remaining gate failures on terminal-stop final failures, and resume prompts are more explicit/authoritative.
- Phase 3 is allowed to complete as a no-op when no PENDING findings exist, with a generated run summary; transcripts are uniquely timestamped.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/phases/gates.py | Makes Phase 3 gating succeed as a no-op (warns instead of failing) when there are no PENDING findings. |
| tools/phases/completion.py | Preserves full Phase 2 gate failures; adds “explicit no findings” support and validates Phase 2 finding quality. |
| tools/phases/artifact_checks.py | Implements Phase 2 artifact checks (summary required; findings must be complete unless explicitly none). |
| tools/findings/quality.py | Adds Phase 2 finding-quality validation and “no findings” summary detection helpers. |
| tools/codecome/transcript.py | Adds timestamp+pid into transcript filenames to avoid truncation/collisions. |
| tools/codecome/harness.py | Adds Phase 3 no-op completion path with generated summary; improves terminal-stop diagnostics output. |
| tests/test_phases_completion.py | Expands Phase 2 completion tests for explicit no-findings and incomplete/stub finding rejection. |
| tests/test_phase_failure_state_reset.py | Adds tests for auto-resume on missing artifacts and printing gate failures on final failure. |
| tests/test_phase_artifacts_cli.py | Adds Phase 2 artifact-check test coverage. |
| tests/test_gate_check.py | Updates Phase 3 gate behavior test to assert no-op success output. |
| tests/test_findings_quality.py | Adds unit test coverage for Phase 2 finding quality validation. |
| tests/test_event_recording.py | Updates transcript filename assertion to match new timestamp+pid naming. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use generator counting in _pending_finding_count() - Capture datetime.now() once in _write_phase3_noop_summary() - Remove hardcoded Target path from Phase 3 noop summary - Make _contains_template_marker() case-insensitive - Avoid redundant _fresh_run_summaries call in completion gate - Add test for lowercase template guidance detection
Split _FINISH_FAILURE into _FINISH_BUDGET (length, max_tokens) and _FINISH_HARD_FAILURE (content-filter, content_filter, error). Budget-exhaustion reasons now trigger check_phase_graceful_completion and auto-resume, matching the terminal-stop + missing-artifacts path. Added regression test for length auto-resume. Also updated phase_1._run_subphase to mirror the same behaviour.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/codecome/phase_1.py`:
- Around line 590-593: The condition at lines 590-593 now handles both
_FINISH_MID_TURN and _FINISH_BUDGET, but lines 599 and 611 appear to describe
all these cases uniformly as mid-turn cutoffs. Update the code at lines 599 and
611 to differentiate the failure reason based on whether last_finish_reason is
in _FINISH_BUDGET or _FINISH_MID_TURN, so that budget exhaustion is clearly
reported as such rather than mislabeled as a mid-turn cutoff. Use a conditional
check (such as checking if last_finish_reason is in _FINISH_BUDGET) to generate
appropriate distinct messages or handling for each case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f91e9b37-8f1a-4178-8a73-5ce2faa765f1
📒 Files selected for processing (7)
tests/test_phase_failure_state_reset.pytools/codecome/harness.pytools/codecome/phase_1.pytools/phases/completion.pytools/rendering/events/__init__.pytools/rendering/events/base.pytools/rendering/events/step_finish.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tools/phases/completion.py
- tools/codecome/harness.py
…sages - Auto-resume [budget] now says output budget exhaustion instead of mid-turn - No-session-ID fallback messages use generic wording for both cases - Docstring updated: length moved from _FINISH_MID_TURN to _FINISH_BUDGET - step_finish.py: removed redundant _FINISH_BUDGET check (_FINISH_FAILURE already covers it)
Summary
Tests
Summary by CodeRabbit
New Features
Improvements
Tests