fix: harden phase completion diagnostics by pruiz · Pull Request #64 · pruiz/CodeCome

pruiz · 2026-06-17T23:36:01Z

Summary

preserve and print all Phase 2 completion gate failures instead of truncating diagnostics
make terminal stop artifact failures auto-resume with exact repair details and final failure output
add timestamped phase transcript names and stricter Phase 2 finding quality checks
allow Phase 3 to complete as a no-op when no PENDING findings exist

Tests

rtk pytest -q tests/test_findings_quality.py tests/test_phases_completion.py tests/test_phase_failure_state_reset.py tests/test_event_recording.py tests/test_phase_artifacts_cli.py
rtk pytest -q tests/test_render_settings_propagation.py tests/test_phase_1_mid_turn_forgiveness.py tests/test_codecome_runner.py tests/test_event_recording.py
make tests

Summary by CodeRabbit

New Features
- Phase 3 now completes successfully when no findings are pending
- Phase 2 quality validation checks findings for completeness and accuracy
Improvements
- Enhanced auto-resume behavior for incomplete phases and output budget exhaustion
- Aggregate failure reporting displays all validation errors at once
- Clearer validation guidance in phase completion messages
Tests
- Expanded coverage for phase completion gating, artifact validation, and failure recovery scenarios

coderabbitai · 2026-06-17T23:36:15Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 67116344-e59d-47df-bb62-646a4e76bd14

📥 Commits

Reviewing files that changed from the base of the PR and between bea6ace and bbd49a2.

📒 Files selected for processing (3)

tools/codecome/harness.py
tools/codecome/phase_1.py
tools/phases/completion.py

🚧 Files skipped from review as they are similar to previous changes (3)

tools/codecome/phase_1.py
tools/phases/completion.py
tools/codecome/harness.py

📝 Walkthrough

Walkthrough

Introduces tools/findings/quality.py for Phase 2 finding quality validation and wires it into artifact_checks.py and completion.py. Refactors finish-reason classification to distinguish budget exhaustion (_FINISH_BUDGET) from hard failures, extending that logic through harness and Phase 1 orchestration. Adds a Phase 3 no-pending-findings fast-path in the harness and gate. Tightens transcript filename uniqueness with timestamp/PID. Extends test coverage for all changes.

Changes

Phase 2 Quality Validation and Integration

Layer / File(s)	Summary
Phase 2 finding quality validation module `tools/findings/quality.py`	Defines required section lists, placeholder/template/non-finding marker constants, field-population predicates, `validate_phase2_finding_quality` (frontmatter + section checks), and `phase2_summary_declares_no_findings`.
Tests for Phase 2 finding quality `tests/test_findings_quality.py`	Adds `sys.path` setup, `_write_phase2_finding` helper, and two regression tests covering rejection of test/template artifacts and case-insensitive template-marker detection.
Phase 2 artifact checker and dispatcher registration `tools/phases/artifact_checks.py`	Adds `check_phase_2_artifacts()` validating run-summary presence and finding completeness/quality, registers it in `_CHECKERS` via `_check_phase_2_with_ct()`, and includes Phase 2 in `phase="all"` sweep.
Tests for Phase 2 artifact checks `tests/test_phase_artifacts_cli.py`	Verifies `check_phase_2_artifacts()` accepts explicit "None." no-findings summaries, rejects stub findings, and aggregates all quality errors without truncation.
Phase 2 freshness helpers, completion gating, and resume prompts `tools/phases/completion.py`	Adds `_fresh_run_summaries()`, `_fresh_pending_findings()`, `_latest_summary_declares_no_phase2_findings()`; rewrites Phase 2 gate to require fresh summaries and validate findings; updates `_resume_opener_for_reason()` for budget finish; changes resume prompt wording to "Fix every listed item" and appends validation-completion strictness guidance.
Tests for Phase 2 completion gating and resume prompts `tests/test_phases_completion.py`	Covers accepting structured "None." no-findings summaries, rejecting vague summaries, rejecting stub findings, aggregating quality errors, verifying updated resume-prompt wording, and sweep fixture alignment.

Budget Finish Reason Handling and Phase 3 No-Op Fast-Path

Layer / File(s)	Summary
Finish reason classification infrastructure refactoring `tools/rendering/events/base.py`, `tools/rendering/events/__init__.py`	Splits `_FINISH_FAILURE` into `_FINISH_BUDGET` (`length`, `max_tokens`) and `_FINISH_HARD_FAILURE` (content-filter, error), keeps `_FINISH_FAILURE` as a legacy union alias, and re-exports both new constants from the package.
Budget exhaustion handling in harness orchestration `tools/codecome/harness.py`	Imports `_FINISH_BUDGET`; resets `finish_warning` at loop start; generates specialized budget-exhaustion warnings; distinguishes mid-turn vs budget in graceful-completion decision; adds explicit `break` on `returncode == 0`; specializes auto-resume messaging; generalizes session-ID error wording; prints "remaining gate failures" list when non-empty.
Budget exhaustion handling in Phase 1 subphase orchestration `tools/codecome/phase_1.py`	Imports `_FINISH_BUDGET`; adds finish-warning branch for budget exhaustion; extends graceful-completion logic for mid-turn vs budget; updates auto-resume trigger for both finish reasons; generalizes session-ID resume error wording.
Phase 3 no-pending-findings fast-path and gate behavior `tools/codecome/harness.py`, `tools/phases/gates.py`	Adds datetime import, helpers to count pending Phase 3 findings and write timestamped noop summaries, and early-return fast-path when count is zero; changes `gate_phase_3` to return `0` with noop message instead of failing when no findings are pending.
Tests for Phase 3 gate, harness failure reporting, and budget auto-resume `tests/test_gate_check.py`, `tests/test_phase_failure_state_reset.py`	Verifies Phase 3 gate succeeds with "nothing to review" when no pending findings exist; tests terminal-stop auto-resume on missing artifacts, gate-failure output in stdout, and budget-exhaustion auto-resume scenarios.

Transcript Filename Uniqueness

Layer / File(s)	Summary
Transcript filename uniqueness with timestamp and PID `tools/codecome/transcript.py`, `tests/test_event_recording.py`	`Transcript.for_phase()` computes timestamp/PID and routes through `_unique_transcript_path()`; test assertion tightened to `re.fullmatch` validating the full timestamp/pid pattern.

Sequence Diagram(s)

sequenceDiagram
  participant HarnessLoop as run_phase_mode
  participant FinishReason as FinishReason classifier
  participant GracefulCheck as check_phase_graceful_completion
  participant Quality as validate_phase2_finding_quality
  participant ResumePath as build_phase_resume_prompt

  HarnessLoop->>HarnessLoop: reset finish_warning = None
  HarnessLoop->>FinishReason: evaluate last_finish_reason
  FinishReason-->>HarnessLoop: _FINISH_BUDGET
  HarnessLoop->>HarnessLoop: set finish_warning (budget exhaustion)
  HarnessLoop->>GracefulCheck: check_phase_graceful_completion(phase=2)
  GracefulCheck->>Quality: validate_phase2_finding_quality(finding_path)
  Quality-->>GracefulCheck: error list
  alt quality errors present
    GracefulCheck-->>HarnessLoop: (False, ["Invalid: ..."])
    HarnessLoop->>HarnessLoop: set returncode = 2
    HarnessLoop->>ResumePath: build_phase_resume_prompt(failure_details=errors)
    ResumePath-->>HarnessLoop: resume prompt with "Fix every listed item."
  else no errors
    GracefulCheck-->>HarnessLoop: (True, [])
    HarnessLoop->>HarnessLoop: clear finish_warning, break loop
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

pruiz/CodeCome#43: Modifies the same tools/codecome/harness.py and tools/codecome/phase_1.py retry/resume control-flow paths that this PR extends with _FINISH_BUDGET handling.
pruiz/CodeCome#46: Directly modifies tools/phases/completion.py gating, build_phase_resume_prompt(failure_details=...), and the harness/phase_1 failure-detail unpacking that this PR further updates with budget-exhaustion resume messaging and stricter validation guidance.

Poem

🐇 Hoppity-hop through the phases I go,
No stub findings allowed—quality must show!
Budget exhausted? I'll warn and resume,
Phase 3 with no pending? A no-op in bloom.
Each transcript now stamped with the time and my PID,
No file gets truncated—the rabbit forbids! 🕐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.19% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix: harden phase completion diagnostics' directly aligns with the main objective of improving error handling, diagnostic visibility, and completion gate enforcement across phases.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/phase2-completion-diagnostics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-17T23:37:59Z

Coverage Report

Metric	Value
Line Coverage	76.4%
Lines Covered	0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-19T17:14:16.884Z

greptile-apps · 2026-06-17T23:41:23Z

Greptile Summary

This PR hardens Phase 2 completion diagnostics by splitting budget-exhaustion finish reasons (length, max_tokens) from hard-failure reasons into a dedicated _FINISH_BUDGET set, wiring auto-resume for budget-exhaustion paths, adding Phase 2 finding quality validation (tools/findings/quality.py), and allowing Phase 3 to short-circuit as a no-op when no PENDING findings exist.

_FINISH_BUDGET / _FINISH_HARD_FAILURE split (base.py): _FINISH_FAILURE is retained as a union alias for backward compatibility; both harness and phase_1 now distinguish budget exhaustion (retriable) from hard failures (non-retriable).
Phase 2 quality gate (quality.py, completion.py, artifact_checks.py): New validate_phase2_finding_quality enforces populated frontmatter fields, non-placeholder section bodies, and template-marker detection; completion gate now validates fresh findings and requires explicit "no findings" wording in summaries.
Phase 3 no-op (harness.py, gates.py): Zero PENDING findings causes both the gate and the harness to exit 0 with a synthetic summary rather than treating it as an error.

Confidence Score: 5/5

Safe to merge — all changed paths are well-tested and the core logic is sound.

The finish-reason reclassification keeps _FINISH_FAILURE as an exact union alias so no existing call site changes behaviour. The new Phase 2 quality gate, no-op Phase 3 path, and gate-failure printing are each covered by targeted regression tests. No data-loss or incorrect-state paths were identified.

No files require special attention.

Important Files Changed

Filename	Overview
tools/rendering/events/base.py	Splits _FINISH_FAILURE into _FINISH_BUDGET and _FINISH_HARD_FAILURE; legacy alias kept as union — clean and backward-compatible.
tools/codecome/harness.py	Adds budget-exhaustion auto-resume path, Phase 3 no-op early exit, and gate-failure printing on final failure; logic is sound.
tools/codecome/phase_1.py	Mirrors harness.py budget/mid-turn split; removes the legacy not any_step_finish_seen condition in favour of explicit set membership checks.
tools/findings/quality.py	New module providing validate_phase2_finding_quality and phase2_summary_declares_no_findings; comprehensive template-marker and placeholder checks.
tools/phases/completion.py	Phase 2 gate now validates finding quality and requires explicit no-findings declaration; _resume_opener_for_reason docstring references _FINISH_HARD_FAILURE but code still checks _FINISH_FAILURE.
tools/phases/artifact_checks.py	Adds check_phase_2_artifacts wired into the dispatcher; static artifact check validates all PENDING findings regardless of timestamp.
tools/phases/gates.py	Phase 3 gate now returns 0 (no-op) instead of 1 when no PENDING findings exist.
tools/codecome/transcript.py	Timestamp + PID suffix added to transcript filenames to prevent collisions across concurrent runs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_phase_mode / _run_subphase] --> B{finish_reason}
    B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
    B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
    B -->|step_finish, no reason| G[finish_warning: ambiguous]
    C -->|yes| H[break — success]
    C -->|no| I[returncode=2\nauto-resume if retries remain]
    D --> J{phase_ok?}
    J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
    J -->|no| L[returncode=2\nauto-resume if retries remain]
    E --> M{phase_ok?}
    M -->|yes| N[finish_warning=None\ncontinue normally]
    M -->|no| O[returncode=2\nauto-resume if retries remain]
    L --> P{retries left?}
    O --> P
    I --> P
    P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
    P -->|no| R[print gate failures\nreturn 2]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[run_phase_mode / _run_subphase] --> B{finish_reason}
    B -->|in _FINISH_TERMINAL_OK| C{phase_ok?}
    B -->|in _FINISH_MID_TURN| D[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_BUDGET NEW| E[set finish_warning\ncheck completion gate]
    B -->|in _FINISH_HARD_FAILURE| F[returncode=2\nno retry]
    B -->|step_finish, no reason| G[finish_warning: ambiguous]
    C -->|yes| H[break — success]
    C -->|no| I[returncode=2\nauto-resume if retries remain]
    D --> J{phase_ok?}
    J -->|yes| K[graceful_forgiveness\nfinish_warning=None]
    J -->|no| L[returncode=2\nauto-resume if retries remain]
    E --> M{phase_ok?}
    M -->|yes| N[finish_warning=None\ncontinue normally]
    M -->|no| O[returncode=2\nauto-resume if retries remain]
    L --> P{retries left?}
    O --> P
    I --> P
    P -->|yes| Q[build_phase_resume_prompt\ncontinue loop]
    P -->|no| R[print gate failures\nreturn 2]

_{Reviews (4): Last reviewed commit: "fix: differentiate budget-exhaustion fro..." | Re-trigger Greptile}

Copilot

Pull request overview

This PR hardens CodeCome’s phase completion diagnostics and artifact gating by preserving full failure details, improving auto-resume prompts, enforcing stricter Phase 2 finding quality requirements, and allowing Phase 3 to succeed as a no-op when there are no PENDING findings.

Changes:

Phase 2 completion gating now validates “explicit no findings” summaries and rejects template/stub PENDING findings via new quality checks.
Phase harness diagnostics now print remaining gate failures on terminal-stop final failures, and resume prompts are more explicit/authoritative.
Phase 3 is allowed to complete as a no-op when no PENDING findings exist, with a generated run summary; transcripts are uniquely timestamped.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tools/phases/gates.py	Makes Phase 3 gating succeed as a no-op (warns instead of failing) when there are no PENDING findings.
tools/phases/completion.py	Preserves full Phase 2 gate failures; adds “explicit no findings” support and validates Phase 2 finding quality.
tools/phases/artifact_checks.py	Implements Phase 2 artifact checks (summary required; findings must be complete unless explicitly none).
tools/findings/quality.py	Adds Phase 2 finding-quality validation and “no findings” summary detection helpers.
tools/codecome/transcript.py	Adds timestamp+pid into transcript filenames to avoid truncation/collisions.
tools/codecome/harness.py	Adds Phase 3 no-op completion path with generated summary; improves terminal-stop diagnostics output.
tests/test_phases_completion.py	Expands Phase 2 completion tests for explicit no-findings and incomplete/stub finding rejection.
tests/test_phase_failure_state_reset.py	Adds tests for auto-resume on missing artifacts and printing gate failures on final failure.
tests/test_phase_artifacts_cli.py	Adds Phase 2 artifact-check test coverage.
tests/test_gate_check.py	Updates Phase 3 gate behavior test to assert no-op success output.
tests/test_findings_quality.py	Adds unit test coverage for Phase 2 finding quality validation.
tests/test_event_recording.py	Updates transcript filename assertion to match new timestamp+pid naming.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Use generator counting in _pending_finding_count() - Capture datetime.now() once in _write_phase3_noop_summary() - Remove hardcoded Target path from Phase 3 noop summary - Make _contains_template_marker() case-insensitive - Avoid redundant _fresh_run_summaries call in completion gate - Add test for lowercase template guidance detection

Split _FINISH_FAILURE into _FINISH_BUDGET (length, max_tokens) and _FINISH_HARD_FAILURE (content-filter, content_filter, error). Budget-exhaustion reasons now trigger check_phase_graceful_completion and auto-resume, matching the terminal-stop + missing-artifacts path. Added regression test for length auto-resume. Also updated phase_1._run_subphase to mirror the same behaviour.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/codecome/phase_1.py`:
- Around line 590-593: The condition at lines 590-593 now handles both
_FINISH_MID_TURN and _FINISH_BUDGET, but lines 599 and 611 appear to describe
all these cases uniformly as mid-turn cutoffs. Update the code at lines 599 and
611 to differentiate the failure reason based on whether last_finish_reason is
in _FINISH_BUDGET or _FINISH_MID_TURN, so that budget exhaustion is clearly
reported as such rather than mislabeled as a mid-turn cutoff. Use a conditional
check (such as checking if last_finish_reason is in _FINISH_BUDGET) to generate
appropriate distinct messages or handling for each case.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f91e9b37-8f1a-4178-8a73-5ce2faa765f1

📥 Commits

Reviewing files that changed from the base of the PR and between e2f35c3 and bea6ace.

📒 Files selected for processing (7)

tests/test_phase_failure_state_reset.py
tools/codecome/harness.py
tools/codecome/phase_1.py
tools/phases/completion.py
tools/rendering/events/__init__.py
tools/rendering/events/base.py
tools/rendering/events/step_finish.py

🚧 Files skipped from review as they are similar to previous changes (2)

tools/phases/completion.py
tools/codecome/harness.py

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

…sages - Auto-resume [budget] now says output budget exhaustion instead of mid-turn - No-session-ID fallback messages use generic wording for both cases - Docstring updated: length moved from _FINISH_MID_TURN to _FINISH_BUDGET - step_finish.py: removed redundant _FINISH_BUDGET check (_FINISH_FAILURE already covers it)

fix: harden phase completion diagnostics

2688f37

greptile-apps Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread tools/codecome/harness.py Outdated

Comment thread tools/codecome/harness.py Outdated

Comment thread tools/findings/quality.py Outdated

Comment thread tools/phases/completion.py Outdated

pruiz requested a review from Copilot June 18, 2026 08:03

Copilot started reviewing on behalf of pruiz June 18, 2026 08:03 View session

Copilot AI reviewed Jun 18, 2026

View reviewed changes

Comment thread tools/codecome/harness.py Outdated

Comment thread tools/codecome/harness.py Outdated

pruiz added 2 commits June 19, 2026 11:45

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread tools/codecome/phase_1.py

pruiz requested a review from Copilot June 19, 2026 15:04

Copilot started reviewing on behalf of pruiz June 19, 2026 15:04 View session

Copilot AI reviewed Jun 19, 2026

View reviewed changes

Comment thread tools/codecome/harness.py

Comment thread tools/phases/completion.py

Comment thread tools/rendering/events/step_finish.py Outdated

pruiz merged commit 3855c46 into master Jun 20, 2026
8 checks passed

Conversation

pruiz commented Jun 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

greptile-apps Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pruiz commented Jun 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

github-actions Bot commented Jun 17, 2026 •

edited

Loading

greptile-apps Bot commented Jun 17, 2026 •

edited

Loading