Refactor CodeCome phase execution into stateful phase classes #56

Description

@pruiz

Summary

Refactor CodeCome phase orchestration from procedural functions into stateful phase classes. This is motivated by recent server-resilience work where opencode serve can die mid-phase or mid-subphase. The current procedural structure makes it awkward to preserve phase progress, retry only the active subphase, and keep server lifecycle ownership cleanly separated from phase logic.

Context

During a Phase 1 run, opencode serve died while Phase 1b was active. The desired behavior is:

  • Restart opencode serve at the harness layer.
  • Retry only the active failed subphase, for example 1b, not completed subphases, for example 1a.
  • Keep opencode server lifecycle out of phase_1.py.
  • Preserve clear boundaries between harness infrastructure concerns, phase orchestration, subphase state, and model session retry/repair handling.

The current implementation is procedural:

  • harness.py owns the server for phases.
  • phase_1.py orchestrates 1a, 1b, CodeQL, and 1c.
  • _run_subphase() handles subphase attempt/retry behavior.
  • Progress state is mostly local variables and return codes.

This works but becomes brittle when recovery needs to resume from the last active subphase.

Proposed Design

Introduce phase classes with explicit state and outcomes.

Example shape:

from dataclasses import dataclass

@dataclass
class PhaseOutcome:
    status: RunStatus
    phase: str
    failed_subphase: str | None = None
    resume_hint: str | None = None

class Phase:
    id: str

    def run(self) -> PhaseOutcome:
        ...

    def resume_after_server_restart(self, outcome: PhaseOutcome) -> PhaseOutcome:
        ...

For Phase 1:

class Phase1(Phase):
    current_subphase: Literal["1a", "1b", "codeql", "1c"]

    def run(self, start_at: str = "1a") -> PhaseOutcome:
        ...

    def run_1a(self) -> SubphaseOutcome:
        ...

    def run_1b(self) -> SubphaseOutcome:
        ...

    def run_codeql(self) -> SubphaseOutcome:
        ...

    def run_1c(self) -> SubphaseOutcome:
        ...

Desired Responsibilities

Harness

The harness should own:

  • opencode serve lifecycle.
  • Server restart budget.
  • Recovery from server death.
  • Mapping SERVER_UNREACHABLE outcomes to server restart plus phase re-entry.

Phase Classes

Phase classes should own:

  • Phase-local progress.
  • Subphase ordering.
  • Gate checks.
  • Durable artifact checks.
  • Determining the correct re-entry point after recoverable failures.

Phase classes should not:

  • Start opencode serve.
  • Stop opencode serve.
  • Restart opencode serve.

Subphase Execution

Subphase execution should return structured outcomes instead of raw integer return codes.

Suggested enum:

from enum import IntEnum

class RunStatus(IntEnum):
    OK = 0
    ERROR = 1
    INCOMPLETE = 2
    SERVER_UNREACHABLE = 3
    INTERRUPTED = 130
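
Because RunStatus is an IntEnum, its members still compare equal to the old integer return codes, which may ease an incremental migration. The from_return_code helper below is a hypothetical bridge, and mapping unknown codes to ERROR is an assumption rather than a documented rule.

```python
from enum import IntEnum


class RunStatus(IntEnum):
    OK = 0
    ERROR = 1
    INCOMPLETE = 2
    SERVER_UNREACHABLE = 3
    INTERRUPTED = 130


def from_return_code(code: int) -> RunStatus:
    """Map a legacy integer return code to a RunStatus member."""
    try:
        return RunStatus(code)
    except ValueError:
        # Assumption: any unrecognized code is treated as a generic error.
        return RunStatus.ERROR
```

Call sites that still pass integers around can be converted one at a time, since RunStatus.OK == 0 and int(RunStatus.INTERRUPTED) == 130 both hold.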

Acceptance Criteria

  • Phase execution uses explicit RunStatus values instead of magic numeric return codes.
  • phase_1.py no longer starts, stops, or restarts opencode serve.
  • Harness can restart opencode serve and re-enter Phase 1 at the failed subphase.
  • Phase 1 can resume from 1a, 1b, or 1c without rerunning completed subphases.
  • Phase-local state is represented explicitly rather than inferred from scattered local variables.
  • Existing phase behavior remains unchanged for successful runs.
  • Existing tests pass, with new tests covering server death during 1a, 1b, and 1c, and no rerun of completed subphases after restart.
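
The last criterion could be covered with a test double along these lines. FlakyPhase1, run_until_done, and check_no_rerun are hypothetical names, and the double returns a bare subphase name instead of a structured outcome to keep the sketch short.

```python
from collections import Counter

ORDER = ("1a", "1b", "codeql", "1c")


class FlakyPhase1:
    """Test double: the server 'dies' once while die_at is active."""

    def __init__(self, die_at: str) -> None:
        self.die_at = die_at
        self.died = False
        self.completed = Counter()  # how often each subphase finished

    def run(self, start_at: str = "1a"):
        """Return the failed subphase name, or None on success."""
        for name in ORDER[ORDER.index(start_at):]:
            if name == self.die_at and not self.died:
                self.died = True
                return name  # simulated SERVER_UNREACHABLE
            self.completed[name] += 1
        return None


def run_until_done(phase: FlakyPhase1) -> None:
    # Stand-in for the harness: re-enter at whatever subphase failed.
    failed = phase.run()
    while failed is not None:
        failed = phase.run(start_at=failed)


def check_no_rerun() -> None:
    for die_at in ("1a", "1b", "1c"):
        phase = FlakyPhase1(die_at)
        run_until_done(phase)
        # Every subphase completed exactly once, even across the restart.
        assert all(phase.completed[name] == 1 for name in ORDER), die_at


check_no_rerun()
```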

Notes

This does not need to be implemented as part of the immediate server-resilience fix. The short-term fix can introduce structured outcomes and harness-level retry. This ticket tracks the cleaner long-term architecture.
