Summary
Refactor CodeCome phase orchestration from procedural functions into stateful phase classes. This is motivated by recent server-resilience work where opencode serve can die mid-phase or mid-subphase. The current procedural structure makes it awkward to preserve phase progress, retry only the active subphase, and keep server lifecycle ownership cleanly separated from phase logic.
Context
During a Phase 1 run, opencode serve died while Phase 1b was active. The desired behavior is:
- Restart opencode serve at the harness layer.
- Retry only the active failed subphase, for example 1b, not completed subphases, for example 1a.
- Keep opencode server lifecycle out of phase_1.py.
- Preserve clear boundaries between harness infrastructure concerns, phase orchestration, subphase state, and model session retry/repair handling.
The current implementation is procedural:
- harness.py owns the server for phases.
- phase_1.py orchestrates 1a, 1b, CodeQL, and 1c.
- _run_subphase() handles subphase attempt/retry behavior.
- Progress state is mostly local variables and return codes.
This works but becomes brittle when recovery needs to resume from the last active subphase.
Proposed Design
Introduce phase classes with explicit state and outcomes.
Example shape:
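    from dataclasses import dataclass

    @dataclass
    class PhaseOutcome:
        # Structured result of a phase run (RunStatus is the enum suggested under Subphase Execution).
        status: RunStatus
        phase: str
        failed_subphase: str | None = None
        resume_hint: str | None = None

    class Phase:
        id: str

        def run(self) -> PhaseOutcome:
            ...

        # Called by the harness once it has restarted the server.
        def resume_after_server_restart(self, outcome: PhaseOutcome) -> PhaseOutcome:
            ...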
For Phase 1:
    from typing import Literal

    class Phase1(Phase):
        # Subphase currently executing; the re-entry point after a recoverable failure.
        current_subphase: Literal["1a", "1b", "codeql", "1c"]

        def run(self, start_at: str = "1a") -> PhaseOutcome:
            ...

        def run_1a(self) -> SubphaseOutcome:
            ...

        def run_1b(self) -> SubphaseOutcome:
            ...

        def run_codeql(self) -> PhaseOutcome:
            ...

        def run_1c(self) -> SubphaseOutcome:
            ...
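One way run(start_at=...) could honor subphase ordering is sketched below. This is a minimal illustration under assumptions, not the actual phase_1.py: Phase1Sketch, the ORDER tuple, and the injected runners mapping are hypothetical names, and each subphase is reduced to a callable that returns a RunStatus.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, Optional


class RunStatus(IntEnum):
    OK = 0
    ERROR = 1
    INCOMPLETE = 2
    SERVER_UNREACHABLE = 3
    INTERRUPTED = 130


@dataclass
class PhaseOutcome:
    status: RunStatus
    phase: str
    failed_subphase: Optional[str] = None
    resume_hint: Optional[str] = None


class Phase1Sketch:
    """Hypothetical stand-in for Phase1; subphase bodies are injected callables."""

    ORDER = ("1a", "1b", "codeql", "1c")

    def __init__(self, runners: Dict[str, Callable[[], RunStatus]]):
        # runners maps a subphase id to a zero-argument callable returning a RunStatus.
        self.runners = runners
        self.current_subphase = self.ORDER[0]

    def run(self, start_at: str = "1a") -> PhaseOutcome:
        # Skip every subphase before start_at, then run the rest in order.
        for name in self.ORDER[self.ORDER.index(start_at):]:
            self.current_subphase = name
            status = self.runners[name]()
            if status != RunStatus.OK:
                # Stop at the first failure and record where to re-enter.
                return PhaseOutcome(status, "1", failed_subphase=name, resume_hint=name)
        return PhaseOutcome(RunStatus.OK, "1")
```

Calling run(start_at=outcome.resume_hint) after a failure re-enters at the failed subphase, so completed subphases are never invoked again.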
Desired Responsibilities
Harness
The harness should own:
- opencode serve lifecycle.
- Server restart budget.
- Recovery from server death.
- Mapping SERVER_UNREACHABLE outcomes to server restart plus phase re-entry.
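The harness responsibilities listed above could be sketched as a small loop. This is a hedged illustration: run_phase_with_restarts, the restart_server callable, and the max_restarts budget are hypothetical names, and the phase object is assumed to expose the run() and resume_after_server_restart() methods from the proposed design.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class RunStatus(IntEnum):
    OK = 0
    ERROR = 1
    INCOMPLETE = 2
    SERVER_UNREACHABLE = 3
    INTERRUPTED = 130


@dataclass
class PhaseOutcome:
    status: RunStatus
    phase: str
    failed_subphase: Optional[str] = None
    resume_hint: Optional[str] = None


def run_phase_with_restarts(phase, restart_server: Callable[[], None],
                            max_restarts: int = 2) -> PhaseOutcome:
    """Run a phase, restarting the server while the restart budget lasts.

    The phase never touches the server; it only reports SERVER_UNREACHABLE
    and is asked to resume once the harness has restarted it.
    """
    outcome = phase.run()
    restarts = 0
    while outcome.status == RunStatus.SERVER_UNREACHABLE and restarts < max_restarts:
        restart_server()          # harness-owned lifecycle operation
        restarts += 1
        outcome = phase.resume_after_server_restart(outcome)
    # Either a non-recoverable status, success, or the budget is exhausted.
    return outcome
```

Once the budget is exhausted, the last SERVER_UNREACHABLE outcome is returned unchanged, so the caller still sees which subphase was active.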
Phase Classes
Phase classes should own:
- Phase-local progress.
- Subphase ordering.
- Gate checks.
- Durable artifact checks.
- Determining the correct re-entry point after recoverable failures.
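As an illustration of how durable artifact checks might determine the re-entry point, the sketch below scans for per-subphase marker files. The "<subphase>.done" naming, the artifact directory layout, and the reentry_point helper are assumptions made for this example only; the real artifacts for each subphase are not specified here.

```python
from pathlib import Path
from typing import Optional, Sequence

SUBPHASE_ORDER = ("1a", "1b", "codeql", "1c")


def reentry_point(artifact_dir: Path,
                  order: Sequence[str] = SUBPHASE_ORDER) -> Optional[str]:
    """Return the first subphase without a completion marker, or None if all are done.

    A subphase counts as complete when '<name>.done' exists in artifact_dir
    (hypothetical convention).
    """
    for name in order:
        if not (artifact_dir / f"{name}.done").exists():
            return name
    return None
```

Deriving the re-entry point from artifacts on disk rather than from in-memory state means the answer survives a crash of the harness process itself.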
Phase classes should not:
- Start opencode serve.
- Stop opencode serve.
- Restart opencode serve.
Subphase Execution
Subphase execution should return structured outcomes instead of raw integer return codes.
Suggested enum:
    from enum import IntEnum

    class RunStatus(IntEnum):
        OK = 0
        ERROR = 1
        INCOMPLETE = 2
        SERVER_UNREACHABLE = 3
        INTERRUPTED = 130  # 128 + SIGINT (2), the conventional shell exit code for Ctrl-C
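A structured subphase outcome built on this enum might look like the sketch below. The SubphaseOutcome fields and the status_from_return_code helper are hypothetical, and mapping unknown legacy codes to ERROR is an assumption rather than existing behavior.

```python
from dataclasses import dataclass
from enum import IntEnum


class RunStatus(IntEnum):
    OK = 0
    ERROR = 1
    INCOMPLETE = 2
    SERVER_UNREACHABLE = 3
    INTERRUPTED = 130


@dataclass(frozen=True)
class SubphaseOutcome:
    # Hypothetical shape: which subphase ran, how it ended, and an optional note.
    status: RunStatus
    subphase: str
    detail: str = ""


def status_from_return_code(code: int) -> RunStatus:
    """Map a legacy integer return code to a RunStatus.

    Unknown codes are treated as ERROR (an assumption for this sketch).
    """
    try:
        return RunStatus(code)
    except ValueError:
        return RunStatus.ERROR
```

Because IntEnum members are integers, existing callers that compare against numeric codes keep working while new code matches on named members.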
Acceptance Criteria
- Phase execution uses explicit RunStatus values instead of magic numeric return codes.
- phase_1.py no longer starts, stops, or restarts opencode serve.
- Harness can restart opencode serve and re-enter Phase 1 at the failed subphase.
- Phase 1 can resume from 1a, 1b, or 1c without rerunning completed subphases.
- Phase-local state is represented explicitly rather than inferred from scattered local variables.
- Existing phase behavior remains unchanged for successful runs.
- Existing tests pass, with new tests covering server death during 1a, 1b, and 1c, and no rerun of completed subphases after restart.
Notes
This does not need to be implemented as part of the immediate server-resilience fix. The short-term fix can introduce structured outcomes and harness-level retry. This ticket tracks the cleaner long-term architecture.