Skip to content

F105: ACP server migration to coder/acp-go-sdk #367

@pocky

Description

@pocky

F105: ACP server migration to coder/acp-go-sdk

Scope

In Scope

  • Replace custom ACP engine (pkg/acpserver, ~1300 lines) with github.com/coder/acp-go-sdk (acp.Agent interface, AgentSideConnection)
  • Implement infrastructure adapters in internal/infrastructure/acp/ (agent, emitter, permission, errors, projector, renderer)
  • Wire RequestPermission transport via ports.ACPClientconn.RequestPermission
  • Rewire acp_serve.go + acp_wiring.go to use SDK connection lifecycle (<-conn.Done())
  • Delete pkg/acpserver/ package entirely
  • Move acp_errors.go from application to infra (removes last acpserver import)
  • Pin SDK version (v0.13.x) in go.mod
  • Write ADR superseding ADR-018 (custom engine removal, homegrown versioning → real ACP version)
  • Update .go-arch-lint.yml (infra deps, removal of pkg/acpserver)

Out of Scope

  • SDK fs/terminal client capabilities
  • ResumeSession / ListSessions / other ACP methods (kept as "unsupported" stubs)
  • F107 facade refactor (touches same files but sequenced after F105)
  • F108 Axis B call site + neutral port PermissionGate (only transport wired here)

Deferred

Item Rationale Follow-up
PermissionGate neutral port + call site Transport-only here; gate logic belongs to F108 Axis B F108
Facade Event → SDK update reuse of toSDKUpdate Facade comes after ACP migration; non-blocking F107
SDK fs/terminal client capabilities Not required for iso-functional parity future
ResumeSession / ListSessions Not in current scope; stubs sufficient future

User Stories

US1: SDK-backed ACP agent serves Zed at iso-functionality (P1 - Must Have)

As an AWF operator running awf acp serve,
I want the ACP server to use coder/acp-go-sdk while preserving current behavior,
So that I reduce maintenance of ~1300 lines of custom protocol code without breaking existing Zed integrations.

Why this priority: Core deliverable. Without iso-functional parity the migration cannot ship; this is the MVP that retires pkg/acpserver.

Acceptance Scenarios:

  1. Given Zed connected to awf acp serve, When the user runs a pack workflow via slash-command, Then the prompt completes with end_turn stop reason and session updates stream identically to the pre-migration baseline
  2. Given an in-flight prompt, When Zed sends cancel, Then the prompt is interrupted, StopReason reflects cancellation, and stdout stays serialized
  3. Given the SDK agent panics inside Prompt, When the panic occurs, Then the recover wrapper translates it to an ACP error and the connection remains usable
  4. Given awf acp serve starts, When the SDK initializes, Then logs are written to stderr (conn.SetLogger) and stdout carries only protocol frames

Independent Test: Connect Zed to the rebuilt binary, run initialize → session/new → prompt (end_turn) → cancel and verify protocol traces match the baseline captured from pkg/acpserver.

US2: RequestPermission transport wired through ports (P1 - Must Have)

As an AWF agent provider needing user approval,
I want the ACP infra to expose RequestPermission via ports.ACPClient,
So that F108 Axis B can plug a neutral PermissionGate without touching transport code.

Why this priority: Unblocks F108 and is the only net-new capability vs. the legacy engine; required for the migration to be net-positive.

Acceptance Scenarios:

  1. Given a port consumer calls ACPClient.RequestPermission, When the call is dispatched, Then conn.RequestPermission is invoked and the SDK response is surfaced through the port
  2. Given concurrent SessionUpdate traffic on stdout, When RequestPermission is issued, Then stdout remains serialized and no interleaved frames are observed

Independent Test: Unit-test the infra permission.go adapter against a fake AgentSideConnection, asserting the SDK call mapping and serialization invariants.

US3: Spike validates SDK feasibility before commitment (P1 - Must Have)

As a maintainer evaluating SDK adoption risk,
I want a throwaway spike agent driven by the SDK client (and Zed) before any production refactor,
So that unknowns at the 0.x SDK boundary are resolved before deletion of pkg/acpserver.

Why this priority: Gate. If the spike fails, the entire migration is aborted; without it F105 cannot proceed safely.

Acceptance Scenarios:

  1. Given the spike agent compiled against the SDK, When the scripted scenario initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancel runs, Then all eight verification items resolve (SDK error type, AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, 10 MiB cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel)
  2. Given the spike runs concurrent SessionUpdate + RequestPermission, When load is applied, Then stream integrity holds and cancel interrupts the in-flight prompt
  3. Given the spike report, When any blocking item is unresolved, Then F105 is stopped and reassessed

Independent Test: Run the spike harness and the Zed-driven scenario; produce a written verification report covering all eight SDK unknowns.

US4: Architecture boundaries enforce SDK confinement (P2 - Should Have)

As an architect maintaining hexagonal boundaries,
I want the SDK import limited to internal/infrastructure/acp/ and pinned in go.mod,
So that domain and application layers remain SDK-agnostic and future SDK upgrades are contained.

Why this priority: Protects the invariants the project already enforces; non-MVP but required before deletion of pkg/acpserver to avoid leaking SDK types upward.

Acceptance Scenarios:

  1. Given the migration is complete, When make lint runs go-arch-lint, Then no domain or application file imports github.com/coder/acp-go-sdk
  2. Given go.mod, When inspected, Then the SDK is pinned to a v0.13.x version with // indirect only when applicable

Independent Test: Run go-arch-lint check and grep -R "coder/acp-go-sdk" internal/domain internal/application (must return zero matches outside infra).

US5: ADR documents the supersession of ADR-018 (P2 - Should Have)

As a future contributor reading project decisions,
I want an ADR explaining the SDK adoption and the removal of homegrown versioning,
So that the rationale (and the trade-off against the previous custom engine) is discoverable.

Why this priority: Required for project conventions but not user-visible at runtime.

Acceptance Scenarios:

  1. Given the migration is merged, When ADRs are listed, Then a new ADR-0xx supersedes ADR-018 and references the SDK pin, removed package, and ACP version source
  2. Given the ADR, When read, Then the trade-offs section covers SDK 0.x risk, lock-in, and the rollback plan

Independent Test: Open the ADR file and verify required sections (context, decision, consequences, supersedes ADR-018).

Edge Cases

  • What happens when the SDK panics inside Prompt? → SDK-independent recover wrapper translates to ACP error; connection stays alive
  • How does the system handle concurrent SessionUpdate + RequestPermission over stdout? → stdout serialization invariant verified at the spike and in adapter tests
  • What is the behavior when cancel arrives mid-prompt? → in-flight prompt is interrupted, StopReason reflects cancellation, parking releases
  • What happens when a payload exceeds 10 MiB? → SDK cap behavior verified at the spike; matches or replaces the legacy cap
  • What happens when an unsupported ACP method is invoked (ResumeSession, ListSessions, …)? → the 7 stub methods return a typed "unsupported" error
  • What happens when the SDK reports an AvailableCommandsUpdate shape divergent from the legacy engine? → projector/renderer rewritten against the SDK constructors; baseline captured at spike

Requirements

Functional Requirements

  • FR-001: System MUST replace pkg/acpserver with github.com/coder/acp-go-sdk confined to internal/infrastructure/acp/
  • FR-002: System MUST implement acp.Agent with Initialize, NewSession, Prompt (with parking), Cancel, and 7 "unsupported" stubs
  • FR-003: System MUST wire ports.ACPClient.RequestPermission to conn.RequestPermission
  • FR-004: System MUST emit session updates via SessionUpdateEmitterconn.SessionUpdate(toSDKUpdate(u))
  • FR-005: System MUST translate domain/display events to SDK constructors via rewritten event_projector.go and renderer.go
  • FR-006: System MUST move acp_errors.go from application/ to internal/infrastructure/acp/errors.go and expose toACPError
  • FR-007: System MUST delete pkg/acpserver/ and message.go once the SDK path is wired
  • FR-008: System MUST rewrite acp_serve.go + acp_wiring.go around acp.NewAgentSideConnection(agent, stdout, stdin) and <-conn.Done()
  • FR-009: System MUST wrap SDK handlers with an SDK-independent recover shim translating panics to ACP errors
  • FR-010: System MUST pin github.com/coder/acp-go-sdk to a v0.13.x version in go.mod
  • FR-011: System MUST update .go-arch-lint.yml to register the new infra package and remove pkg/acpserver
  • FR-012: System MUST publish an ADR superseding ADR-018 covering SDK adoption and removal of homegrown versioning
  • FR-013: Users MUST be able to run awf acp serve and complete an end-to-end Zed session (initialize, session/new, prompt, cancel) without functional regression
  • FR-014: System MUST pass the SPIKE gate (US3) before any deletion of pkg/acpserver

Non-Functional Requirements

  • NFR-001: Adapter test coverage MUST be > 85 % and make test-race MUST pass with zero data races
  • NFR-002: SDK logs MUST be routed to stderr via conn.SetLogger; stdout MUST carry only protocol frames
  • NFR-003: make build, make lint, make test, make test-race MUST pass with zero violations before merge
  • NFR-004: No domain or application package MUST import github.com/coder/acp-go-sdk (enforced by go-arch-lint)
  • NFR-005: Payload cap MUST remain at 10 MiB (verified / configured against the SDK default)
  • NFR-006: ACP session lifecycle MUST remain single-use (one connection, one agent)
  • NFR-007: Real-Zed validation MUST exercise connection, pack workflow via slash-command, parking, and cancel-interrupts-prompt

Success Criteria

  • SC-001: pkg/acpserver/ deleted and zero importers remain across the codebase
  • SC-002: Zed users complete an initialize → session/new → prompt → cancel round-trip against the rebuilt binary with no observable behavior change vs. baseline
  • SC-003: Infra adapter test coverage ≥ 85 % with test-race green
  • SC-004: SDK imports limited to internal/infrastructure/acp/ (0 matches outside that path)
  • SC-005: SPIKE report resolves all eight verification items (SDK error type, AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, stdio cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel) before step 1 starts
  • SC-006: RequestPermission round-trip measurable through ports.ACPClient in adapter tests with stdout serialization invariant held under concurrent SessionUpdate traffic

Key Entities

Entity Description Key Attributes
acp.Agent (infra impl) Infrastructure implementation of the SDK Agent interface delegating to the neutral application service Initialize, NewSession, Prompt (parking), Cancel, 7 unsupported stubs, recover wrapper
SessionUpdateEmitter Adapter mapping neutral session updates to SDK calls Emit(update) → conn.SessionUpdate(toSDKUpdate(update))
ports.ACPClient (infra binding) Port consumed by application; bound to conn.RequestPermission RequestPermission(ctx, params) → result
toACPError Error translation helper moved from application to infra maps neutral errors → SDK error type
AgentSideConnection SDK connection lifecycle owning stdio stdin/stdout streams, Done(), SetLogger

Assumptions

  • github.com/coder/acp-go-sdk v0.13.x exposes acp.Agent, AgentSideConnection, SessionUpdate, RequestPermission, AvailableCommandsUpdate, AgentCapabilities, and StopReason types compatible with the legacy semantics — to be confirmed at spike
  • SDK ships an exported error type usable by toACPError — to be confirmed at spike
  • SDK's stdio cap matches or can be configured to 10 MiB — to be confirmed at spike
  • Real Zed instance is available for the final validation step
  • F104 (MCP → go-sdk) has landed and established the SDK-adapter conventions reused here
  • F103 (Codex JSONL parity) is shipped so a spike failure does not block downstream provider work
  • Application-layer acp_errors.go has no consumers outside the ACP wiring (verified before move)

Metadata

  • Status: backlog
  • Version: v0.11.0
  • Priority: high
  • Estimation: XL

Dependencies

  • Blocked by: none
  • Unblocks: F107, F108

Clarifications

Section populated during clarify step with resolved ambiguities.

Notes

  • Source spec: .agent/specs/2026-06-02-acp-server-coder-sdk-migration-design.md
  • Research: research-improvements.md §3
  • Position in overall sequence: 3 / 6 (after F103, F104; before F106, F107, F108)
  • SPIKE gate at step 0: mandatory; failure ⇒ stop and reassess. F103 + F104 already secure the provider track if F105 is aborted
  • Spike scenario: initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancel
  • Spike must validate: concurrent dispatch (cancel interrupts in-flight prompt), stream integrity under concurrent SessionUpdate + RequestPermission, SDK error type, AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, 10 MiB cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel (the to*Params align with them)
  • Security parity targets: concurrent cancel/prompt dispatch, stdout serialization under bidirectional traffic, SDK-independent recover wrapper, logs→stderr via conn.SetLogger, 10 MiB cap, single-use lifecycle
  • File-level plan (post-spike):
    • internal/infrastructure/acp/agent.go (NEW) — implements acp.Agent with recover wrapper (4 real methods + 7 stubs)
    • internal/infrastructure/acp/emitter.goSessionUpdateEmitter
    • internal/infrastructure/acp/permission.go (NEW) — ports.ACPClientconn.RequestPermission
    • internal/infrastructure/acp/errors.go (NEW) — toACPError, moved from application/acp_errors.go
    • internal/infrastructure/acp/event_projector.go — rewritten against SDK constructors
    • internal/infrastructure/acp/renderer.go — rewritten against SDK constructors
    • internal/infrastructure/acp/input_reader.go / fanout_publisher.go — adapted
    • internal/infrastructure/acp/message.godeleted
    • acp_serve.go + acp_wiring.go — rewritten around acp.NewAgentSideConnection
    • pkg/acpserver/deleted (moving acp_errors.go removes the last acpserver import from the application)
  • Cross-feature coordination:
    • F107 facade touches the same infra/acp files (renderer/projector); sequenced after F105 to reuse toSDKUpdate; non-blocking, no design conflict
    • F108 Axis B wires the call site + neutral port PermissionGate; F105 wires only the transport; disjoint files
  • Definition of Done: spike passed, iso-functional + RequestPermission wired, SDK confined to infra + pinned, ADR written, ≥85 % adapter coverage, test-race green, real Zed check completed

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureFeature specificationv0.11.0Target version

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions