F105: ACP server migration to coder/acp-go-sdk
Scope
In Scope
- Replace custom ACP engine (
pkg/acpserver, ~1300 lines) with github.com/coder/acp-go-sdk (acp.Agent interface, AgentSideConnection)
- Implement infrastructure adapters in
internal/infrastructure/acp/ (agent, emitter, permission, errors, projector, renderer)
- Wire
RequestPermission transport via ports.ACPClient → conn.RequestPermission
- Rewire
acp_serve.go + acp_wiring.go to use SDK connection lifecycle (<-conn.Done())
- Delete
pkg/acpserver/ package entirely
- Move
acp_errors.go from application to infra (removes last acpserver import)
- Pin SDK version (v0.13.x) in
go.mod
- Write ADR superseding ADR-018 (custom engine removal, homegrown versioning → real ACP version)
- Update
.go-arch-lint.yml (infra deps, removal of pkg/acpserver)
Out of Scope
- SDK fs/terminal client capabilities
ResumeSession / ListSessions / other ACP methods (kept as "unsupported" stubs)
- F107 facade refactor (touches same files but sequenced after F105)
- F108 Axis B call site + neutral port
PermissionGate (only transport wired here)
Deferred
| Item |
Rationale |
Follow-up |
PermissionGate neutral port + call site |
Transport-only here; gate logic belongs to F108 Axis B |
F108 |
Facade Event → SDK update reuse of toSDKUpdate |
Facade comes after ACP migration; non-blocking |
F107 |
| SDK fs/terminal client capabilities |
Not required for iso-functional parity |
future |
ResumeSession / ListSessions |
Not in current scope; stubs sufficient |
future |
User Stories
US1: SDK-backed ACP agent serves Zed at iso-functionality (P1 - Must Have)
As an AWF operator running awf acp serve,
I want the ACP server to use coder/acp-go-sdk while preserving current behavior,
So that I reduce maintenance of ~1300 lines of custom protocol code without breaking existing Zed integrations.
Why this priority: Core deliverable. Without iso-functional parity the migration cannot ship; this is the MVP that retires pkg/acpserver.
Acceptance Scenarios:
- Given Zed connected to
awf acp serve, When the user runs a pack workflow via slash-command, Then the prompt completes with end_turn stop reason and session updates stream identically to the pre-migration baseline
- Given an in-flight prompt, When Zed sends
cancel, Then the prompt is interrupted, StopReason reflects cancellation, and stdout stays serialized
- Given the SDK agent panics inside
Prompt, When the panic occurs, Then the recover wrapper translates it to an ACP error and the connection remains usable
- Given
awf acp serve starts, When the SDK initializes, Then logs are written to stderr (conn.SetLogger) and stdout carries only protocol frames
Independent Test: Connect Zed to the rebuilt binary, run initialize → session/new → prompt (end_turn) → cancel and verify protocol traces match the baseline captured from pkg/acpserver.
US2: RequestPermission transport wired through ports (P1 - Must Have)
As an AWF agent provider needing user approval,
I want the ACP infra to expose RequestPermission via ports.ACPClient,
So that F108 Axis B can plug a neutral PermissionGate without touching transport code.
Why this priority: Unblocks F108 and is the only net-new capability vs. the legacy engine; required for the migration to be net-positive.
Acceptance Scenarios:
- Given a port consumer calls
ACPClient.RequestPermission, When the call is dispatched, Then conn.RequestPermission is invoked and the SDK response is surfaced through the port
- Given concurrent
SessionUpdate traffic on stdout, When RequestPermission is issued, Then stdout remains serialized and no interleaved frames are observed
Independent Test: Unit-test the infra permission.go adapter against a fake AgentSideConnection, asserting the SDK call mapping and serialization invariants.
US3: Spike validates SDK feasibility before commitment (P1 - Must Have)
As a maintainer evaluating SDK adoption risk,
I want a throwaway spike agent driven by the SDK client (and Zed) before any production refactor,
So that unknowns at the 0.x SDK boundary are resolved before deletion of pkg/acpserver.
Why this priority: Gate. If the spike fails, the entire migration is aborted; without it F105 cannot proceed safely.
Acceptance Scenarios:
- Given the spike agent compiled against the SDK, When the scripted scenario
initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancel runs, Then all eight verification items resolve (SDK error type, AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, 10 MiB cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel)
- Given the spike runs concurrent
SessionUpdate + RequestPermission, When load is applied, Then stream integrity holds and cancel interrupts the in-flight prompt
- Given the spike report, When any blocking item is unresolved, Then F105 is stopped and reassessed
Independent Test: Run the spike harness and the Zed-driven scenario; produce a written verification report covering all eight SDK unknowns.
US4: Architecture boundaries enforce SDK confinement (P2 - Should Have)
As an architect maintaining hexagonal boundaries,
I want the SDK import limited to internal/infrastructure/acp/ and pinned in go.mod,
So that domain and application layers remain SDK-agnostic and future SDK upgrades are contained.
Why this priority: Protects the invariants the project already enforces; non-MVP but required before deletion of pkg/acpserver to avoid leaking SDK types upward.
Acceptance Scenarios:
- Given the migration is complete, When
make lint runs go-arch-lint, Then no domain or application file imports github.com/coder/acp-go-sdk
- Given
go.mod, When inspected, Then the SDK is pinned to a v0.13.x version with // indirect only when applicable
Independent Test: Run go-arch-lint check and grep -R "coder/acp-go-sdk" internal/domain internal/application (must return zero matches outside infra).
US5: ADR documents the supersession of ADR-018 (P2 - Should Have)
As a future contributor reading project decisions,
I want an ADR explaining the SDK adoption and the removal of homegrown versioning,
So that the rationale (and the trade-off against the previous custom engine) is discoverable.
Why this priority: Required for project conventions but not user-visible at runtime.
Acceptance Scenarios:
- Given the migration is merged, When ADRs are listed, Then a new ADR-0xx supersedes ADR-018 and references the SDK pin, removed package, and ACP version source
- Given the ADR, When read, Then the trade-offs section covers SDK 0.x risk, lock-in, and the rollback plan
Independent Test: Open the ADR file and verify required sections (context, decision, consequences, supersedes ADR-018).
Edge Cases
- What happens when the SDK panics inside
Prompt? → SDK-independent recover wrapper translates to ACP error; connection stays alive
- How does the system handle concurrent
SessionUpdate + RequestPermission over stdout? → stdout serialization invariant verified at the spike and in adapter tests
- What is the behavior when
cancel arrives mid-prompt? → in-flight prompt is interrupted, StopReason reflects cancellation, parking releases
- What happens when a payload exceeds 10 MiB? → SDK cap behavior verified at the spike; matches or replaces the legacy cap
- What happens when an unsupported ACP method is invoked (
ResumeSession, ListSessions, …)? → the 7 stub methods return a typed "unsupported" error
- What happens when the SDK reports an
AvailableCommandsUpdate shape divergent from the legacy engine? → projector/renderer rewritten against the SDK constructors; baseline captured at spike
Requirements
Functional Requirements
- FR-001: System MUST replace
pkg/acpserver with github.com/coder/acp-go-sdk confined to internal/infrastructure/acp/
- FR-002: System MUST implement
acp.Agent with Initialize, NewSession, Prompt (with parking), Cancel, and 7 "unsupported" stubs
- FR-003: System MUST wire
ports.ACPClient.RequestPermission to conn.RequestPermission
- FR-004: System MUST emit session updates via
SessionUpdateEmitter → conn.SessionUpdate(toSDKUpdate(u))
- FR-005: System MUST translate domain/display events to SDK constructors via rewritten
event_projector.go and renderer.go
- FR-006: System MUST move
acp_errors.go from application/ to internal/infrastructure/acp/errors.go and expose toACPError
- FR-007: System MUST delete
pkg/acpserver/ and message.go once the SDK path is wired
- FR-008: System MUST rewrite
acp_serve.go + acp_wiring.go around acp.NewAgentSideConnection(agent, stdout, stdin) and <-conn.Done()
- FR-009: System MUST wrap SDK handlers with an SDK-independent
recover shim translating panics to ACP errors
- FR-010: System MUST pin
github.com/coder/acp-go-sdk to a v0.13.x version in go.mod
- FR-011: System MUST update
.go-arch-lint.yml to register the new infra package and remove pkg/acpserver
- FR-012: System MUST publish an ADR superseding ADR-018 covering SDK adoption and removal of homegrown versioning
- FR-013: Users MUST be able to run
awf acp serve and complete an end-to-end Zed session (initialize, session/new, prompt, cancel) without functional regression
- FR-014: System MUST pass the SPIKE gate (US3) before any deletion of
pkg/acpserver
Non-Functional Requirements
- NFR-001: Adapter test coverage MUST be > 85 % and
make test-race MUST pass with zero data races
- NFR-002: SDK logs MUST be routed to stderr via
conn.SetLogger; stdout MUST carry only protocol frames
- NFR-003:
make build, make lint, make test, make test-race MUST pass with zero violations before merge
- NFR-004: No domain or application package MUST import
github.com/coder/acp-go-sdk (enforced by go-arch-lint)
- NFR-005: Payload cap MUST remain at 10 MiB (verified / configured against the SDK default)
- NFR-006: ACP session lifecycle MUST remain single-use (one connection, one agent)
- NFR-007: Real-Zed validation MUST exercise connection, pack workflow via slash-command, parking, and cancel-interrupts-prompt
Success Criteria
- SC-001:
pkg/acpserver/ deleted and zero importers remain across the codebase
- SC-002: Zed users complete an
initialize → session/new → prompt → cancel round-trip against the rebuilt binary with no observable behavior change vs. baseline
- SC-003: Infra adapter test coverage ≥ 85 % with
test-race green
- SC-004: SDK imports limited to
internal/infrastructure/acp/ (0 matches outside that path)
- SC-005: SPIKE report resolves all eight verification items (SDK error type,
AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, stdio cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel) before step 1 starts
- SC-006:
RequestPermission round-trip measurable through ports.ACPClient in adapter tests with stdout serialization invariant held under concurrent SessionUpdate traffic
Key Entities
| Entity |
Description |
Key Attributes |
acp.Agent (infra impl) |
Infrastructure implementation of the SDK Agent interface delegating to the neutral application service |
Initialize, NewSession, Prompt (parking), Cancel, 7 unsupported stubs, recover wrapper |
SessionUpdateEmitter |
Adapter mapping neutral session updates to SDK calls |
Emit(update) → conn.SessionUpdate(toSDKUpdate(update)) |
ports.ACPClient (infra binding) |
Port consumed by application; bound to conn.RequestPermission |
RequestPermission(ctx, params) → result |
toACPError |
Error translation helper moved from application to infra |
maps neutral errors → SDK error type |
AgentSideConnection |
SDK connection lifecycle owning stdio |
stdin/stdout streams, Done(), SetLogger |
Assumptions
github.com/coder/acp-go-sdk v0.13.x exposes acp.Agent, AgentSideConnection, SessionUpdate, RequestPermission, AvailableCommandsUpdate, AgentCapabilities, and StopReason types compatible with the legacy semantics — to be confirmed at spike
- SDK ships an exported error type usable by
toACPError — to be confirmed at spike
- SDK's stdio cap matches or can be configured to 10 MiB — to be confirmed at spike
- Real Zed instance is available for the final validation step
- F104 (MCP → go-sdk) has landed and established the SDK-adapter conventions reused here
- F103 (Codex JSONL parity) is shipped so a spike failure does not block downstream provider work
- Application-layer
acp_errors.go has no consumers outside the ACP wiring (verified before move)
Metadata
- Status: backlog
- Version: v0.11.0
- Priority: high
- Estimation: XL
Dependencies
- Blocked by: none
- Unblocks: F107, F108
Clarifications
Section populated during clarify step with resolved ambiguities.
Notes
- Source spec:
.agent/specs/2026-06-02-acp-server-coder-sdk-migration-design.md
- Research:
research-improvements.md §3
- Position in overall sequence: 3 / 6 (after F103, F104; before F106, F107, F108)
- SPIKE gate at step 0: mandatory; failure ⇒ stop and reassess. F103 + F104 already secure the provider track if F105 is aborted
- Spike scenario:
initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancel
- Spike must validate: concurrent dispatch (cancel interrupts in-flight prompt), stream integrity under concurrent
SessionUpdate + RequestPermission, SDK error type, AvailableCommandsUpdate shape, AgentCapabilities fields, StopReason variants, 10 MiB cap, ProtocolVersionNumber, minimum Go version, neutral signatures HandleSessionNew/Prompt/Cancel (the to*Params align with them)
- Security parity targets: concurrent cancel/prompt dispatch, stdout serialization under bidirectional traffic, SDK-independent
recover wrapper, logs→stderr via conn.SetLogger, 10 MiB cap, single-use lifecycle
- File-level plan (post-spike):
internal/infrastructure/acp/agent.go (NEW) — implements acp.Agent with recover wrapper (4 real methods + 7 stubs)
internal/infrastructure/acp/emitter.go — SessionUpdateEmitter
internal/infrastructure/acp/permission.go (NEW) — ports.ACPClient → conn.RequestPermission
internal/infrastructure/acp/errors.go (NEW) — toACPError, moved from application/acp_errors.go
internal/infrastructure/acp/event_projector.go — rewritten against SDK constructors
internal/infrastructure/acp/renderer.go — rewritten against SDK constructors
internal/infrastructure/acp/input_reader.go / fanout_publisher.go — adapted
internal/infrastructure/acp/message.go — deleted
acp_serve.go + acp_wiring.go — rewritten around acp.NewAgentSideConnection
pkg/acpserver/ — deleted (moving acp_errors.go removes the last acpserver import from the application)
- Cross-feature coordination:
- F107 facade touches the same
infra/acp files (renderer/projector); sequenced after F105 to reuse toSDKUpdate; non-blocking, no design conflict
- F108 Axis B wires the call site + neutral port
PermissionGate; F105 wires only the transport; disjoint files
- Definition of Done: spike passed, iso-functional +
RequestPermission wired, SDK confined to infra + pinned, ADR written, ≥85 % adapter coverage, test-race green, real Zed check completed
F105: ACP server migration to
coder/acp-go-sdkScope
In Scope
pkg/acpserver, ~1300 lines) withgithub.com/coder/acp-go-sdk(acp.Agentinterface,AgentSideConnection)internal/infrastructure/acp/(agent, emitter, permission, errors, projector, renderer)RequestPermissiontransport viaports.ACPClient→conn.RequestPermissionacp_serve.go+acp_wiring.goto use SDK connection lifecycle (<-conn.Done())pkg/acpserver/package entirelyacp_errors.gofrom application to infra (removes lastacpserverimport)go.mod.go-arch-lint.yml(infra deps, removal ofpkg/acpserver)Out of Scope
ResumeSession/ListSessions/ other ACP methods (kept as "unsupported" stubs)PermissionGate(only transport wired here)Deferred
PermissionGateneutral port + call siteEvent→ SDK update reuse oftoSDKUpdateResumeSession/ListSessionsUser Stories
US1: SDK-backed ACP agent serves Zed at iso-functionality (P1 - Must Have)
As an AWF operator running
awf acp serve,I want the ACP server to use
coder/acp-go-sdkwhile preserving current behavior,So that I reduce maintenance of ~1300 lines of custom protocol code without breaking existing Zed integrations.
Why this priority: Core deliverable. Without iso-functional parity the migration cannot ship; this is the MVP that retires
pkg/acpserver.Acceptance Scenarios:
awf acp serve, When the user runs a pack workflow via slash-command, Then the prompt completes withend_turnstop reason and session updates stream identically to the pre-migration baselinecancel, Then the prompt is interrupted,StopReasonreflects cancellation, and stdout stays serializedPrompt, When the panic occurs, Then therecoverwrapper translates it to an ACP error and the connection remains usableawf acp servestarts, When the SDK initializes, Then logs are written to stderr (conn.SetLogger) and stdout carries only protocol framesIndependent Test: Connect Zed to the rebuilt binary, run
initialize → session/new → prompt (end_turn) → canceland verify protocol traces match the baseline captured frompkg/acpserver.US2:
RequestPermissiontransport wired through ports (P1 - Must Have)As an AWF agent provider needing user approval,
I want the ACP infra to expose
RequestPermissionviaports.ACPClient,So that F108 Axis B can plug a neutral
PermissionGatewithout touching transport code.Why this priority: Unblocks F108 and is the only net-new capability vs. the legacy engine; required for the migration to be net-positive.
Acceptance Scenarios:
ACPClient.RequestPermission, When the call is dispatched, Thenconn.RequestPermissionis invoked and the SDK response is surfaced through the portSessionUpdatetraffic on stdout, WhenRequestPermissionis issued, Then stdout remains serialized and no interleaved frames are observedIndependent Test: Unit-test the infra
permission.goadapter against a fakeAgentSideConnection, asserting the SDK call mapping and serialization invariants.US3: Spike validates SDK feasibility before commitment (P1 - Must Have)
As a maintainer evaluating SDK adoption risk,
I want a throwaway spike agent driven by the SDK client (and Zed) before any production refactor,
So that unknowns at the 0.x SDK boundary are resolved before deletion of
pkg/acpserver.Why this priority: Gate. If the spike fails, the entire migration is aborted; without it F105 cannot proceed safely.
Acceptance Scenarios:
initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancelruns, Then all eight verification items resolve (SDK error type,AvailableCommandsUpdateshape,AgentCapabilitiesfields,StopReasonvariants, 10 MiB cap,ProtocolVersionNumber, minimum Go version, neutral signaturesHandleSessionNew/Prompt/Cancel)SessionUpdate+RequestPermission, When load is applied, Then stream integrity holds and cancel interrupts the in-flight promptIndependent Test: Run the spike harness and the Zed-driven scenario; produce a written verification report covering all eight SDK unknowns.
US4: Architecture boundaries enforce SDK confinement (P2 - Should Have)
As an architect maintaining hexagonal boundaries,
I want the SDK import limited to
internal/infrastructure/acp/and pinned ingo.mod,So that domain and application layers remain SDK-agnostic and future SDK upgrades are contained.
Why this priority: Protects the invariants the project already enforces; non-MVP but required before deletion of
pkg/acpserverto avoid leaking SDK types upward.Acceptance Scenarios:
make lintrunsgo-arch-lint, Then no domain or application file importsgithub.com/coder/acp-go-sdkgo.mod, When inspected, Then the SDK is pinned to a v0.13.x version with// indirectonly when applicableIndependent Test: Run
go-arch-lint checkandgrep -R "coder/acp-go-sdk" internal/domain internal/application(must return zero matches outside infra).US5: ADR documents the supersession of ADR-018 (P2 - Should Have)
As a future contributor reading project decisions,
I want an ADR explaining the SDK adoption and the removal of homegrown versioning,
So that the rationale (and the trade-off against the previous custom engine) is discoverable.
Why this priority: Required for project conventions but not user-visible at runtime.
Acceptance Scenarios:
Independent Test: Open the ADR file and verify required sections (context, decision, consequences, supersedes ADR-018).
Edge Cases
Prompt? → SDK-independentrecoverwrapper translates to ACP error; connection stays aliveSessionUpdate+RequestPermissionover stdout? → stdout serialization invariant verified at the spike and in adapter testscancelarrives mid-prompt? → in-flight prompt is interrupted,StopReasonreflects cancellation, parking releasesResumeSession,ListSessions, …)? → the 7 stub methods return a typed "unsupported" errorAvailableCommandsUpdateshape divergent from the legacy engine? → projector/renderer rewritten against the SDK constructors; baseline captured at spikeRequirements
Functional Requirements
pkg/acpserverwithgithub.com/coder/acp-go-sdkconfined tointernal/infrastructure/acp/acp.AgentwithInitialize,NewSession,Prompt(with parking),Cancel, and 7 "unsupported" stubsports.ACPClient.RequestPermissiontoconn.RequestPermissionSessionUpdateEmitter→conn.SessionUpdate(toSDKUpdate(u))event_projector.goandrenderer.goacp_errors.gofromapplication/tointernal/infrastructure/acp/errors.goand exposetoACPErrorpkg/acpserver/andmessage.goonce the SDK path is wiredacp_serve.go+acp_wiring.goaroundacp.NewAgentSideConnection(agent, stdout, stdin)and<-conn.Done()recovershim translating panics to ACP errorsgithub.com/coder/acp-go-sdkto a v0.13.x version ingo.mod.go-arch-lint.ymlto register the new infra package and removepkg/acpserverawf acp serveand complete an end-to-end Zed session (initialize, session/new, prompt, cancel) without functional regressionpkg/acpserverNon-Functional Requirements
make test-raceMUST pass with zero data racesconn.SetLogger; stdout MUST carry only protocol framesmake build,make lint,make test,make test-raceMUST pass with zero violations before mergegithub.com/coder/acp-go-sdk(enforced bygo-arch-lint)Success Criteria
pkg/acpserver/deleted and zero importers remain across the codebaseinitialize → session/new → prompt → cancelround-trip against the rebuilt binary with no observable behavior change vs. baselinetest-racegreeninternal/infrastructure/acp/(0 matches outside that path)AvailableCommandsUpdateshape,AgentCapabilitiesfields,StopReasonvariants, stdio cap,ProtocolVersionNumber, minimum Go version, neutral signaturesHandleSessionNew/Prompt/Cancel) before step 1 startsRequestPermissionround-trip measurable throughports.ACPClientin adapter tests with stdout serialization invariant held under concurrentSessionUpdatetrafficKey Entities
acp.Agent(infra impl)Agentinterface delegating to the neutral application serviceSessionUpdateEmitterEmit(update) → conn.SessionUpdate(toSDKUpdate(update))ports.ACPClient(infra binding)conn.RequestPermissionRequestPermission(ctx, params) → resulttoACPErrorAgentSideConnectionDone(),SetLoggerAssumptions
github.com/coder/acp-go-sdkv0.13.x exposesacp.Agent,AgentSideConnection,SessionUpdate,RequestPermission,AvailableCommandsUpdate,AgentCapabilities, andStopReasontypes compatible with the legacy semantics — to be confirmed at spiketoACPError— to be confirmed at spikeacp_errors.gohas no consumers outside the ACP wiring (verified before move)Metadata
Dependencies
Clarifications
Section populated during clarify step with resolved ambiguities.
Notes
.agent/specs/2026-06-02-acp-server-coder-sdk-migration-design.mdresearch-improvements.md§3initialize → session/new (+ available commands) → prompt (end_turn) → prompt (resume) → cancelSessionUpdate+RequestPermission, SDK error type,AvailableCommandsUpdateshape,AgentCapabilitiesfields,StopReasonvariants, 10 MiB cap,ProtocolVersionNumber, minimum Go version, neutral signaturesHandleSessionNew/Prompt/Cancel(theto*Paramsalign with them)recoverwrapper, logs→stderr viaconn.SetLogger, 10 MiB cap, single-use lifecycleinternal/infrastructure/acp/agent.go(NEW) — implementsacp.Agentwithrecoverwrapper (4 real methods + 7 stubs)internal/infrastructure/acp/emitter.go—SessionUpdateEmitterinternal/infrastructure/acp/permission.go(NEW) —ports.ACPClient→conn.RequestPermissioninternal/infrastructure/acp/errors.go(NEW) —toACPError, moved fromapplication/acp_errors.gointernal/infrastructure/acp/event_projector.go— rewritten against SDK constructorsinternal/infrastructure/acp/renderer.go— rewritten against SDK constructorsinternal/infrastructure/acp/input_reader.go/fanout_publisher.go— adaptedinternal/infrastructure/acp/message.go— deletedacp_serve.go+acp_wiring.go— rewritten aroundacp.NewAgentSideConnectionpkg/acpserver/— deleted (movingacp_errors.goremoves the lastacpserverimport from the application)infra/acpfiles (renderer/projector); sequenced after F105 to reusetoSDKUpdate; non-blocking, no design conflictPermissionGate; F105 wires only the transport; disjoint filesRequestPermissionwired, SDK confined to infra + pinned, ADR written, ≥85 % adapter coverage,test-racegreen, real Zed check completed