feat: full §10.2 HDKD agent bootstrap — broker link-code endpoints + daemon redeem (#144)#149

Merged: hanwencheng merged 23 commits into main from claude/impl-144-hdkd-bootstrap on May 31, 2026.

@hanwencheng
Member

Issue #144 — Full arch.md §10.2 agent-bootstrap (HDKD omni + broker link-code endpoints)

Closes #144. Converges the PR #141 interim §10.2 (agent omni derived from the agent's own wallet; openssl rand link-code stub) to the literal ceremony.

The flow (the "install an app → approve its permissions" story)

  1. P.0 create (master) — agentkeys agent create --label agent-a --services memory → broker mints a one-time link code bound to the HDKD child omni O_agent = SHA256("agentkeys-hdkd-v1" ‖ O_master ‖ "//label").
  2. P.1 install (agent, in the sandbox) — agentkeys-daemon --init-link-code <code> generates its own K10 device key (never on the master), proves possession (pop_sig), and redeems the code for J1_agent. The broker records a pending binding.
  3. P.2 bind (master) — submits registerAgentDevice (no biometric).
  4. P.3 grant (master) — setScopeWithWebauthn (one Touch ID). P.2+P.3 are conceptually one approval; two steps for deterministic test automation.
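
The child omni in step 1 is a plain hash of public inputs, so anyone holding O_master and the label can recompute it. A minimal sketch of that property follows. It assumes O_master is fed to the hash as its ASCII hex string and that the label is appended after a literal "//"; the real agentkeys-core child_omni may hash raw bytes instead, so this sketch is not guaranteed to reproduce the frozen vectors.

```shell
# Sketch only: input encoding is an assumption, see the note above.

# Use whichever SHA-256 tool exists (sha256sum on Linux, shasum on macOS).
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | awk '{print $1}'
  else
    shasum -a 256 | awk '{print $1}'
  fi
}

# child_omni_hex <o_master_hex> <label>
# Hashes the domain tag, the master omni (any leading 0x stripped), and
# "//<label>", then prints the un-prefixed hex digest.
child_omni_hex() {
  local o_master="${1#0x}" label="$2"
  printf '%s%s//%s' "agentkeys-hdkd-v1" "$o_master" "$label" | sha256_hex
}
```

Calling `child_omni_hex "$O_MASTER" agent-a` twice yields the same digest, and a different label yields a different one, which is why unforgeability has to come from the J1_master-gated create call and the master-submitted binding rather than from secrecy.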

Decisions (asked + answered)

  1. Master submits the on-chain binding — broker mints the code + J1_agent + records the pending binding; the master pulls it and binds. No Heima-mainnet contract change, no broker chain key. Async/push model (master = phone).
  2. Child omni is PUBLIC + recomputable — unforgeability = the J1_master-gated /v1/agent/create + the master-submitted binding, NOT a secret. Agent keeps a K10 device key only (omni decoupled).
  3. Daemon owns keygen + redeem (--init-link-code), sharing agentkeys-core::device_crypto with the CLI.

What landed

  • core: device_crypto (shared K10 keygen / EIP-191 / ecrecover / pop_sig + DeviceKey) + HDKD child_omni/child_omni_hex + validate_label (frozen vectors).
  • broker: POST /v1/agent/create, POST /v1/auth/link-code/redeem (pop_sig verified before consume → retryable), GET /v1/agent/pending-bindings; SQLite link-code + pending-binding store; AgentKeysClaims + mint_agent_session_jwt; mint-oidc-jwt reads actor_omni from the verified claim (STS-relay prerequisite; wallet sessions byte-identical, regression-tested).
  • daemon: --init-link-code one-shot.
  • cli: agent create + agent pending (master-side).
  • harness: Phase P rework (P.0→P.3); builds + uploads the daemon binary.
  • ci / broker-setup: §10.2 route smoke (401 = live, 404 = stale binary) + nm symbol check in setup-broker-host.sh; bumped the deploy job timeout 15→25min + SSM executionTimeout 900→1500 for the larger broker build closure (agentkeys-core pulls aws-sdk-s3/keyring/aes-gcm).
  • docs: arch.md §10.2 / §5 / §6.2 / route list; runbook Phase P + troubleshooting; docs/spec/plans/issue-144-hdkd-agent-bootstrap.md.
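
The route smoke in the ci / broker-setup item relies on an unauthenticated request: a 401 means the route exists and is gated, a 404 means the running binary predates the §10.2 routes. A rough sketch of that check, assuming a base-URL argument and these function names (both hypothetical, not the code in setup-broker-host.sh):

```shell
# classify_route_status <http_code>
# Maps the status of an unauthenticated request to a deploy verdict.
classify_route_status() {
  case "$1" in
    401) echo "live" ;;      # route registered, auth gate rejected us
    404) echo "stale" ;;     # route missing: old binary still running
    *)   echo "unknown" ;;   # anything else needs a human look
  esac
}

# route_smoke <broker_base_url>
# Sends a POST with no bearer to the create route and classifies the result.
route_smoke() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$1/v1/agent/create") || code="000"
  classify_route_status "$code"
}
```

For example, `route_smoke "$BROKER_URL"` would print `live` against a broker running the issue-144 code.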

Deviation (vs the asked plan)

CLI agent bind/agent grant Rust subcommands are not added — chain submission lives in shell + cast, and the two existing chain helpers (heima-agent-create.sh --from-pubkey = bind, heima-scope-set.sh --webauthn = grant) already are the deterministic two-step split. The CLI ships the genuinely-new master surfaces instead (create + pending, incl. the production rendezvous). Recorded in the plan doc.

Out of scope (deferred)

Broker chain-write / meta-tx; secret-keyed HDKD; HDKD sub-actors; broker-side K11 verify (stays on-chain); production APNs/FCM push transport (the pending-binding data model + endpoint ship now).

Tests

  • cargo test -p agentkeys-core — 141 (HDKD frozen vectors, pop_sig sign→ecrecover round-trip).
  • cargo test -p agentkeys-broker-server --features auth-email-link — 179 lib + agent_bootstrap_flow (create-gated, bad-label, full create→redeem→pending, bad-pop_sig-retryable) + oidc byte-identical regression.
  • cargo clippy --workspace --all-targets -- -D warnings (default features) — clean. cargo fmt --all --check — clean. bash -n harness/phase1-wire-demo.sh + scripts/setup-broker-host.sh — OK.
  • The full --real --webauthn end-to-end needs the redeployed broker + Touch ID + a live sandbox — exercised after this deploy (CI deploy + the route smoke confirm the §10.2 code is live on the test broker).

🤖 Generated with Claude Code

…daemon redeem (#144)

Converges the PR #141 interim §10.2 to the literal ceremony: the master mints a
one-time link code bound to a hard-derived child omni
O_agent = SHA256("agentkeys-hdkd-v1" || O_master || "//label"); the agent daemon
generates its own K10 in the sandbox, redeems the code (pop_sig), and the broker
mints J1_agent carrying the HDKD omni + parent lineage. The master then approves
the binding + scope async (push → one Touch ID), iOS/Android-style.

- core: device_crypto (shared K10 keygen / EIP-191 / ecrecover / pop_sig + DeviceKey)
  + HDKD child_omni/child_omni_hex + validate_label (frozen vectors)
- broker: POST /v1/agent/create (J1_master-gated), POST /v1/auth/link-code/redeem
  (pop_sig verified before consume → retryable), GET /v1/agent/pending-bindings;
  SQLite link-code + pending-binding store; AgentKeysClaims + mint_agent_session_jwt;
  mint-oidc-jwt now reads actor_omni from the verified claim (STS-relay prerequisite;
  wallet sessions byte-identical, regression-tested)
- daemon: --init-link-code one-shot (in-sandbox keygen → redeem → persist J1_agent
  → emit binding artifact)
- cli: agentkeys agent create + agent pending (master-side)
- harness: Phase P rework — P.0 create (real broker code) → P.1 install (daemon
  --init-link-code) → P.2 bind → P.3 grant; build + upload the daemon binary
- ci / broker-setup: §10.2 route smoke (401 = live, 404 = stale binary) + nm symbol
  check in setup-broker-host.sh; bump the deploy job timeout 15→25min + SSM
  executionTimeout 900→1500 for the larger broker build closure (agentkeys-core)
- docs: arch.md §10.2 (async master-submits ceremony), §5 agent_omni row, §6.2,
  route list; operator-runbook-wire.md Phase P + troubleshooting; new
  docs/spec/plans/issue-144-hdkd-agent-bootstrap.md

Decisions (master submits the on-chain binding — no contract change; child omni is
public + recomputable; daemon owns keygen+redeem with shared core) and deviations
are recorded in docs/spec/plans/issue-144-hdkd-agent-bootstrap.md.

Tests: core 141; broker 179 lib + agent_bootstrap_flow integration + oidc regression;
clippy -D warnings clean (default features); cargo fmt clean; harness bash -n OK.
…rade prereq + P.0 in the install step

The Phase P rewrite landed in the prior commit; this closes the two gaps an
operator would hit: (1) a Real-mode prereq that the broker must be running the
issue-144 code (else Phase P P.0 agent create 404s — folds in the
setup-broker-host.sh --upgrade fix + the 401-not-404 deploy self-check), and
(2) P.0 (master mints the link code) in the install-step summary bullet.
…us (idempotent)

- demo (harness Phase P): P.0 now drives the agentkeys agent create CLI (was raw
  curl); P.1b drives agentkeys agent pending (the master-pull rendezvous); P.2
  acks the broker after binding so pending self-cleans. Falls back to a raw POST
  only when no local host binary exists.
- broker: new POST /v1/agent/pending-bindings/ack (J1_master-gated, operator-scoped
  mark_bound). The rendezvous never cleared before (bound_at was never set), so
  pending listed redeemed agents forever; the ack fixes that AND makes the list
  idempotent. agent_bootstrap_flow now asserts pending=1 then ack then pending=0.
- idempotency: AGENT_LABEL stays stable (deterministic HDKD omni) and the K10 file
  persists in the long-lived sandbox (clean_slate never wipes ~/.agentkeys), so
  registerAgentDevice hits already-registered, scope re-set + seed overwrite are
  no-ops, and the ack keeps pending clean — re-runs converge.
- docs: arch.md route list + 10.2 ack step; runbook Phase P (P.1b + ack + stable-label note).

broker 179 lib + agent_bootstrap_flow (incl. ack) green; clippy -D warnings clean.
… references

Per the CLAUDE.md never-pass---upgrade rule (it is a back-compat no-op; the
script is idempotent):
- operator-runbook-wire.md broker-version prereq: --ref main instead of --upgrade.
- setup-broker-host.sh: idempotency comment no longer illustrates with --upgrade.
…test brokers)

setup-cloud.sh step 15 (SSM SendCommand to bring up the MCP server) assumed the
broker EC2 was already a registered SSM managed instance but never ensured it.
Operators hit `SendCommand -> InvalidInstanceId` because the broker-host role was
created WITHOUT AmazonSSMManagedInstanceCore, so the on-host amazon-ssm-agent
can't register. (And separately, a caller lacking ssm:SendCommand got a
misleading "does the instance have the agent?" message.)

- ensure_ssm_managed(): runs before SendCommand. Resolves the role from the
  INSTANCE's attached profile (naming-agnostic, so the SAME code fixes BOTH the
  prod `agentkeys-broker-host` and the test broker's own profile), idempotently
  attaches AmazonSSMManagedInstanceCore if missing, then polls
  describe-instance-information until PingStatus=Online. If the agent never
  registers (role now correct => the agent itself isn't running), it dies with
  the exact restart remediation (ssh-broker.sh + setup-broker-host.sh --upgrade,
  or reboot). Idempotent: a re-run with the policy already attached skips.
- SendCommand now captures stderr and distinguishes a CALLER ssm:SendCommand
  AccessDenied (identity-based policy gap) from a real instance problem, with a
  precise remediation (put-user-policy; see provision-ci-deploy-role.sh for the
  policy shape) instead of the misleading instance-agent message.
- aws iam calls are global (no --region); ec2/ssm reads pass --region "$REGION"
  per the agentkeys-admin-defaults-to-us-west-2 trap (CLAUDE.md).

Env-agnostic + idempotent: works for both broker envs and converges on re-run.
… refs + add CLAUDE.md rule

setup-broker-host.sh treats --upgrade (and --skip-pull) as back-compat NO-OPS
(it is idempotent + auto-detects bootstrap vs upgrade), so emitting it is
misleading. Replace active-path references with --ref main (the canonical
idempotent deploy invocation per CLAUDE.md) and codify the rule:
- setup-cloud.sh: ensure_ssm_managed remediation suggests --ref main.
- docs/ci-setup.md: prod-broker manual deploy uses --ref main.
- CLAUDE.md (Remote broker host): explicit never-pass---upgrade rule.
…-u unbound-var)

SSM RunShellScript executes the step-15 mcp-bring-up script as root with a
MINIMAL env (no HOME). Under set -euo pipefail the first $HOME use
(export PATH=$HOME/.cargo/bin) aborted with 'HOME: unbound variable' — a latent
bug that only surfaced once SSM delivery started working (previously it failed at
send-command). Default HOME to /root before any $HOME use. Also document
--only-step 15 in the script's re-run examples.
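
A small sketch of the HOME default described above. The function name is hypothetical; the point is that the default is applied before the first `$HOME` expansion, so `set -u` has nothing to trip on.

```shell
# ensure_home: assign /root only when HOME is unset or empty, then export it.
ensure_home() {
  : "${HOME:=/root}"
  export HOME
}

# Intended placement at the top of a script that SSM may run with a minimal env:
#   set -euo pipefail
#   ensure_home
#   export PATH="$HOME/.cargo/bin:$PATH"
```

An existing HOME is left untouched, so the same script behaves identically when run from an interactive shell.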
…etup script

Broaden the rule from setup-broker-host.sh to all idempotent setup scripts
(setup-cloud.sh, setup-heima.sh, heima-* helpers) and restore the actionable
guidance (invoke plain / --only-step N, or --ref main for a broker redeploy;
replace existing active-path --upgrade references).
…ld is not a hang)

setup-mcp-host.sh runs 'cargo install --git' to build agentkeys-mcp-server FROM
SOURCE; a cold build on the t3.medium broker takes 10-20 min, but step 15 polled
SILENTLY with a 10-min cap — so it looked hung and timed out mid-build. Add a
~30s heartbeat (status + elapsed) and raise the cap to 25 min. On timeout, state
the build is likely still running on the broker (not a failure) and point to
--from-step 15 to resume (cargo cache => fast) plus the get-command-invocation
watch command.
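
The heartbeat-plus-cap polling pattern can be sketched as a generic loop. This is an illustration under assumptions, not the step-15 code: the function name, its arguments, and the message wording are made up here, and the status check is passed in as a command.

```shell
# poll_with_heartbeat <cap_seconds> <interval_seconds> <command...>
# Runs <command> until it succeeds or <cap_seconds> have elapsed.
# Prints a heartbeat line to stderr after every unsuccessful attempt.
poll_with_heartbeat() {
  local cap="$1" interval="$2" start now elapsed
  shift 2
  start=$(date +%s)
  while :; do
    if "$@"; then
      return 0
    fi
    now=$(date +%s)
    elapsed=$((now - start))
    if [ "$elapsed" -ge "$cap" ]; then
      echo "timed out after ${elapsed}s; the remote build may still be running" >&2
      return 1
    fi
    echo "still waiting: ${elapsed}s elapsed" >&2
    sleep "$interval"
  done
}
```

With the values from this commit the call would look like `poll_with_heartbeat 1500 30 check_build_done`, where `check_build_done` stands in for whatever wraps the SSM status query.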
…argo build, not cargo install --git

Step 15 took ~10 min because setup-mcp-host.sh used 'cargo install --git --force':
it re-clones the repo and builds agentkeys-mcp-server + its WHOLE dep tree
(aws-sdk-s3, tokio, k256, reqwest, ...) in a THROWAWAY target every run — a cold
release build on the 2-vCPU t3.medium broker. setup-broker-host.sh already
compiled those same deps into $REPO_ROOT/target/release (persistent, incremental),
and the harness builds the MCP server with persistent cargo caches too; the
cargo-install path was the lone build throwing the cache away.

Build it the same way the broker/workers (and harness) do: cd $REPO_ROOT
(/opt/agentkeys-src, already checked out by step 15's clone/reset) and
'cargo build --release --locked -p agentkeys-mcp-server', install from
target/release. Reuses the shared dep cache => incremental (seconds-2 min), not a
10-20 min cold build. Worst case (no prior build) is no slower than before, and
all re-runs are fast since target/ persists.
…up, drop operator flags

- harness/phase1-wire-demo.sh: cross-build target -> named docker volume (off the macOS bind-mount); copy only the 3 binaries out to the host path the upload step reads.

- setup-cloud.sh: step 15 (hosted MCP on broker) is now a no-op redirect. It is a broker-host concern + the deferred Hosted-LLM path (#152), not cloud/IAM.

- setup-broker-host.sh / setup-mcp-host.sh: state-driven idempotency over operator flags. Drop --without-workers / --without-build / --clean / --no-clean; unattended by default (--yes/--non-interactive kept as accepted no-ops for CI). Hosted MCP auto-converges when its binary is already installed.

- docs/operator-runbook-wire.md: wrap-up section -- three setup entry points (run only what changed), no-flag idempotent posture, right test order; build-cache + troubleshooting updates.

- docs/spec/plans/issue-107-mcp-demo-runbook.md: correct the B.5 deploy path (setup-mcp-host.sh, no --with-mcp).

Refs #149, #152.
…-broker-host.sh step

operator-runbook-wire.md now tells a from-nothing operator exactly what to run and ON WHICH MACHINE: (1) setup-cloud.sh on the laptop, (2) ssh-broker.sh + sudo setup-broker-host.sh --ref <branch> ON the broker host, (3) setup-heima.sh, (4) the harness. Explains the --ref branch choice (claude/impl-144-hdkd-bootstrap until #149 merges) and why a 'git pull' on main shows nothing (the work is on the feature branch). Keeps the re-run-only-what-changed table.

Refs #149.
…registerAgentDevice

Before: AGENT_LABEL is stable + the sandbox K10 persisted, so P.2 hit the registerAgentDevice already-registered skip and the pairing path was never exercised ('ok ... or already-active'). The contract refuses to re-register a hash (registeredAt stays set even after revoke), so a real re-pair needs a NEW key.

Now, in fresh_pairing mode (--real without --reuse-agent): P.depair revokes the prior device (recorded as a PUBLIC-hash sidecar in the sandbox — key never leaves; heima-device-revoke.sh self-checks isActive + is idempotent, agent-tier so no Touch ID), wipes the sandbox K10 so P.1 mints a fresh key, and P.2 does a REAL registerAgentDevice. P.1 re-stashes the new hash for the next run's depair.

First run after this lands wipes the pre-existing ('hardcoded') key and registers fresh; that prior device stays an orphan (revoke it once by hand if you want it gone). Runbook tx-count notes updated (now ~2 txs/run: revoke + register).

Refs #149.
…-device vs stable-omni

The depair/re-pair was only in the TL;DR. Add P.depair to the 'What happens, in order' walkthrough, the Install-(pair) flag note, and the Phase-P troubleshooting row (which wrongly said 'the agent reuses its K10' — it now mints a fresh K10 each run).

Also reconcile the contradiction the depair surfaced: the label-derived omni is STABLE (same agent), only the device key is fresh — so memory persists and step 1.5 overwrites each run. Fixed the 'fresh/new agent identity ⇒ empty memory' wording in 4 spots.

Refs #149.
…f a bare 'no link code'

P.0 hitting the §10.2 routes on a stale broker returned a confusing 'agent/create returned no link code: HTTP 404'. Detect the 404 and name the exact fix (setup-broker-host.sh --ref claude/impl-144-hdkd-bootstrap on the broker) + the 401-not-404 verify. Notes that 1.4/1.5 are cascades. Refs #149.
… + root-owned fix

Operator hit 'unable to unlink … Permission denied' on a manual git pull because a prior 'sudo bash setup-broker-host.sh' ran git/cargo as root → root-owned repo. Note: use --ref (its git runs consistently), chown -R agentkey to restore manual git, and that the broker uses git not jj. Refs #149.
…git-pull Permission denied)

Running 'sudo bash setup-broker-host.sh' executes the script's plain git (--ref) + cargo as root, leaving root-owned files in the checkout that block the operator's next 'git pull' (error: unable to unlink … Permission denied). New §8c: when SUDO_USER is set, chown -R the repo back to the invoking user at the end — idempotent, scoped to $REPO_ROOT only (system files stay root/agentkeys-owned), no-op when run as the user directly or as root with no SUDO_USER (SSM/root-managed /opt clone). Runbook note updated. Refs #149.
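
The ownership decision above reduces to a small predicate. A sketch, with hypothetical function names and the uid passed in explicitly so the logic can be exercised without root:

```shell
# repo_owner_to_restore <uid> <sudo_user>
# Prints the user the checkout should be handed back to, or nothing when
# no chown is needed (not root, or root without an invoking sudo user).
repo_owner_to_restore() {
  local uid="$1" sudo_user="${2:-}"
  if [ "$uid" -eq 0 ] && [ -n "$sudo_user" ] && [ "$sudo_user" != "root" ]; then
    printf '%s\n' "$sudo_user"
  fi
}

# restore_repo_ownership <repo_root>
# Applies the decision to the checkout only; nothing outside it is touched.
restore_repo_ownership() {
  local owner
  owner=$(repo_owner_to_restore "$(id -u)" "${SUDO_USER:-}")
  [ -z "$owner" ] || chown -R "$owner" "$1"
}
```

Running it twice is harmless: the second `chown -R` finds the files already owned by the invoking user.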
…h registration (Codex finding #3)

P.depair swallowed revoke + K10-wipe failures with '|| true' then unconditionally reported 'clean slate', and P.2 grepped only '"ok":true' — which heima-agent-create.sh also returns for the already-registered SKIP (no tx_hash). So a stale-sidecar / unreachable-sandbox / failed-revoke run could greenlight 'a real registration' while only exercising the already-active path.

Now: (1) P.depair hard-fails if revoke is not ok:true, and confirms the sandbox K10 is actually gone (test -e) before claiming a clean slate; (2) P.1 hard-fails if the new device_key_hash == the prior sidecar hash (wipe didn't take); (3) P.2 distinguishes skipped:already-registered (FAIL — not a real registration) from a real tx_hash (ok) before acking. Refs #149.
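
Point (3) can be illustrated with a classifier over the helper's JSON line. The field layout used here (`"ok":true`, a `"skipped"` key, a `"tx_hash"` value starting with 0x) is an assumption pieced together from this commit message, and the matching is plain string patterns to stay dependency-free; it is not the harness code.

```shell
# classify_registration <json_line>
# Prints "registered" (exit 0) only for ok:true with a real tx hash.
# Prints "already-registered" or "failed" (exit 1) otherwise.
classify_registration() {
  local json="$1"
  case "$json" in
    *'"ok":true'*) ;;
    *) echo "failed"; return 1 ;;
  esac
  case "$json" in
    *'"skipped"'*)      echo "already-registered"; return 1 ;;
    *'"tx_hash":"0x'*)  echo "registered"; return 0 ;;
    *)                  echo "failed"; return 1 ;;
  esac
}
```

Because the skip case returns non-zero, a caller under `set -e` stops before acking a run that never submitted a transaction.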
…device-active (Codex finding #1)

mint_oidc_jwt accepted ANY valid session and signed an AssumeRoleWithWebIdentity JWT with no binding/scope check. Since /v1/auth/link-code/redeem mints J1_agent pre-binding, a redeemed-but-unapproved agent could get STS creds to its own actor prefix before the master's registerAgentDevice.

Now: an agent_hdkd session (device_pubkey present) must pass the SAME on-chain SidecarRegistry.getDevice check the cap-mint path uses — registered_at!=0, !revoked, and device.actor_omni == session omni — or it gets 403 (audited as AuthFailed). Wallet/master sessions (no device_pubkey) are unaffected. cap.rs ChainContracts/DeviceEntry/call_get_device exposed pub(crate) for reuse. cargo test -p agentkeys-broker-server green. Refs #149.
…ly file (Codex finding #2)

The daemon's --init-link-code printed session_jwt in its stdout artifact; the harness captured it on the master and passed it to the sandbox MCP as --agent-session-bearer (a CLI arg), exposing the bearer in the master shell + the sandbox process list (ps), readable by co-resident untrusted code.

Now: the daemon writes J1_agent to an owner-only ~/.agentkeys/agent-session.jwt (0600) and emits the PUBLIC session_file path instead of the JWT. The MCP server reads the bearer from --agent-session-bearer-file (env MCP_AGENT_SESSION_BEARER_FILE); a direct --agent-session-bearer still wins when set. The harness (fresh-pairing) passes the file path, never the value; --reuse-agent keeps the value path. Bearer now travels daemon->file->MCP entirely in the sandbox. K10 private-key custody was already intact; this closes the derived-bearer leak. cargo build (broker+daemon+mcp) green; bash -n clean. Refs #149.
…CTOR_OMNI for cap-mint

Two issues surfaced on a full §10.2 fresh-pairing run:

P.2 false-FAIL (my finding-3 regression): the check ran 'jq -e' on $reg = heima-agent-create.sh 2>&1 (stderr logs + the JSON line). jq can't parse the mixed text → silently fails → P.2 reported FAIL even though registerAgentDevice SUCCEEDED (real tx 0x90f7…, block 9690030). Now extract the JSON line (grep -oE '{.*}' | tail -1) before jq.
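
A sketch of the extraction step, assuming the helper prints its JSON result as a single line among the stderr log lines (the function name is hypothetical):

```shell
# last_json_object: reads mixed log and JSON text on stdin and prints the
# last single-line {...} span, which can then be handed to jq safely.
last_json_object() {
  grep -oE '\{.*\}' | tail -n 1
}
```

It only handles single-line objects, and a log line that happens to contain braces would also match, which is why the last match is taken.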

cap-mint 400 'actor_omni must start with 0x' (1.5/3.1/4.2): the §10.2 child omni is un-prefixed by design (child_omni_hex), but cap-mint's validate_hex32 requires 0x (like OPERATOR_OMNI, which is already 0x). ACTOR_OMNI was set to the bare child omni → cap_mint rejected it. Now ACTOR_OMNI="0x${ds_actor#0x}" (exactly one 0x); ds_actor stays un-prefixed for the chain helpers + the child_omni== check.
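
The prefix normalization is a one-liner. A sketch wrapped in a hypothetical helper:

```shell
# with_0x <hex>: strips one leading "0x" if present, then adds exactly one.
with_0x() {
  printf '0x%s\n' "${1#0x}"
}
```

Both `with_0x abc` and `with_0x 0xabc` print `0xabc`, so the value handed to cap-mint is well-formed whether or not the upstream omni was already prefixed.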

Follow-up (not demo-blocking): a production agent whose session omni is un-prefixed hits the same cap-mint 400; the durable fix is cap-mint accepting un-prefixed omnis (validate_hex32 -> normalize_hex32) or the MCP normalizing the request. Tracked separately. Refs #149.
…hardened key writes, reuse bearer custody

A [high] oidc agent gate (oidc.rs) only checked active+actor; now mirrors the FULL cap-mint invariant — device.operator_omni == session parent_omni AND actor_omni == omni_account AND roles & ROLE_CAP_MINT AND active. Without the operator+role checks, any other registered operator could bind (this device hash, this actor) and the agent would pass, bypassing the master that issued the link code. Exposed cap.rs DeviceEntry.operator_omni/roles + ROLE_CAP_MINT as pub(crate).

B [high] write_key_0600 (agentkeys-core/device_crypto.rs): mode(0o600) only applies on CREATE, so a pre-existing loose-perms file kept its mode, and open() followed symlinks. Now rejects a pre-existing symlink/non-regular target and force-chmods 0600 after open. Residual TOCTOU (needs O_NOFOLLOW/libc) noted as follow-up.

C [medium] harness --reuse-agent passed --agent-session-bearer <value> (in the MCP argv/ps); now stages the master-minted bearer into the same owner-only sandbox file (umask 077) and passes only --agent-session-bearer-file, matching fresh-pairing. Neither mode leaves the JWT in ps.
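
Item C's staging step can be sketched as follows. The helper name is hypothetical, and it does not guard against a pre-existing symlink at the target path. It reads the bearer from stdin so the value never appears in any argv, and it chmods after writing because a umask only affects newly created files.

```shell
# stage_bearer_file <path>
# Writes stdin to <path> with owner-only permissions.
stage_bearer_file() {
  local path="$1"
  (
    umask 077          # a newly created file gets mode 0600
    cat > "$path"
  )
  chmod 600 "$path"    # also tighten a file that already existed
}
```

A caller would pipe the token in, for example `printf '%s' "$jwt" | stage_bearer_file ~/.agentkeys/agent-session.jwt` (printf is a shell builtin, so no process is spawned with the token in its arguments), and then pass only the path via --agent-session-bearer-file.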

cargo test -p agentkeys-core green (141+3); broker+daemon+mcp build clean; bash -n harness clean. Refs #149.
The agent-gate edit in d7a4c01 wasn't rustfmt-clean → CI 'cargo fmt --all -- --check' failed at 23s (parent_omni method chain + dkh map_err block). Ran cargo fmt --all (only oidc.rs affected). Verified locally with the EXACT CI commands: fmt --all --check clean, 'clippy --workspace --all-targets -- -D warnings' exit 0, 'test --workspace --test-threads=1' exit 0. Refs #149.
@hanwencheng hanwencheng merged commit dc94fba into main May 31, 2026
7 checks passed
hanwencheng added a commit that referenced this pull request May 31, 2026
Ports the Claude Design "agentkeyweb" handoff into apps/parent-control as the
primary, demoable operator experience. Maps 1:1 to the 9 user workflows.
Plan + verification + pushback: docs/plan/web-flow/issue-9step-flow.md.

Flow (workflow → component):
 1. WebAuthn login + onboarding ceremony  → ceremony.tsx (OnboardingScreen + CeremonyRunner)
 2. Memory panel: plant preserved memory  → memory.tsx (empty-state + plant ceremony,
    auto-detect existing, dedup guard — plant hidden + blocked once planted)
 3-4. Agent connects + master notified    → App bell + pairing request (post-#149: a pending binding)
 5. Request detail (agent + permissions)  → pairing.tsx request card
 6-7. Accept + Touch ID + ceremony        → WebAuthnModal → CeremonyRunner (PAIRING_STEPS)
 8. Device view + permission view         → pairing.tsx (device-grid) + permissions.tsx
    (mobile-style scoped PermissionList — replaces tables, the "won't scale" ask)
 9. Audit + decodable Heima TXs           → dashboard.tsx AuditFeed → EventDecodeModal
    (decodeCalldata mock; real decode tracked in #153)

New files:
 - lib/demoData.ts            seed actors/events + ONBOARDING_STEPS, PAIRING_STEPS,
                              PRESERVED_MEMORY, INCOMING_PAIRING, CHAIN_PROFILE,
                              VAULT_ITEMS, txHash, decodeCalldata (mock), ONCHAIN_KINDS
 - _components/ceremony.tsx   CeremonyRunner (progress bar + live step log + tx hashes) + OnboardingScreen
 - _components/memory.tsx     MemoryPage (plant / dedup / per-namespace listing)
 - _components/pairing.tsx    PairingPage (request → accept → device/permission view toggle)
 - _components/permissions.tsx PermSeg/PermSwitch/PermissionList/PermissionView (mobile scoped)
 - _components/dashboard.tsx  ActorsList, ActorDetail (editable PermissionList), AuditFeed

Changed:
 - _components/App.tsx        rewritten as the self-contained flow orchestrator:
                              onboarding gate (localStorage), header bell + badge,
                              memory/pairing/audit/chain routes, WebAuthn + pairing-ceremony
                              + tx-decode + memory-view modals
 - _components/types.ts       + CeremonyStep/PreservedMemory/PairingRequest/ChainProfile/
                              ContractInfo/RequestedPerm; Actor.justPaired; ChipKind +scope/device/k11;
                              Route +memory/pairing/chain
 - lib/constants.ts           CHIP_STYLES covers the new chip kinds
 - app/globals.css            ceremony/onboard/empty-memory/pair-req/view-toggle/device-grid/
                              bell/tx-decode/mem-body/perm-* blocks (no rounded corners, hairline rules)

Scope note: M1 visible flow is seed-data + local ceremony state (the prototype's model).
Real-daemon wiring stays behind the lib/client seam and is Phase 2 — #149 endpoints
(agent create / pending-bindings / bind / grant), onboarding + master-memory endpoints,
and the #153 audit decoder. The old client-based pages (pages.tsx/workers.tsx/onboarding.tsx)
remain on disk for that wiring; App no longer imports them.

Verified: npx tsc --noEmit clean · npm run build ok (4 static pages, 20.1 kB route) ·
dev smoke serves the onboarding screen (HTTP 200, all markers present).
hanwencheng added a commit that referenced this pull request Jun 1, 2026
… lowercase) (#154)

The image job tagged ghcr.io/${{ github.repository }}/agentkeys-mcp-server, but github.repository keeps the repo's real casing (litentry/agentKeys — capital K), so docker buildx rejected it: 'invalid tag ... repository name must be lowercase'. The job runs only on push to main (skips on PRs), so it first failed on the #149 merge (Actions run 26719167517).

Fix: a 'Resolve lowercase image name' step lowercases $GITHUB_REPOSITORY via tr (portable to bash 3.2 per CLAUDE.md, not the bash-4 ${,,}) → ghcr.io/litentry/agentkeys/agentkeys-mcp-server, fed to both :latest and :$sha tags. YAML validated; tr output verified locally.
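
A sketch of the lowercase step as a standalone function (the name is hypothetical; in the workflow the input would be $GITHUB_REPOSITORY):

```shell
# lowercase_image_name <owner/repo>
# Lowercases the repository slug with tr (works on bash 3.2) and builds
# the GHCR image name from it.
lowercase_image_name() {
  local repo
  repo=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'ghcr.io/%s/agentkeys-mcp-server\n' "$repo"
}
```

`lowercase_image_name litentry/agentKeys` prints `ghcr.io/litentry/agentkeys/agentkeys-mcp-server`, the name fed to both the :latest and :$sha tags.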
hanwencheng added a commit that referenced this pull request Jun 1, 2026
…r-issued link codes (#159)

* plan: method-A agent-initiated pairing design (replaces #149 front-half)

Flip §10.2 from master-mints-link-code → agent-submits-request + master-claims-by-code (the IoT scan-the-device-QR model). Reuses #149's on-chain bind+scope tail. Unbind/factory-reset deferred → #156 (client) + #155 (on-chain self-revoke).

* agentkeys: §10.2 method-A pairing — broker request/claim/poll endpoints

Flip the agent bootstrap from master-initiated (link code) to agent-
initiated (the agent shows a code, the master claims it — the Matter/
HomeKit IoT model). Replaces #149's master-mint front-half; reuses the
on-chain bind + scope tail unchanged.

Broker:
- NEW storage/pairing_requests.rs — unbound, agent-created request pool
  (issue/claim/poll/pending_bindings/mark_bound/purge). J1 is NOT stored
  at rest; minted fresh at poll time on a re-proved pop_sig.
- NEW handlers/agent/request.rs (agent, pop_sig-gated) — open an unbound
  request, return {request_id (secret), pairing_code (display)}.
- NEW handlers/agent/claim.rs (master, J1-gated) — claim by code, derive
  O_agent=HDKD(O_master,//label), record pending binding.
- NEW handlers/agent/poll.rs (agent, pop_sig-gated) — once claimed, mint
  + return J1_agent.
- REMOVE handlers/agent/{create,redeem}.rs + storage/link_codes.rs.
- Rename link_code_store -> pairing_request_store across state/boot/main.
- Rewire routes: /v1/agent/pairing/{request,claim,poll}; keep
  pending-bindings + /ack (now keyed by request_id).

Tests: 14 store unit tests + agent_bootstrap_flow rewritten for the
request->claim->poll flow (5 cases incl. cross-device/bad-pop_sig poll
rejection). clippy --all-features --all-targets -D warnings clean.

Unbind/factory-reset re-pair deferred -> #156; on-chain self-revoke -> #155.

* agentkeys: §10.2 method-A pairing — daemon + CLI + harness + broker-host smoke

Flip the client + wire harness to agent-initiated pairing (issue #144,
method A), matching the broker request/claim/poll endpoints.

Daemon (--init-link-code → two one-shots mirroring the two endpoints):
- --request-pairing: in-sandbox K10 keygen → POST /v1/agent/pairing/request
  → print {request_id, pairing_code, …}; persist a 0600 state file so
  --retrieve-pairing can resolve request_id (--request-id overrides).
- --retrieve-pairing: poll /v1/agent/pairing/poll until claimed (bounded by
  --init-poll-timeout-seconds), mint+persist J1_agent (0600), emit artifact.

CLI: agent create → agent claim --pairing-code <code> --label … --services …
(POST /v1/agent/pairing/claim). agent pending unchanged (rows now keyed by
request_id).

Harness phase1-wire-demo.sh Phase P inverts: P.0 agent --request-pairing
(shows code) → P.1 master agent claim → P.1b agent --retrieve-pairing (J1) →
P.1c pending → P.2 bind + ack-by-request_id → P.3 grant. P.depair unchanged.
404 trap + route names updated.

setup-broker-host.sh (runbook-fix-fold-back): nm symbol grep →
pairing_{request,claim,poll}|pending_bindings; route smoke → no-bearer POST
/v1/agent/pairing/claim must be 401 not 404.

cli+daemon: clippy --all-targets -D warnings clean, fmt clean, tests pass
(38/38 single-threaded; the 1 parallel-suite failure is a pre-existing k11
enroll test race on a shared HOME path, unrelated). bash -n both scripts OK.

* agentkeys: §10.2 method-A pairing — docs + terminology (arch.md, runbooks)

Reconcile every doc + code-comment surface with the agent-initiated
pairing flip (issue #144, method A).

arch.md (single source of truth):
- §10.2 ceremony fully rewritten for method A (agent requests → master
  claims → agent retrieves), incl. the IoT/Sybil-safety rationale + the
  deferred unbind notes (#155/#156).
- §6.2 route list: /v1/agent/pairing/{request,claim,poll} replace
  create + link-code/redeem; pending-bindings ack now by request_id.
- §10.4 re-bootstrap inverted; §5 agent_omni row, §10.6 threat row,
  trust-boundary + actor-role tables, CLI inventory → pairing terms.
  Solidity link_code_redemption calldata param kept (contract unchanged).

operator-runbook-wire.md: Phase P walkthrough (P.0 request → P.1 claim →
P.1b retrieve → P.1c pending → P.2 bind+ack → P.3 grant), 404 trap +
route checks, troubleshooting rows.

v2-stage1-migration-and-demo.md §7: rewritten for method A (also fixes
pre-existing drift — the master, not the broker, submits registerAgentDevice).

issue-144 plan: superseded-front-half banner → method-A doc. issue-74
ephemeral-rebootstrap paragraph corrected. Code doc-comments (mcp-server
config, core actor_omni, cli device_session) → pairing terms.

fmt + clippy --all-targets -D warnings clean on the 3 comment-edited crates.

* agentkeys: method-A plan doc — mark implemented (unbind deferred #155/#156)

* agentkeys: setup-broker-host --ref uses git checkout -f (survive shadowing untracked files)

The --ref deploy path ran a plain `git checkout $PULL_REF`, which ABORTS
when an untracked working-tree file shadows a file the target ref tracks
("untracked working tree files would be overwritten by checkout" — hit on
the broker host when switching to a branch that tracks docs/wiki/*.md the
prior branch did not). The broker host is a deploy target, not a dev
checkout: -f overwrites the colliding files with the tracked version +
discards local edits to TRACKED files, while LEAVING unrelated untracked
files (env, keys, certs — all gitignored) intact. Folds the fix back into
the deploy runbook so the next operator does not hit the same abort.