diff --git a/docs/openclaw-qa-architecture.md b/docs/openclaw-qa-architecture.md new file mode 100644 index 0000000..63bec86 --- /dev/null +++ b/docs/openclaw-qa-architecture.md @@ -0,0 +1,180 @@ +--- +title: OpenClaw QA Harness Architecture +summary: How the four `@paleo/openclaw-*` packages fit together — bus, gateway, runner, channel plugins, mocked CLIs, artifact layout, and the OpenClaw quirks the harness papers over. +read_when: + - onboarding to the QA-runner codebase + - debugging a scenario that misbehaves at the harness layer + - touching the Compose stack, the Dockerfile pair, or the mocked-CLI shim + - extending a channel plugin or adding a new one +--- + +# OpenClaw QA Harness Architecture + +Four packages drive automated regression tests against an OpenClaw workspace. Consumers depend on all four; only `openclaw-qa-runner` is the entry point. + +| Package | Role | +| --- | --- | +| `@paleo/openclaw-qa-runner` | Bus, scenario driver, judge, Compose stack, two-Dockerfile pair, CLI (`init` / `env` / `qa`). | +| `@paleo/openclaw-channel-mock-core` | Shared channel library — bus client, action handlers, plugin/setup factories, account helpers. Not consumed directly. | +| `@paleo/openclaw-discord-mock` | Thin wrapper. Registers as channel `discord-mock`, `surface: "discord"`, `autoThread: false`. | +| `@paleo/openclaw-slack-mock` | Thin wrapper. Registers as channel `slack-mock`, `surface: "slack"`, `autoThread: true`. | + +The two wrappers exist side-by-side in one gateway and share a single bus. The runner picks which channel(s) to drive per scenario; `accountId = channelId` keeps per-channel bus state segregated. + +## Service topology + +Three Compose services. All three are built from the **same image** — only the `command` differs. 
+ +``` + ┌─────────┐ +inbound ──▶ │ bus │ ◀── outbound (every channel plugin) + └────┬────┘ + │ HTTP :43123 + ┌─────────┴─────────┐ + │ │ + ┌───▼───┐ ┌───▼────┐ + │gateway│ │ runner │ + └───┬───┘ └───┬────┘ + │ exec() │ POST :43124 /mock-cli/invoke + ▼ ▲ + /opt/qa-mocks/bin ──────┘ (gateway-side shim) +``` + +- **`bus`** — in-memory state store. Conversations, threads, messages, events, cursors. Exposes a small HTTP API consumed by `bus-client.ts` in `channel-mock-core`. +- **`gateway`** — runs `npx openclaw gateway run`. Loads both channel plugins via `plugins.load.paths`. Talks to the bus through its channel plugins; talks to the runner through the mocked-CLI shim. +- **`runner`** — runs scenarios serially. Mints a fresh `conversationId` per task, pushes inbounds onto the bus, polls outbounds, asserts, runs the judge (Anthropic-direct), writes artifacts. + +Healthchecks gate `gateway` on `bus`, and the one-shot `runner` invocation on `gateway`. `runner` is started with `docker compose run --rm --use-aliases runner`; without `--use-aliases` the one-shot container has no network alias and the gateway-side shim's `POST http://runner:43124` fails with `getaddrinfo EAI_AGAIN runner`. + +## Two-Dockerfile pattern + +`openclaw-qa-runner` ships `Dockerfile.base` (consumer-agnostic): Node 24 Alpine, `claw` user with host-matched UID/GID, mock-CLI shim at `/opt/qa-mocks/`, `/etc/profile` rewritten to keep `/opt/qa-mocks/bin` first in PATH. + +The CLI's `env build` builds the base locally as `paleo/openclaw-qa-runner-base:` and injects the tag into the consumer image via the `QA_RUNNER_BASE_TAG` build arg. + +The consumer-owned `Dockerfile` (dropped by `init`) does: + +1. `FROM paleo/openclaw-qa-runner-base:${QA_RUNNER_BASE_TAG}` +2. `COPY` the consumer's `package.json` + `package-lock.json` and `openclaw.json` into the image. +3. `npm ci --include=dev` — pulls the four `@paleo/openclaw-*` packages from the registry. +4. 
`npx openclaw plugins registry --refresh` so the gateway sees the loaded channels. +5. Optional consumer customizations (extra system packages, skills install, etc.). + +`bin/qa` does **not** rebuild. Re-run `npm run env:build` after edits to `openclaw.json` or the consumer `Dockerfile`, or after bumping any `@paleo/openclaw-*` dependency. + +`Dockerfile.base` overrides `/etc/profile`. OpenClaw's `exec` tool spawns `/bin/sh -lc `, which sources `/etc/profile`. Alpine's stock profile resets PATH to a "safe" default that drops `/opt/qa-mocks/bin`, silently bypassing the shim — so only commands missing from the default PATH (e.g. `git`, not installed in Alpine) would end up shimmed. Overriding the profile keeps the shim first for every command. + +## Compose include + +The consumer ships a thin overlay that pulls in the package's base stack: + +```yaml +include: + - ./node_modules/@paleo/openclaw-qa-runner/docker-compose.yml +``` + +Compose v2.20+ required. The overlay's job is to add consumer-specific service overrides (e.g. extra env vars on `runner`); the base file owns the build context, volumes, healthchecks, and entrypoints. + +Path-shaped vars from `.env.local` (`OPENCLAW_WORKSPACE_DIR`, `OPENCLAW_CONFIG_PATH`, `QA_PROJECTS_DIR`, `QA_SCENARIOS_DIR`, `QA_ARTIFACTS_DIR`, `QA_GATEWAY_LOGS_DIR`) are resolved by the CLI against the consumer's `cwd` before invoking Compose — otherwise Compose `include:` would resolve them relative to the package's compose file under `node_modules/`, breaking natural relative paths. + +The CLI injects `QA_PROJECT_DIR`, `QA_RUNNER_PACKAGE_DIR`, `CLAW_UID`, `CLAW_GID` automatically. + +## Mocked-CLI shim + +The gateway's PATH is prepended at runtime with `/opt/qa-mocks/bin/`, where symlinks `git`, `npm`, `pnpm`, `yarn`, `claude` all point at one Node shim. The shim POSTs to `http://runner:43124/mock-cli/invoke` with `{ cli, argv, cwd, stdin }` and replays the JSON response (`{ stdout, stderr, exitCode }`). 
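The request/replay contract above can be sketched as two pure steps: build the `{ cli, argv, cwd, stdin }` payload from the symlink the shim was invoked through, then replay the `{ stdout, stderr, exitCode }` answer. This is a minimal sketch, not the shipped shim: `buildRequest`, `replay`, and `cliNameFromPath` are hypothetical names, the HTTP POST itself is left out, and whether `argv` carries the CLI name or only its arguments is an assumption (here, arguments only).

```typescript
// Hypothetical sketch of the shim's payload/replay logic; the real shim may differ.

/** Payload POSTed to /mock-cli/invoke, using the field names from the text. */
interface MockCliRequest {
  cli: string;
  argv: string[];
  cwd: string;
  stdin: string;
}

/** JSON answer from the runner, replayed by the shim. */
interface MockCliResponse {
  stdout: string;
  stderr: string;
  exitCode: number;
}

/** Derive the CLI name from the symlink path, e.g. /opt/qa-mocks/bin/git -> git. */
function cliNameFromPath(invokedAs: string): string {
  const parts = invokedAs.split("/");
  return parts[parts.length - 1] ?? invokedAs;
}

/** Build the request body. Assumption: argv holds only the arguments, not the CLI name. */
function buildRequest(
  invokedAs: string,
  args: string[],
  cwd: string,
  stdin: string,
): MockCliRequest {
  return { cli: cliNameFromPath(invokedAs), argv: args, cwd, stdin };
}

/** Replay a response: forward non-empty stdout/stderr to the sinks, return the exit code. */
function replay(
  res: MockCliResponse,
  out: (chunk: string) => void,
  err: (chunk: string) => void,
): number {
  if (res.stdout) out(res.stdout);
  if (res.stderr) err(res.stderr);
  return res.exitCode;
}
```

Keeping both steps free of I/O means the contract can be unit-tested without a running `runner` service.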
+ +The sh wrapper at `/opt/qa-mocks/bin/mock-cli-shim` invokes the shim as `node mock-cli-shim.js "$0" "$@"`. The JS reads the symlink name from `argv[2]` (`/opt/qa-mocks/bin/git` → `git`). Without `"$0"`, the shim would see only the script path and reject every call as `unexpected call to mock-cli-shim.js`. + +PATH prepend happens only at gateway runtime — the image build's own `npm install` still uses real `npm`. + +Scenarios register handlers via `ctx.mockCli(name, handler)`. Return value: number → exit code; `void`/`undefined` → 0; throw → exit 1 with `handlerError` recorded. Re-registering the same name in one scenario throws. Any invocation with no matching handler **fails the scenario** with `failure.source = "cliMock"` and `message = "unexpected call to "`, even if no assertion ever ran after. + +The runner binds a single in-flight `ConversationRegistry` per scenario; scenarios run serially through one gateway. Each invocation emits a `cliMock` `ReportEvent` carrying the full `CliMockCall` (argv, cwd, stdin, stdout, stderr, exitCode, durationMs, optional handlerError). + +## Per-scenario isolation + +The bus accumulates state across runs. The only isolation between tasks is the `conversationId` — minted fresh per task as `${scenarioId}-${channel}-${shortRand}` and exposed as `ctx.conversationId`. Scenarios must use `ctx.conversationId` everywhere they currently hard-code a value; metadata that needs to identify the project (e.g. a workspace playbook keying off project name) belongs in the inbound *text*, not in the conversation id. + +Scenarios run serially — the base stack ships one `gateway` container; the mocked-CLI shim and runner-side registry are single in-flight. + +## Channel plugin internals + +Each wrapper exposes two entries in its `package.json`: + +- `openclaw.extensions` → `dist/index.js` — the runtime channel plugin (`defineBundledChannelEntry`). 
+- `openclaw.setupEntry` → `dist/setup-entry.js` — a setup-only plugin (`defineBundledChannelSetupEntry`). + +Both are required. Without `openclaw.setupEntry`, the loader registers the plugin but `resolvePluginRegistrationPlan` skips the `setup-runtime` mode and the channel pipeline never calls `gateway.startAccount`. The setup plugin is a subset of the runtime plugin (`id`, `meta`, `capabilities`, `reload`, `configSchema`, `setup`, `config` — no `messaging` / `gateway` / `actions` / `message`); the loader's `mergeSetupRuntimeChannelPlugin` fills the rest at runtime. + +Discovery is wired through `plugins.load.paths` in `openclaw.json`, pointing at the package directory inside the image (`/opt/qa-src/node_modules/@paleo/openclaw-{discord,slack}-mock`). Both plugins must be statically enabled via `plugins.entries[""].enabled = true` — auto-enable for non-bundled (`origin: "config"`) plugins is timing-sensitive against `canStartConfiguredChannelPlugin`: the auto-enable mutation can fire after plan resolution checks `explicitlyEnabled`. Static `enabled: true` makes the check deterministic. + +Both channels register together on every gateway boot. The runner selects which to drive per scenario. + +`createChannelMockPlugin` in `channel-mock-core` takes `{ channelId, label, surface, autoThread, getRuntime }`. The two wrappers are ten-line modules that bind these knobs: + +- `discord-mock` — `surface: "discord"`, `autoThread: false`. Full Discord-shaped surface (`send`, `thread-create`, `thread-reply`, `react`, `read`, `edit`, `delete`, `search`). `thread-create` posts an optional `text`/`message`/`content` atomically with the new thread. Free-form agent text without a tool call lands in the parent channel. +- `slack-mock` — `surface: "slack"`, `autoThread: true`. Restricted surface (`react` / `read` / `edit` / `delete` / `reactions` / `search`). Bare-channel inbounds auto-thread on the triggering message; every subsequent outbound from the same turn lands in that thread. 
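The difference between the two wrappers comes down to the `surface` and `autoThread` knobs. The sketch below models only where free-form agent text lands for a bare-channel inbound. It is a hypothetical model, not the `channel-mock-core` types: `MockChannelKnobs`, `Inbound`, and `replyThreadFor` are invented for illustration, and it assumes the auto-created thread is keyed by the triggering message id and that an inbound already inside a thread is answered in that thread.

```typescript
// Hypothetical model of the wrapper knobs; not the real createChannelMockPlugin signature.
type Surface = "discord" | "slack";

interface MockChannelKnobs {
  channelId: string;
  surface: Surface;
  autoThread: boolean;
}

// Values taken from the two wrapper descriptions above.
const discordMock: MockChannelKnobs = {
  channelId: "discord-mock",
  surface: "discord",
  autoThread: false,
};
const slackMock: MockChannelKnobs = {
  channelId: "slack-mock",
  surface: "slack",
  autoThread: true,
};

interface Inbound {
  messageId: string;
  threadId?: string; // present when the inbound already arrived inside a thread
}

/**
 * Thread that free-form agent text lands in, or undefined for the parent channel.
 * Assumption: an auto-created thread reuses the triggering message id as its id.
 */
function replyThreadFor(knobs: MockChannelKnobs, inbound: Inbound): string | undefined {
  // Assumption: replies to an in-thread inbound stay in that thread.
  if (inbound.threadId !== undefined) return inbound.threadId;
  return knobs.autoThread ? inbound.messageId : undefined;
}
```

With these definitions, a bare-channel inbound resolves to the parent channel on `discord-mock` and to a thread on the triggering message on `slack-mock`, which is the behaviour scenarios assert on via `threadId`.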
+ +Inbound metadata claims `Provider` / `Surface` / `OriginatingChannel` = the registered channel id, so the SDK routes tool-schema discovery back to the right plugin. `chat_id` envelope shape is **not** rewritten — scenarios assert on `conversation.id` / `threadId`, not envelope formatting. + +`openclaw.plugin.json` without `channelConfigs` warns at startup (`channel plugin manifest declares without channelConfigs metadata`); the gateway fills missing `label` / `selectionLabel` / `docsPath` / `blurb` from the runtime plugin. Cosmetic. + +## Target normalizer + plugin-action vs send + +OpenClaw's `normalizeMessageActionInput` runs before any `"to"`-mode plugin handler (`send`, `thread-create`, `thread-reply`, `react`, `read`, `edit`, `delete`). It rewrites `channelId` → `target` → `to` and deletes the original `channelId` key. A handler that reads `channelId` directly is broken-by-construction. `channel-mock-core`'s `resolveDestination` always reads `to` first. + +Canonical destination param is `to`. Accepted shapes: + +- `channel:` or bare `` (channel) +- `dm:` +- `group:` +- `thread:/` + +Resolved in the order `to → target → channelId` to match the normalizer's output. + +Plugin actions and `send` route through different handlers in `message-action-runner.ts`. Only `send` triggers the delivery mirror, which historically tripped a lock-fence race (`EmbeddedAttemptSessionTakeoverError`). Plugin actions don't set `ctx.mirror` and never trip the race. Workspace-driven outbound that needs a thread should use `thread-create` + `thread-reply` rather than `send`. + +`BindingMatchSchema` is strict-equality on `peer.id`. No catch-all binding without multi-account channel config. The judge agent (in OpenClaw config) is left config-only and never instantiated; the actual judge runs out-of-process from the runner against Anthropic directly. + +## Artifacts & cost + +Layout: `artifacts//-[-][-]/`. + +- `` — iteration index, padded to the width of `--iterations`. 
Omitted when `--iterations 1`. +- `` — `PASS` / `FAIL`. Applied by **renaming the directory** after `report.json` lands. A directory with no verdict suffix means the run is pending or crashed before the rename. + +Two files per task: + +- `events.jsonl` — appended live as the scenario runs, one `ReportEvent` per line. Survives a runner crash. Original write-order `seq`s preserved. +- `report.json` — final `ScenarioReport`, written once at end. Re-merges the live events with `agentToolCall` entries parsed from the gateway's `anthropic-payload.jsonl` (filtered by `conversationId`), re-assigns `seq` by `ts`, and adds per-scenario `cost = { gatewayUsd, judgeUsd, totalUsd, gatewayTurns }`. + +Event kinds: `log` · `inboundSent` · `outboundReceived` · `assertion` · `judge` · `cliMock` · `agentToolCall` · `failure`. `outboundReceived` captures every bus outbound for the conversation, not only the ones the scenario explicitly awaits. `agentToolCall` lives only in `report.json`. + +Authoritative types: `packages/openclaw-qa-runner/src/report.ts`. + +Cost: the runner sums the gateway's `stage:"usage"` entries from `anthropic-payload.jsonl` for any entry with `ts >= runStart`, plus the judge's inline `usage` priced via an in-runner table. A 5s grace wait after the last task lets OpenClaw flush its usage record (it lands ~2s after the outbound hits the bus). Failing runs that time out before the agent completes report `$0.0000` — the gateway never wrote a usage record for the unfinished turn. + +`OPENCLAW_ANTHROPIC_PAYLOAD_LOG=1` is forced on by the Compose stack — QA needs the file. `OPENCLAW_RAW_STREAM=1` is opt-in for `raw-stream.jsonl`. Both land under `.gateway-logs/` (bind-mounted from `~/.openclaw/logs/`). + +## Judge + +`judgeLLM` calls Anthropic directly from the runner — no bus traffic, no gateway involvement. Not an OpenClaw agent. Model defaults to `anthropic/claude-haiku-4-5`; override via `QA_JUDGE_MODEL` on the runner service. 
LiteLLM-style ref required; only the `anthropic/` provider is wired up today. + +Prefer structural assertions over `judgeLLM`; reserve the judge for free-form content claims. + +## OpenClaw config quirks the harness depends on + +- **`agents.list[*].workspace`, not `workspaceDir`.** The schema accepts both spellings on related surfaces, but the agents list only reads `workspace`. +- **`gateway.mode: "local"` required.** Without it, startup fails with `existing config is missing gateway.mode`. + +## Scenario loading + +Scenarios are `.ts` files under `scenarios/`, default-export `async (ctx: ScenarioContext) => void`. Loaded at runtime by Node 24's built-in TypeScript stripping (the image uses Node 24). Stick to the strip-compatible subset: type annotations, `as`, `satisfies`, generics, interfaces. Avoid `enum`, `namespace`, constructor parameter properties, decorators, `import =`. + +`discoverScenarios()` filters on `.ts` suffix on file entries only — directories under `scenarios/` (e.g. `_lib/`) are ignored, which is the idiomatic place for shared scenario helpers. + +## See also + +- Each package's `README.md` — actionable usage. +- `packages/openclaw-qa-runner/src/context.ts` — `ScenarioContext` definition. +- `packages/openclaw-qa-runner/src/report.ts` — authoritative event/report types. 
diff --git a/package-lock.json b/package-lock.json index 29514ef..98fcf78 100644 --- a/package-lock.json +++ b/package-lock.json @@ -8480,7 +8480,7 @@ }, "packages/docmap": { "name": "@paleo/docmap", - "version": "0.4.2", + "version": "0.4.3", "license": "CC0-1.0", "bin": { "docmap": "bin/docmap.mjs" @@ -8497,7 +8497,7 @@ }, "packages/openclaw-channel-mock-core": { "name": "@paleo/openclaw-channel-mock-core", - "version": "0.2.2", + "version": "0.2.3", "license": "MIT", "dependencies": { "typebox": "1.1.38" @@ -8518,10 +8518,10 @@ }, "packages/openclaw-discord-mock": { "name": "@paleo/openclaw-discord-mock", - "version": "0.2.2", + "version": "0.2.3", "license": "MIT", "dependencies": { - "@paleo/openclaw-channel-mock-core": "0.2.2" + "@paleo/openclaw-channel-mock-core": "0.2.3" }, "devDependencies": { "@types/node": "~24.12.4", @@ -8539,13 +8539,13 @@ }, "packages/openclaw-qa-runner": { "name": "@paleo/openclaw-qa-runner", - "version": "0.4.0", + "version": "0.4.1", "license": "MIT", "dependencies": { "@anthropic-ai/sdk": "~0.97.1", - "@paleo/openclaw-channel-mock-core": "0.2.2", - "@paleo/openclaw-discord-mock": "0.2.2", - "@paleo/openclaw-slack-mock": "0.2.2" + "@paleo/openclaw-channel-mock-core": "0.2.3", + "@paleo/openclaw-discord-mock": "0.2.3", + "@paleo/openclaw-slack-mock": "0.2.3" }, "bin": { "openclaw-qa-runner": "bin/cli.mjs" @@ -8565,10 +8565,10 @@ }, "packages/openclaw-slack-mock": { "name": "@paleo/openclaw-slack-mock", - "version": "0.2.2", + "version": "0.2.3", "license": "MIT", "dependencies": { - "@paleo/openclaw-channel-mock-core": "0.2.2" + "@paleo/openclaw-channel-mock-core": "0.2.3" }, "devDependencies": { "@types/node": "~24.12.4", diff --git a/packages/docmap/CHANGELOG.md b/packages/docmap/CHANGELOG.md index ac1fa6d..d235917 100644 --- a/packages/docmap/CHANGELOG.md +++ b/packages/docmap/CHANGELOG.md @@ -1,5 +1,11 @@ # @paleo/docmap +## 0.4.3 + +### Patch Changes + +- Improved documentation + ## 0.4.2 ### Patch Changes diff --git 
a/packages/docmap/README.md b/packages/docmap/README.md index ca30a3f..0256e88 100644 --- a/packages/docmap/README.md +++ b/packages/docmap/README.md @@ -84,4 +84,4 @@ npx @paleo/docmap --root path/to/docs | `--check` | Validate all files and directories. Reports name and frontmatter issues. | | `--root ` | Use a custom directory as the docs root instead of `docs/`. | -For internals, see [docs/docmap-architecture.md](../../docs/docmap-architecture.md). +For internals, see [docs/docmap-architecture.md](https://github.com/paleo/alignfirst/blob/main/docs/docmap-architecture.md). diff --git a/packages/docmap/package.json b/packages/docmap/package.json index ce4fe6a..cd550fc 100644 --- a/packages/docmap/package.json +++ b/packages/docmap/package.json @@ -1,6 +1,6 @@ { "name": "@paleo/docmap", - "version": "0.4.2", + "version": "0.4.3", "license": "CC0-1.0", "description": "A lightweight documentation system for AI agents and humans.", "keywords": [ diff --git a/packages/openclaw-channel-mock-core/CHANGELOG.md b/packages/openclaw-channel-mock-core/CHANGELOG.md index fe97930..596eedb 100644 --- a/packages/openclaw-channel-mock-core/CHANGELOG.md +++ b/packages/openclaw-channel-mock-core/CHANGELOG.md @@ -1,5 +1,11 @@ # @paleo/openclaw-channel-mock-core +## 0.2.3 + +### Patch Changes + +- Improved documentation + ## 0.2.2 ### Patch Changes diff --git a/packages/openclaw-channel-mock-core/README.md b/packages/openclaw-channel-mock-core/README.md index 414c660..bc8a51a 100644 --- a/packages/openclaw-channel-mock-core/README.md +++ b/packages/openclaw-channel-mock-core/README.md @@ -4,10 +4,10 @@ Shared library powering the synthetic OpenClaw channel plugins used in QA harnes Not meant to be consumed directly. Use the surface wrappers: -- [`@paleo/openclaw-discord-mock`](../openclaw-discord-mock/) — `surface: "discord"`, full action surface, `autoThread: false`. 
-- [`@paleo/openclaw-slack-mock`](../openclaw-slack-mock/) — `surface: "slack"`, restricted action surface, `autoThread: true`. +- [`@paleo/openclaw-discord-mock`](https://www.npmjs.com/package/@paleo/openclaw-discord-mock) — `surface: "discord"`, full action surface, `autoThread: false`. +- [`@paleo/openclaw-slack-mock`](https://www.npmjs.com/package/@paleo/openclaw-slack-mock) — `surface: "slack"`, restricted action surface, `autoThread: true`. -Both wrappers register as OpenClaw channels and talk to a single bus (`http://bus:43123` by default) provisioned by [`@paleo/openclaw-qa-runner`](../openclaw-qa-runner/). +Both wrappers register as OpenClaw channels and talk to a single bus (`http://bus:43123` by default) provisioned by [`@paleo/openclaw-qa-runner`](https://www.npmjs.com/package/@paleo/openclaw-qa-runner). ## Attribution diff --git a/packages/openclaw-channel-mock-core/package.json b/packages/openclaw-channel-mock-core/package.json index 06dfe02..31e2dfe 100644 --- a/packages/openclaw-channel-mock-core/package.json +++ b/packages/openclaw-channel-mock-core/package.json @@ -1,6 +1,6 @@ { "name": "@paleo/openclaw-channel-mock-core", - "version": "0.2.2", + "version": "0.2.3", "description": "Shared library for synthetic OpenClaw channel plugins used in QA harnesses (bus client, action handlers, factories).", "keywords": [ "openclaw", diff --git a/packages/openclaw-discord-mock/CHANGELOG.md b/packages/openclaw-discord-mock/CHANGELOG.md index b4253f4..0828970 100644 --- a/packages/openclaw-discord-mock/CHANGELOG.md +++ b/packages/openclaw-discord-mock/CHANGELOG.md @@ -1,5 +1,13 @@ # @paleo/openclaw-discord-mock +## 0.2.3 + +### Patch Changes + +- Improved documentation +- Updated dependencies + - @paleo/openclaw-channel-mock-core@0.2.3 + ## 0.2.2 ### Patch Changes diff --git a/packages/openclaw-discord-mock/README.md b/packages/openclaw-discord-mock/README.md index ff1dec2..aa7f017 100644 --- a/packages/openclaw-discord-mock/README.md +++ 
b/packages/openclaw-discord-mock/README.md @@ -1,10 +1,45 @@ # @paleo/openclaw-discord-mock -Synthetic Discord-shaped OpenClaw channel plugin. Registers as channel `discord-mock` with the full Discord action surface (`send`, `thread-create`, `thread-reply`, `react`, `read`, `edit`, `delete`, `search`). `thread-create` posts an optional body atomically with the new thread; free-form agent text without a tool call lands in the parent channel. +Synthetic Discord-shaped OpenClaw channel plugin. Registers as channel `discord-mock`. Full Discord-shaped action surface: `send`, `thread-create`, `thread-reply`, `react`, `read`, `edit`, `delete`, `search`. `thread-create` posts an optional `text` / `message` / `content` atomically with the new thread; free-form agent text without a tool call lands in the parent channel. -Backed by [`@paleo/openclaw-channel-mock-core`](../openclaw-channel-mock-core/) with `surface: "discord"` and `autoThread: false`. Pair with [`@paleo/openclaw-qa-runner`](../openclaw-qa-runner/) for the QA harness. +Backed by [`@paleo/openclaw-channel-mock-core`](https://www.npmjs.com/package/@paleo/openclaw-channel-mock-core) (`surface: "discord"`, `autoThread: false`). Pair with [`@paleo/openclaw-qa-runner`](https://www.npmjs.com/package/@paleo/openclaw-qa-runner) for the QA harness. -`Provider` / `Surface` / `OriginatingChannel` on inbound metadata are claimed as `discord-mock` so the SDK routes tool-schema discovery to this plugin. +## Install + +```sh +npm i -D @paleo/openclaw-discord-mock +``` + +The runner lists this package as a direct dependency, so installing `@paleo/openclaw-qa-runner` already pulls it in. 
+ +## Enable + +In your `openclaw.json`: + +```json +{ + "plugins": { + "load": { "paths": ["/opt/qa-src/node_modules/@paleo/openclaw-discord-mock"] }, + "entries": { "discord-mock": { "enabled": true } } + }, + "channels": { + "discord-mock": { + "baseUrl": "http://bus:43123", + "botUserId": "openclaw", + "botDisplayName": "OpenClaw QA", + "allowFrom": ["*"] + } + } +} +``` + +`enabled: true` must be **static**. Auto-enable for `origin: "config"` plugins is timing-sensitive against the plan-resolution `explicitlyEnabled` check. + +## Target format + +Canonical destination is the `to` param. Accepts `channel:` / bare `` / `dm:` / `group:` / `thread:/`. + +`Provider` / `Surface` / `OriginatingChannel` on inbound metadata are claimed as `discord-mock` so the SDK routes tool-schema discovery to this plugin. The `chat_id` envelope shape is **not** rewritten — assert on `conversation.id` / `threadId`. ## Attribution diff --git a/packages/openclaw-discord-mock/package.json b/packages/openclaw-discord-mock/package.json index 701304d..7291bd7 100644 --- a/packages/openclaw-discord-mock/package.json +++ b/packages/openclaw-discord-mock/package.json @@ -1,6 +1,6 @@ { "name": "@paleo/openclaw-discord-mock", - "version": "0.2.2", + "version": "0.2.3", "description": "Synthetic Discord-shaped OpenClaw channel plugin for automated QA scenarios.", "keywords": [ "openclaw", @@ -63,7 +63,7 @@ "openclaw": "*" }, "dependencies": { - "@paleo/openclaw-channel-mock-core": "0.2.2" + "@paleo/openclaw-channel-mock-core": "0.2.3" }, "devDependencies": { "@types/node": "~24.12.4", diff --git a/packages/openclaw-qa-runner/CHANGELOG.md b/packages/openclaw-qa-runner/CHANGELOG.md index bcfb678..e9e4d54 100644 --- a/packages/openclaw-qa-runner/CHANGELOG.md +++ b/packages/openclaw-qa-runner/CHANGELOG.md @@ -1,5 +1,15 @@ # @paleo/openclaw-qa-runner +## 0.4.1 + +### Patch Changes + +- Improved documentation +- Updated dependencies + - @paleo/openclaw-channel-mock-core@0.2.3 + - 
@paleo/openclaw-discord-mock@0.2.3 + - @paleo/openclaw-slack-mock@0.2.3 + ## 0.4.0 ### Minor Changes diff --git a/packages/openclaw-qa-runner/README.md b/packages/openclaw-qa-runner/README.md index 8a5302b..11d4e5d 100644 --- a/packages/openclaw-qa-runner/README.md +++ b/packages/openclaw-qa-runner/README.md @@ -1,8 +1,10 @@ # @paleo/openclaw-qa-runner -Dockerised regression-test harness for OpenClaw workspaces. Drives the agent through two synthetic channels (`discord-mock`, `slack-mock`) and asserts the results. One gateway, one bus, parallel scenarios. +Dockerised regression-test harness for OpenClaw workspaces. Drives the agent through two synthetic channels (`discord-mock`, `slack-mock`) and asserts the results. -Toolkit entry point. Pair with [`@paleo/openclaw-channel-mock-core`](../openclaw-channel-mock-core/), [`@paleo/openclaw-discord-mock`](../openclaw-discord-mock/), [`@paleo/openclaw-slack-mock`](../openclaw-slack-mock/). +Pair with [`@paleo/openclaw-channel-mock-core`](https://www.npmjs.com/package/@paleo/openclaw-channel-mock-core), [`@paleo/openclaw-discord-mock`](https://www.npmjs.com/package/@paleo/openclaw-discord-mock), [`@paleo/openclaw-slack-mock`](https://www.npmjs.com/package/@paleo/openclaw-slack-mock). + +For internals (topology, Dockerfile pair, mocked-CLI shim, channel plugin mechanics, OpenClaw quirks), see [openclaw-qa-architecture.md](https://github.com/paleo/alignfirst/blob/main/docs/openclaw-qa-architecture.md). ## Install @@ -10,22 +12,9 @@ Toolkit entry point. Pair with [`@paleo/openclaw-channel-mock-core`](../openclaw npm i -D @paleo/openclaw-qa-runner @paleo/openclaw-channel-mock-core @paleo/openclaw-discord-mock @paleo/openclaw-slack-mock openclaw ``` -Requires Docker Compose v2.20+ (consumer overlay uses Compose `include:`). - -## Init - -```sh -npx @paleo/openclaw-qa-runner init . -``` - -Drops four files into the target directory: - -- `openclaw.json` — gateway config (mode `local`, both channel plugins enabled, main agent). 
-- `.env.local.example` — copy to `.env.local`, set `ANTHROPIC_API_KEY` and `OPENCLAW_WORKSPACE_DIR` (host path to your OpenClaw workspace). -- `docker-compose.yml` — thin overlay that `include:`s this package's base stack from `node_modules/`. -- `Dockerfile` — consumer-owned image. Inherits the package's base via `FROM paleo/openclaw-qa-runner-base:${QA_RUNNER_BASE_TAG}` (the tag is injected by the CLI from the installed package version). Add `RUN`/`COPY`/`ENV` directives for any consumer-specific setup (skill installs, extra system packages, etc.). +Requires Docker Compose v2.20+ (overlay uses Compose `include:`). -Then wire `package.json` scripts: +Wire `package.json` scripts: ```json "scripts": { @@ -36,85 +25,116 @@ Then wire `package.json` scripts: } ``` -Each command derives `QA_PROJECT_DIR` from `cwd`, `QA_RUNNER_PACKAGE_DIR` from its own install location, and `CLAW_UID`/`CLAW_GID` from the host user — no boilerplate in `package.json`. +## Init + +```sh +npx @paleo/openclaw-qa-runner init +``` + +Drops four files: + +- `openclaw.json` — gateway config (mode `local`, both channel plugins enabled, main agent placeholder). +- `.env.local.example` — copy to `.env.local`, fill `ANTHROPIC_API_KEY` + `OPENCLAW_WORKSPACE_DIR`. +- `docker-compose.yml` — thin overlay that `include:`s the base from `node_modules/`. +- `Dockerfile` — consumer-owned. Inherits the base via `FROM paleo/openclaw-qa-runner-base:${QA_RUNNER_BASE_TAG}`. Add `RUN`/`COPY`/`ENV` for consumer-specific setup (extra system packages, skills install, etc.). ## Configure Edit `openclaw.json`: -- `agents.list[id=main].model` — LiteLLM-style `provider/model` ref (e.g. `anthropic/claude-sonnet-4-6`). The template ships a placeholder; OpenClaw will fail loudly until you pick one. -- `agents.list[id=main].workspace` — host path to your OpenClaw workspace (bind-mounted into the gateway). +- `agents.list[id=main].model` — LiteLLM-style `provider/model` ref. 
The template ships a placeholder; OpenClaw fails loudly until you pick one. +- `agents.list[id=main].workspace` — host path to your OpenClaw workspace, bind-mounted into the gateway. Field name is **`workspace`**, not `workspaceDir`. - `channels.*` — both `discord-mock` and `slack-mock` blocks point at the same bus. -The LLM judge runs out-of-process — Anthropic-direct from the runner, never through the gateway — so it is **not** an OpenClaw agent and is not configured via `openclaw.json`. Defaults to `anthropic/claude-haiku-4-5`; override via the `QA_JUDGE_MODEL` env var on the runner. The ref must be LiteLLM-style; only the `anthropic/` provider is wired up today. +Drop scenarios under `scenarios/.ts`. Project fixtures under `projects-fixture/` (bind-mounted to `~/projects/` in the gateway). + +Scenarios are loaded by Node 24's built-in TypeScript stripping. Stick to the strip-compatible subset (no `enum`, `namespace`, decorators, ctor parameter properties, `import =`). Shared helpers go under `scenarios/_lib/` — `discoverScenarios()` skips directories. + +## Env vars (`.env.local`) -Drop scenarios under `scenarios/.ts`, default-export `async (ctx: ScenarioContext) => void`. Project fixtures live under `projects-fixture/` (bind-mounted to `~/projects/`). +Required: -Scenarios are loaded at runtime by Node's built-in TypeScript stripping (Node 24, which the image uses). Stick to the strip-compatible subset: type annotations, `as`, `satisfies`, generics, interfaces. Avoid `enum`, `namespace`, constructor parameter properties, decorators, and `import =`. +- `ANTHROPIC_API_KEY` +- `OPENCLAW_WORKSPACE_DIR` — host path mounted at `/home/claw/.openclaw/workspace`. -## Build / up / run +Optional (defaults relative to the consumer's `qa/` dir): -`env:build` first builds the consumer-agnostic base image (`paleo/openclaw-qa-runner-base:`, locally tagged) from this package's `Dockerfile.base`, then runs `docker compose build` against the consumer's `Dockerfile`. 
Docker layer cache makes repeat base builds near-free; `env:up` / `qa` skip the base build when the tag already exists. +- `OPENCLAW_CONFIG_PATH` → `./openclaw.json` +- `QA_PROJECTS_DIR` → `./projects-fixture` +- `QA_SCENARIOS_DIR` → `./scenarios` +- `QA_ARTIFACTS_DIR` → `./artifacts` +- `QA_GATEWAY_LOGS_DIR` → `./.gateway-logs` +- `OPENCLAW_RAW_STREAM=1` — also write `raw-stream.jsonl` alongside the always-on `anthropic-payload.jsonl`. + +`QA_PROJECT_DIR`, `QA_RUNNER_PACKAGE_DIR`, `CLAW_UID`, `CLAW_GID` are injected by the CLI. + +## Run ```sh -npm run env:build # build the gateway / bus / runner image +npm run env:build # build base + consumer image npm run env:up # bring up bus + gateway (both channels register) npm run qa -- --channel all # one scenario, both channels npm run qa -- --channel all --all # every scenario, both channels npm run qa -- --channel discord-mock # restrict to one channel -npm run qa -- --channel all --iterations 5 # repeat each (scenario, channel) pair 5 times +npm run qa -- --channel all --iterations 5 # repeat each (scenario, channel) pair 5× npm run qa -- --channel all --iterations 5 --max-failures 1 # abort a pair after >1 failure npm run env:down ``` -Artifacts land under `artifacts//-[-][-]/`: `` is the iteration index (omitted when `--iterations 1`), `` is `PASS` / `FAIL`, applied by renaming the dir after `report.json` is written — its absence means the run is still pending or crashed before rename. Exit 0 iff every pair passes. +`env:build` first builds the base image (`paleo/openclaw-qa-runner-base:`) from this package's `Dockerfile.base`, then builds the consumer image. Layer cache makes repeat base builds near-free; `env:up` / `qa` skip the base build when the tag already exists. + +Rebuild required after: bumping any `@paleo/openclaw-*` dependency, edits to `openclaw.json`, or any change to the consumer `Dockerfile`. + +Scenarios run **serially** through one gateway. Exit 0 iff every pair passes. 
 ## Scenario primitives

 From `@paleo/openclaw-qa-runner` (`src/context.ts`):

 - `channel`, `conversationId`, `accountId` — per-task isolation. Use `ctx.conversationId` everywhere; never hard-code a value.
-- `sendInbound(text, opts?)` — push an inbound on the bus.
-- `poll(opts?)`, `waitForOutbound(opts?)`, `expectNoOutbound(opts?)` — bus consumers.
+- `sendInbound(input)` — push an inbound on the bus.
+- `poll`, `waitForOutbound`, `expectNoOutbound` — bus consumers.
 - `assertRegex`, `assertEqual`, `assertLength` — structural assertions.
-- `judgeLLM(prompt)` — Anthropic-direct judgement (no bus traffic).
+- `judgeLLM({ message, rubric, label })` — Anthropic-direct judgement (no bus traffic, no gateway).
+- `mockCli(name, handler)` — intercepts the gateway's calls to `git` / `npm` / `pnpm` / `yarn` / `claude`. Unregistered calls fail the scenario with `failure.source = "cliMock"`.
 - `log`, `getCursor`.

 Prefer structural assertions over `judgeLLM`; reserve the judge for free-form content claims.

-## Channels
+## Judge model

-Both plugins register together. Pick which to drive per scenario via `--channel discord-mock|slack-mock|all`.
+Defaults to `anthropic/claude-haiku-4-5`. Override via `QA_JUDGE_MODEL` on the `runner` service (set in your consumer overlay). The ref must be LiteLLM-style; only the `anthropic/` provider is wired up today. The judge is **not** an OpenClaw agent — don't configure it in `openclaw.json`.

-- `discord-mock` — full Discord-shaped surface; `thread-create` posts an optional body atomically.
-- `slack-mock` — restricted Slack-shaped surface (`react` / `read` / `edit` / `delete` / `reactions` / `search`). Bare-channel inbounds auto-thread on the triggering message.
+## Artifacts

-## Compose stack
+`artifacts//-[-][-]/`:

-The CLI sets `QA_PROJECT_DIR`, `QA_RUNNER_PACKAGE_DIR`, `CLAW_UID`, `CLAW_GID` automatically. Everything else comes from `.env.local`:
+- `events.jsonl` — appended live, survives a runner crash.
+- `report.json` — final `ScenarioReport`. Merges `events.jsonl` with `agentToolCall` entries from the gateway payload log; adds per-scenario `cost`.

-- `ANTHROPIC_API_KEY` — required.
-- `OPENCLAW_WORKSPACE_DIR` — required (host path mounted at `/home/claw/.openclaw/workspace`).
-- `OPENCLAW_CONFIG_PATH` — default `/openclaw.json` → `/home/claw/.openclaw/openclaw.json`.
-- `QA_PROJECTS_DIR` — default `/projects-fixture` → `/home/claw/projects/`.
-- `QA_SCENARIOS_DIR` — default `/scenarios` → `/opt/qa-src/scenarios`.
-- `QA_ARTIFACTS_DIR` — default `/artifacts` → `/opt/qa-artifacts`.
-- `QA_GATEWAY_LOGS_DIR` — default `/.gateway-logs` → `/home/claw/.openclaw/logs`.
+`` is the iteration index (omitted when `--iterations 1`). `` is `PASS` / `FAIL`, applied by **renaming the directory** after `report.json` is written. A directory with no verdict suffix means the run is pending or crashed before rename.

-`` is the consumer's qa dir (the wrapper's `cwd`). Rebuild this package's `dist/` with `npm run build` to refresh the mount.
+Authoritative types: `src/report.ts`.

-Healthchecks: `gateway` waits on `bus`, `runner` waits on `gateway`.
+## Channels

-## Gateway logs (opt-in)
+Both `discord-mock` and `slack-mock` register on every boot. Pick which to drive per scenario via `--channel discord-mock|slack-mock|all`.

-Set in `.env.local`:
+- `discord-mock` — full Discord-shaped surface; `thread-create` posts an optional body atomically.
+- `slack-mock` — restricted Slack-shaped surface (`react` / `read` / `edit` / `delete` / `reactions` / `search`). Bare-channel inbounds auto-thread on the triggering message.

-```sh
-OPENCLAW_ANTHROPIC_PAYLOAD_LOG=1
-OPENCLAW_RAW_STREAM=1
-```
+Inbound metadata claims `Provider` / `Surface` / `OriginatingChannel` = the registered channel id so the SDK routes tool-schema discovery to the right plugin. Assert on `conversation.id` / `threadId`, not envelope formatting.
+
+## Target format
+
+Canonical destination param is `to`.
Accepted shapes:
+
+- `channel:` or bare `` (channel)
+- `dm:`
+- `group:`
+- `thread:/`

-Writes `anthropic-payload.jsonl` and `raw-stream.jsonl` under `.gateway-logs/`. The runner's cost reporting reads `anthropic-payload.jsonl`.
+Actions resolve `to → target → channelId`.

 ## Attribution

diff --git a/packages/openclaw-qa-runner/package.json b/packages/openclaw-qa-runner/package.json
index 5b152a4..1f4744d 100644
--- a/packages/openclaw-qa-runner/package.json
+++ b/packages/openclaw-qa-runner/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@paleo/openclaw-qa-runner",
-  "version": "0.4.0",
+  "version": "0.4.1",
   "description": "Dockerised regression-test harness for OpenClaw workspaces: bus, scenario driver, judge, Compose stack.",
   "keywords": [
     "openclaw",
@@ -47,9 +47,9 @@
   },
   "dependencies": {
     "@anthropic-ai/sdk": "~0.97.1",
-    "@paleo/openclaw-channel-mock-core": "0.2.2",
-    "@paleo/openclaw-discord-mock": "0.2.2",
-    "@paleo/openclaw-slack-mock": "0.2.2"
+    "@paleo/openclaw-channel-mock-core": "0.2.3",
+    "@paleo/openclaw-discord-mock": "0.2.3",
+    "@paleo/openclaw-slack-mock": "0.2.3"
   },
   "devDependencies": {
     "@types/node": "~24.12.4",
diff --git a/packages/openclaw-qa-runner/templates/Dockerfile b/packages/openclaw-qa-runner/templates/Dockerfile
index 90fabb8..c6a26c5 100644
--- a/packages/openclaw-qa-runner/templates/Dockerfile
+++ b/packages/openclaw-qa-runner/templates/Dockerfile
@@ -7,22 +7,10 @@ ARG QA_RUNNER_BASE_TAG
 FROM paleo/openclaw-qa-runner-base:${QA_RUNNER_BASE_TAG}

-# Stage the host-resolved @paleo/openclaw-* packages into an in-image staging dir,
-# then rewrite the consumer `package.json`'s absolute host `file:` URIs to point
-# at the staged copies before `npm install`. Host install must use `--install-links`
-# so `node_modules/@paleo/*` are materialized copies (not symlinks).
-COPY --chown=claw:claw package.json package-lock.json* /opt/qa-src/
-COPY --chown=claw:claw node_modules/@paleo/openclaw-channel-mock-core/ /opt/qa-staging/openclaw-channel-mock-core/
-COPY --chown=claw:claw node_modules/@paleo/openclaw-discord-mock/ /opt/qa-staging/openclaw-discord-mock/
-COPY --chown=claw:claw node_modules/@paleo/openclaw-slack-mock/ /opt/qa-staging/openclaw-slack-mock/
-COPY --chown=claw:claw node_modules/@paleo/openclaw-qa-runner/ /opt/qa-staging/openclaw-qa-runner/
+COPY --chown=claw:claw package.json package-lock.json /opt/qa-src/
 COPY --chown=claw:claw openclaw.json /home/claw/.openclaw/openclaw.json

-# The host lockfile pins integrity against the original host paths; it cannot
-# survive the URI rewrite, so drop it and let npm regenerate.
-RUN node -e "const fs=require('fs');const p='./package.json';const j=JSON.parse(fs.readFileSync(p,'utf8'));for(const k of Object.keys(j.dependencies||{})){if(k.startsWith('@paleo/openclaw-')){const n=k.slice('@paleo/'.length);j.dependencies[k]='file:/opt/qa-staging/'+n;}}fs.writeFileSync(p,JSON.stringify(j,null,2));" && \
-    rm -f package-lock.json && \
-    npm install --include=dev --install-links && \
+RUN npm ci --include=dev && \
     OPENCLAW_CONFIG_PATH=/home/claw/.openclaw/openclaw.json npx openclaw plugins registry --refresh

 # Consumer customizations below. Add RUN/COPY/ENV as needed.
diff --git a/packages/openclaw-slack-mock/CHANGELOG.md b/packages/openclaw-slack-mock/CHANGELOG.md
index 949cb98..00b5344 100644
--- a/packages/openclaw-slack-mock/CHANGELOG.md
+++ b/packages/openclaw-slack-mock/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @paleo/openclaw-slack-mock

+## 0.2.3
+
+### Patch Changes
+
+- Improved documentation
+- Updated dependencies
+  - @paleo/openclaw-channel-mock-core@0.2.3
+
 ## 0.2.2

 ### Patch Changes
diff --git a/packages/openclaw-slack-mock/README.md b/packages/openclaw-slack-mock/README.md
index b962804..17fde18 100644
--- a/packages/openclaw-slack-mock/README.md
+++ b/packages/openclaw-slack-mock/README.md
@@ -1,10 +1,45 @@
 # @paleo/openclaw-slack-mock

-Synthetic Slack-shaped OpenClaw channel plugin. Registers as channel `slack-mock` with a restricted action surface (`read`, `edit`, `delete`, `react`, `reactions`, `search`). No `send` / `thread-create` / `thread-reply`. Bare-channel inbounds auto-thread: the first agent outbound creates a thread anchored on the inbound message id; every subsequent outbound from that turn lands in the same thread.
+Synthetic Slack-shaped OpenClaw channel plugin. Registers as channel `slack-mock`. Restricted Slack-shaped action surface: `read`, `edit`, `delete`, `react`, `reactions`, `search`. No `send` / `thread-create` / `thread-reply`. Bare-channel inbounds auto-thread: the first agent outbound creates a thread anchored on the inbound message id; every subsequent outbound from the same turn lands in that thread.

-Backed by [`@paleo/openclaw-channel-mock-core`](../openclaw-channel-mock-core/) with `surface: "slack"` and `autoThread: true`. Pair with [`@paleo/openclaw-qa-runner`](../openclaw-qa-runner/) for the QA harness.
+Backed by [`@paleo/openclaw-channel-mock-core`](https://www.npmjs.com/package/@paleo/openclaw-channel-mock-core) (`surface: "slack"`, `autoThread: true`). Pair with [`@paleo/openclaw-qa-runner`](https://www.npmjs.com/package/@paleo/openclaw-qa-runner) for the QA harness.
-`Provider` / `Surface` / `OriginatingChannel` on inbound metadata are claimed as `slack-mock` so the SDK routes tool-schema discovery to this plugin.
+## Install
+
+```sh
+npm i -D @paleo/openclaw-slack-mock
+```
+
+The runner depends on this package transitively — installing `@paleo/openclaw-qa-runner` already pulls it in.
+
+## Enable
+
+In your `openclaw.json`:
+
+```json
+{
+  "plugins": {
+    "load": { "paths": ["/opt/qa-src/node_modules/@paleo/openclaw-slack-mock"] },
+    "entries": { "slack-mock": { "enabled": true } }
+  },
+  "channels": {
+    "slack-mock": {
+      "baseUrl": "http://bus:43123",
+      "botUserId": "openclaw",
+      "botDisplayName": "OpenClaw QA",
+      "allowFrom": ["*"]
+    }
+  }
+}
+```
+
+`enabled: true` must be **static**. Auto-enable for `origin: "config"` plugins is timing-sensitive against the plan-resolution `explicitlyEnabled` check.
+
+## Target format
+
+Canonical destination is the `to` param. Accepts `channel:` / bare `` / `dm:` / `group:` / `thread:/`.
+
+`Provider` / `Surface` / `OriginatingChannel` on inbound metadata are claimed as `slack-mock` so the SDK routes tool-schema discovery to this plugin. The `chat_id` envelope shape is **not** rewritten — assert on `conversation.id` / `threadId`.

 ## Attribution

diff --git a/packages/openclaw-slack-mock/package.json b/packages/openclaw-slack-mock/package.json
index f932a38..af46bd1 100644
--- a/packages/openclaw-slack-mock/package.json
+++ b/packages/openclaw-slack-mock/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@paleo/openclaw-slack-mock",
-  "version": "0.2.2",
+  "version": "0.2.3",
   "description": "Synthetic Slack-shaped OpenClaw channel plugin for automated QA scenarios.",
   "keywords": [
     "openclaw",
@@ -63,7 +63,7 @@
     "openclaw": "*"
   },
   "dependencies": {
-    "@paleo/openclaw-channel-mock-core": "0.2.2"
+    "@paleo/openclaw-channel-mock-core": "0.2.3"
   },
   "devDependencies": {
     "@types/node": "~24.12.4",