Opt-in test-only agent that drives the operator flows (X App setup / per-psyop OAuth / For-You scrolling) against a mock X

## Summary

Add an **opt-in test-only agent** that drives the operator-facing flows of `psychological-operations` end-to-end against a fully simulated X — the X App setup wizard on a fake `console.x.com`, the per-psyop OAuth handshake on a fake `x.com`, and For-You scrolling on a fake `x.com` timeline. The integration-test harness already mocks the X v2 API at the HTTP layer (`psychological-operations-cli/src/x/mock.rs`, gated by `PSYCHOLOGICAL_OPERATIONS_MOCK_X_API`), but every browser-side flow still requires a human. This agent closes that gap so the run loop can be exercised in CI without an operator and without ever contacting real X.

## TOS boundary (load-bearing)

**This agent must never run against real X.** Automating browser interactions on `x.com` / `console.x.com` violates X's Terms of Service. Using this agent against any real X surface is the operator's problem, not a supported mode.

The implementation enforces this with three independent guards. Any one missing aborts the agent at startup:

1. **Env-var gate.** `PSYCHOLOGICAL_OPERATIONS_TEST_AGENT=1` must be set explicitly. Default-off; never set in release configs. Pairs with `PSYCHOLOGICAL_OPERATIONS_MOCK_X_API=1` (also required — the agent refuses to launch when the X API is hitting the real network).
2. **Cargo feature gate.** Wrap the agent module in `#[cfg(feature = "test-agent")]`. Don't enable in the default feature set; release builds in `psychological-operations-cli/install.sh` and `.github/workflows/release.yml` must omit it. Code paths that route browser navigation are also `cfg`-gated so the agent module simply doesn't exist in shipped binaries.
3. **URL allowlist.** At launch, the agent inspects every URL it's about to drive Chromium toward. If the host resolves to anything outside the local mock origins (loopback, `*.test`, or whatever the mock harness uses), the agent panics with a clear "real X detected, refusing to drive" message. This is a belt-and-suspenders check for the misconfiguration where the mock didn't actually start.
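
The runtime guards (1 and 3) could be sketched roughly as below. This is a minimal illustration, not the planned implementation: the function names `check_env_gate` and `is_mock_host` are hypothetical, the environment lookup is injected so the check can be unit-tested, and the allowlist is assumed to receive the host Chromium will actually connect to.

```rust
use std::net::IpAddr;

/// The two environment variables named in the guard list.
const AGENT_VAR: &str = "PSYCHOLOGICAL_OPERATIONS_TEST_AGENT";
const MOCK_API_VAR: &str = "PSYCHOLOGICAL_OPERATIONS_MOCK_X_API";

/// Guard 1 (hypothetical helper): both variables must be exactly "1".
/// `lookup` is injected for testability; real code would pass
/// `|k| std::env::var(k).ok()`.
fn check_env_gate(lookup: impl Fn(&str) -> Option<String>) -> Result<(), String> {
    for var in [AGENT_VAR, MOCK_API_VAR] {
        if lookup(var).as_deref() != Some("1") {
            return Err(format!("{var} is not set to 1, refusing to start"));
        }
    }
    Ok(())
}

/// Guard 3 (hypothetical helper): accept only loopback addresses,
/// `localhost`, and `*.test` names. Everything else is treated as real X.
fn is_mock_host(host: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    // Strip the brackets of an IPv6 literal such as `[::1]`.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    if let Ok(ip) = bare.parse::<IpAddr>() {
        return ip.is_loopback();
    }
    host == "localhost" || host.ends_with(".test")
}
```

At startup the agent would call `check_env_gate` first and then run `is_mock_host` over every target before any navigation, panicking on the first failure.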

The README and the agent's own startup banner say plainly that this is for the mock harness only and that anyone pointing it at real X is on their own.

## What the operator does today

All three flows live in Chromium profiles managed by the CLI:

1. **X App setup** (`psychological-operations-cli/src/x_app/setup.rs:30-66`). CLI spawns Chromium with the auth extension loaded, lands on `https://console.x.com/`. Operator signs in, creates a Project + App, configures user-auth settings (Web App, Read+Write, callback `http://127.0.0.1/callback`), copies Client ID + Client Secret + Bearer Token, pastes into the auth extension's popup form, clicks Save.
2. **Per-psyop OAuth handshake** (`psychological-operations-cli/src/oauth/setup.rs`). CLI launches per-psyop Chromium profile, opens X's authorize URL with the PKCE challenge, binds a localhost callback listener. Operator signs into X with the account this psyop should act as, clicks the Authorize button on the OAuth consent screen, callback resolves, tokens land in `~/.psychological-operations/tokens/<name>.json`.
3. **For-You scrolling**. Operator runs `psyops browse <name>`, the scrape extension's content script (`psychological-operations-chromium-extension-scrape/content_script.js`) walks the For-You DOM as the operator scrolls, sends tweet IDs over native messaging into `for_you_queue`.

## What the test agent does

For each flow, the agent steps in for the operator:

1. **X App setup** — drives the mock `console.x.com` wizard end-to-end: clicks Create Project, fills in the app form, configures user-auth settings (writes the `127.0.0.1/callback` URL, picks Web App + Read+Write), copies the synthetic Client ID / Client Secret / Bearer Token from the mock keys-and-tokens page, pastes them into the auth extension's popup, hits Save. Verifies `~/.psychological-operations/x_app.json` lands on disk with the expected values.
2. **Per-psyop OAuth handshake** — on the mock `x.com`, signs in as the psyop's pre-seeded test account (mock auth — no real password roundtrip), clicks Authorize on the OAuth consent screen, lets the localhost callback resolve. Verifies `tokens/<name>.json` is written with a valid access token + refresh token.
3. **For-You scrolling** — drives the mock For-You feed by scrolling the `<main>` element through N pre-seeded tweet articles. Each article matches the DOM shape the scrape extension's selector expects (`article[data-testid="tweet"]` with a `/<handle>/status/<id>` permalink). Verifies all N IDs land in `for_you_queue`.
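
For the For-You verification step, the harness needs to know which IDs it seeded. A small sketch of how the Rust side might derive them from the `/<handle>/status/<id>` permalinks follows; both function names are hypothetical, and the scrape extension's own JavaScript selector logic is not reproduced here.

```rust
use std::collections::BTreeSet;

/// Extract the numeric tweet ID from a permalink path of the form
/// `/<handle>/status/<id>`. Returns `None` for any other shape.
/// Trailing segments after the ID (if any) are ignored.
fn tweet_id_from_permalink(path: &str) -> Option<&str> {
    // Drop a query string or fragment if the href carries one.
    let path = path.split(|c: char| c == '?' || c == '#').next()?;
    let mut parts = path.trim_start_matches('/').split('/');
    let handle = parts.next()?;
    let marker = parts.next()?;
    let id = parts.next()?;
    let valid = !handle.is_empty()
        && marker == "status"
        && !id.is_empty()
        && id.bytes().all(|b| b.is_ascii_digit());
    if valid { Some(id) } else { None }
}

/// Collect the unique IDs found in a list of permalinks, skipping
/// anything that is not a tweet permalink.
fn unique_tweet_ids<'a>(permalinks: &[&'a str]) -> BTreeSet<&'a str> {
    permalinks
        .iter()
        .copied()
        .filter_map(tweet_id_from_permalink)
        .collect()
}
```

The flow would then compare the size of this set against the rows that landed in `for_you_queue`.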

The agent uses the **Chrome DevTools Protocol**. The existing Chromium binary already speaks CDP, so nothing extra is needed on the browser side; the only new dependency is a CDP client crate on the Rust side (see Files). Driving the page deterministically via CDP keeps tests fast and reproducible. No LLM in the loop for v1; this is plain UI scripting.

## Mock-site requirements (prerequisite)

The agent depends on a mock web frontend that doesn't exist yet. v1 of this issue covers the agent itself plus a minimal mock; the mock can grow over time. Required surfaces:

- A static `console.x.com` simulation served by an in-process HTTP server (started alongside the existing `mock.rs` HTTP layer). Pages: project list, project create form, app dashboard, user-auth settings, keys-and-tokens. Just enough DOM for the agent to click through.
- A static `x.com` simulation with two routes: an OAuth `/i/oauth2/authorize` consent page (renders an Authorize button that redirects the browser to the registered callback with a synthetic `code` and the original `state`, as a real OAuth 2.0 authorization response does), and a `/home` route that renders pre-seeded For-You articles in the same DOM shape the scrape extension expects.
- DNS / hostname plumbing so Chromium resolves these synthetic origins. Easiest path: `--host-resolver-rules=MAP console.x.com 127.0.0.1:NNNN, MAP x.com 127.0.0.1:NNNN` plus `--ignore-certificate-errors` for the test profile only.
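
The hostname plumbing in the last bullet could be assembled like this. The flag names are the ones given above; the helper `mock_origin_flags` and the idea of passing the two ports in as arguments are assumptions made for illustration.

```rust
/// Build the Chromium flags that point the synthetic origins at local
/// ports. Hypothetical helper: the real harness may obtain the ports
/// differently (for example from the mock server's bound address).
fn mock_origin_flags(console_port: u16, x_port: u16) -> Vec<String> {
    let rules = format!(
        "MAP console.x.com 127.0.0.1:{console_port}, MAP x.com 127.0.0.1:{x_port}"
    );
    vec![
        format!("--host-resolver-rules={rules}"),
        // Only ever pass this for the throwaway test profile.
        "--ignore-certificate-errors".to_string(),
    ]
}
```

The returned strings would be appended to the argument list used to spawn the test profile's Chromium, and never to the production launch path.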

## Opt-in surface (CLI-side)

A new subcommand: `psychological-operations test-agent <flow> [args]`. Available flows: `x-app-setup`, `psyop-oauth <name>`, `for-you-scroll <name> --tweets <N>`. Each flow blocks until verification finishes, then exits 0 on success and non-zero on failure. The integration-test harness stitches the invocations together; the run loop itself needs no special integration.
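
As a rough sketch of that surface, the three flows map naturally onto an enum. The `Flow` type and `parse_flow` function below are hypothetical and hand-rolled for illustration; the real CLI may well use its existing argument parser instead.

```rust
/// One variant per flow listed above (hypothetical type).
#[derive(Debug, PartialEq)]
enum Flow {
    XAppSetup,
    PsyopOauth { name: String },
    ForYouScroll { name: String, tweets: usize },
}

/// Parse the arguments that follow `test-agent` into a `Flow`.
/// Anything that does not match one of the three shapes is an error,
/// which the caller would turn into a non-zero exit code.
fn parse_flow(args: &[&str]) -> Result<Flow, String> {
    match args {
        ["x-app-setup"] => Ok(Flow::XAppSetup),
        ["psyop-oauth", name] => Ok(Flow::PsyopOauth { name: name.to_string() }),
        ["for-you-scroll", name, "--tweets", n] => {
            let tweets = n
                .parse::<usize>()
                .map_err(|e| format!("bad --tweets value {n:?}: {e}"))?;
            Ok(Flow::ForYouScroll { name: name.to_string(), tweets })
        }
        other => Err(format!("unrecognised test-agent invocation: {other:?}")),
    }
}
```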

The subcommand is registered only when the `test-agent` cargo feature is enabled. Without the feature flag the subcommand simply isn't in `psychological-operations --help`.
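
One way to express that registration, sketched with plain `cfg` attributes: the names `TEST_AGENT_SUBCOMMANDS` and `subcommands` are hypothetical, and the `browse` / `run` entries are illustrative stand-ins rather than the CLI's actual command table.

```rust
// Present only when the crate is built with `--features test-agent`.
#[cfg(feature = "test-agent")]
const TEST_AGENT_SUBCOMMANDS: &[&str] = &["test-agent"];

// Default and release builds: the list is empty, so nothing is registered.
#[cfg(not(feature = "test-agent"))]
const TEST_AGENT_SUBCOMMANDS: &[&str] = &[];

/// Names that would show up in `--help` (illustrative list).
fn subcommands() -> Vec<&'static str> {
    let mut names = vec!["browse", "run"];
    names.extend_from_slice(TEST_AGENT_SUBCOMMANDS);
    names
}
```

Because the selection happens at compile time, a build without the feature has no string, handler, or help entry for the agent to leak through.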

## Acceptance criteria

- [ ] All three guards (env var, cargo feature, URL allowlist) are enforced independently of one another. Drop any one and the agent must refuse to start.
- [ ] `psychological-operations test-agent x-app-setup` writes a valid `x_app.json` to a per-test data dir without operator intervention.
- [ ] `psychological-operations test-agent psyop-oauth <name>` writes a valid `tokens/<name>.json` with both access and refresh tokens.
- [ ] `psychological-operations test-agent for-you-scroll <name> --tweets 50` ends with exactly 50 unique IDs in `for_you_queue` for the given psyop.
- [ ] An integration test stitches all three flows + a `psyops run --name <name>` end-to-end against the mock and asserts the expected delivery_queue rows exist.
- [ ] Release binary built without the `test-agent` feature contains no agent code (verified by `nm <binary> | grep test_agent` returning empty, or equivalent on Windows).
- [ ] README has a clear "Testing" section explaining the agent is mock-only and TOS-bounded.

## Files

- `psychological-operations-cli/src/test_agent/` (new module, `#[cfg(feature = "test-agent")]`) — the CDP driver, per-flow scripts, the URL-allowlist check.
- `psychological-operations-cli/src/x/mock.rs` — extend to also serve the mock `console.x.com` and `x.com` static pages alongside the API mock, gated on the same env var.
- `psychological-operations-cli/Cargo.toml` — add `test-agent` feature; pull in a minimal CDP client crate (`chromiumoxide` or similar). Feature is **not** in default.
- `psychological-operations-cli/src/run.rs` — register the `test-agent` subcommand under `#[cfg(feature = "test-agent")]`.
- `psychological-operations-cli/tests/test_agent_e2e.rs` (new) — the stitched end-to-end integration test.
- `psychological-operations-chromium-extension-auth/manifest.json` and `psychological-operations-chromium-extension-scrape/manifest.json` — extend `host_permissions` to cover the mock origins (loopback / `*.test`) only when a test build is loaded; production builds keep the current production-only host list.
- `README.md` — Testing section with the TOS warning and how to invoke the agent.

## Out of scope

- LLM-driven UI adaptation. The agent uses fixed selectors against a mock the test harness controls; there's no need for a model in the loop.
- Driving real X. Don't add it. Anyone who wants automation against real X is welcome to fork; this repo doesn't ship that.
- Replacing the operator-driven flows for production users. The X App setup wizard, per-psyop OAuth, and human-driven For-You scrolling remain the production paths. The agent is purely test infrastructure.
- Mock-site fidelity beyond what the agent's selectors need. The mock pages are a stub, not a faithful reproduction of X's UI. They evolve only as the test agent's selectors evolve.

