diff --git a/.claude/skills/e2e-pr/SKILL.md b/.claude/skills/e2e-pr/SKILL.md
new file mode 100644
index 00000000..d0941475
--- /dev/null
+++ b/.claude/skills/e2e-pr/SKILL.md
@@ -0,0 +1,65 @@
+---
+name: e2e-pr
+description: Run the Playwright end-to-end suite against the routes a PR changes and report results. Use when verifying UI behavior for a branch or PR, or checking that a change didn't break key pages.
+---
+
+# e2e-pr
+
+Run the Playwright end-to-end suite against the routes a PR touches, and report results.
+
+## Prerequisites
+
+- `@playwright/test` is installed and `playwright.config.ts` exists at the repo root.
+- The config auto-boots `pnpm dev` (its `webServer` block) and waits for `/about`,
+  so you do NOT need to start a server yourself unless one is already running.
+- In CI the browser comes from `playwright install`; in the hosted sandbox it's
+  the pre-installed Chromium at `/opt/pw-browsers`. The config handles both.
+
+## Steps
+
+1. **Find changed files** vs the base branch (default `origin/main`):
+   ```
+   git diff --name-only origin/main...HEAD
+   ```
+
+2. **Map changed files to affected routes** (best-effort):
+   - `src/app/(group)/foo/page.tsx` → `/foo` (drop `src/app`, drop `(route-group)`
+     segments, drop the trailing `/page.tsx`)
+   - `src/app/(standard)/page.tsx` → `/`
+   - `layout.tsx` / `template.tsx` changes affect every route beneath them.
+   - Dynamic segments (`[id]`) can't be visited without a concrete value — note
+     them but don't try to test them blindly.
+   - Changes under `src/components/**` are shared and can't be mapped to a single
+     route → treat as "broad" (run the whole suite).
+
+3. **Choose scope:**
+   - If specific route files changed and there are specs covering them, run those:
+     ```
+     pnpm test:e2e -g ""
+     ```
+     or pass specific spec files: `pnpm test:e2e tests/e2e/.spec.ts`
+   - If changes are broad (shared components, layout, config) OR the suite is
+     small, just run everything: `pnpm test:e2e`
+
+4. **Run** the chosen command. The first run cold-compiles routes in `next dev`,
+   so allow time — the config already uses generous timeouts.
+
+5. **Report:**
+   - Pass/fail, number of tests run, and any failures with their error message.
+   - On failure, point to the HTML report (`pnpm test:e2e:report`) and the trace
+     (saved under `test-results/` on first retry).
+   - Call out any changed routes that have NO e2e coverage as gaps (not failures),
+     so the author can decide whether to add a spec.
+
+## Notes
+
+- Data-dependent routes (homepage, `/analysis/*`) need env/secrets or network
+  mocking — see `tests/e2e/README.md`. Don't add flaky assertions on them.
+- Do NOT auto-write new specs here; that's authoring. This skill runs existing
+  tests. (Use `pr-describe` or a dedicated authoring step to add coverage.)
+
+## Args
+
+Optional base branch to diff against (default `origin/main`).
+
+Example: `/e2e-pr` or `/e2e-pr staging`
diff --git a/.claude/skills/pr-check/SKILL.md b/.claude/skills/pr-check/SKILL.md
new file mode 100644
index 00000000..a3b56ace
--- /dev/null
+++ b/.claude/skills/pr-check/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: pr-check
+description: Run the full quality gate on the current branch — TypeScript typecheck, ESLint, web/CLI Jest tests, and blueprint validation — then summarize pass/fail. Use before opening a PR, or to check whether a branch is CI-ready.
+---
+
+# pr-check
+
+Run a full quality gate on the current branch before or after a PR is created. Orchestrates type checking, linting, and tests, then produces a structured summary.
+
+## Steps
+
+1. **Identify scope** — get the diff summary:
+   ```
+   git diff --stat origin/main...HEAD
+   ```
+   Note whether changes touch: source code, blueprint files, tests, docs, config.
+
+2. **Run checks in parallel where possible** (report results as each finishes):
+
+   | Check | Command | When to run |
+   |-------|---------|-------------|
+   | TypeScript | `pnpm typecheck` | Always |
+   | Lint | `pnpm lint` | Always |
+   | Web tests | `pnpm test:web` | If `src/app/`, `src/components/`, `src/hooks/`, or `src/point-functions/` changed |
+   | CLI tests | `pnpm test:cli` | If `src/cli/` or `src/lib/` changed |
+   | Blueprint validate | see `blueprint-validate` skill | If any `.yaml`/`.yml`/`.json` blueprint files changed |
+
+3. **Collect results** — for each check record:
+   - Status: ✅ pass / ❌ fail / ⚠️ warnings / ⏭️ skipped (not applicable)
+   - Error/warning count
+   - Key details (first 5 errors max per check to keep output readable)
+
+4. **Produce a summary table:**
+   ```
+   ## PR Check Results —
+
+   | Check         | Status | Details                  |
+   |---------------|--------|--------------------------|
+   | TypeScript    | ✅     | 0 errors                 |
+   | Lint          | ⚠️     | 2 warnings, 0 errors     |
+   | Web tests     | ✅     | 47 passed                |
+   | CLI tests     | ⏭️     | No CLI files changed     |
+   | Blueprints    | ✅     | 1 file validated         |
+
+   **Overall: READY TO MERGE** / **NEEDS FIXES**
+   ```
+
+5. If any check fails, list the specific errors that need to be addressed.
+
+6. If `--comment` is passed as an arg, post this summary as a GitHub PR comment using the available GitHub MCP tools.
+
+## Args
+
+- `--comment`: Post the results as a GitHub PR comment (requires PR number to be detectable from the branch)
+- Base branch (e.g. `main`, `staging`): defaults to `origin/main`
+
+Example: `/pr-check` or `/pr-check --comment` or `/pr-check staging`
diff --git a/.claude/skills/pr-describe/SKILL.md b/.claude/skills/pr-describe/SKILL.md
new file mode 100644
index 00000000..eafafa41
--- /dev/null
+++ b/.claude/skills/pr-describe/SKILL.md
@@ -0,0 +1,266 @@
+---
+name: pr-describe
+description: Generate a structured PR description from the branch diff (Summary, Changes, Test plan, Risks, Related Issues), with optional static-gated before/after screenshots for visual changes. Use when writing or updating a pull request description.
+---
+
+# pr-describe
+
+Generate a high-quality PR description from the branch diff. The description
+itself is the core deliverable and always runs. Before/after screenshots are a
+**bonus that only activates on static, visual PRs** — they're strictly
+time-boxed, **fail soft** (any problem → clean text-only description), and are
+**off by default for data-driven routes** (see "Default screenshot scope").
+
+## Output contract
+
+- If a PR already exists for the current branch → update its **title and body**
+  (GitHub MCP `update_pull_request`). Only change the title if it doesn't already
+  follow the convention below.
+- If no PR exists → print the finished **title + markdown body** for the user.
+  Do **NOT** open a PR unless the user explicitly asked for one.
+
+## PR title convention
+
+Produce a properly tagged title in **Conventional Commits** form:
+
+```
+type(scope): imperative summary
+```
+
+- **type** — infer from the dominant change: `feat` (new capability), `fix`
+  (bug fix), `docs`, `test` (tests/test infra), `ci` (CI/workflows), `refactor`,
+  `perf`, `chore`, `build`, `style`.
+- **scope** — optional, a short area derived from the diff (e.g. `e2e`, `auth`,
+  `header`, `cli`, `pr-eval`). Omit if it spans many areas.
+- **summary** — imperative mood, lower-case start, **no trailing period**, aim
+  for ≤ ~70 chars total.
+- If the change mixes types, pick the one that best describes the user-facing
+  intent (a feature with its tests is still `feat`).
+
+Examples: `feat(header): collapse nav into a menu under 380px` ·
+`test(e2e): add Playwright smoke suite` · `fix(cli): handle empty blueprint id`.
+
+## Security guardrails (NON-NEGOTIABLE — this repo is public)
+
+Screenshots publish whatever they render. On a **public** repo, anything you
+commit is world-readable via raw URLs **permanently** (git history, forks, CDN
+cache) — deleting the branch does NOT undo it. So:
+
+1. **Route denylist — never screenshot these, no exceptions:**
+   `/admin*`, `/api*`, and any authenticated/session/account route. If a changed
+   route matches, skip it and note "omitted for safety", do not capture it.
+2. **Capture only against a secret-free environment.** Use the local `pnpm dev`
+   with NO real storage/API secrets wired up, so data-driven pages render
+   empty/mock and there is nothing sensitive in frame. Do **not** screenshot a
+   preview deploy that is backed by real data/secrets and then commit it here.
+3. **Treat committed images as permanent and public.** Never tell the user they
+   can "delete before merge" to undo exposure — that is false.
+4. **If a capture could contain anything sensitive, do NOT commit it to the
+   branch.** Prefer CI-artifact hosting (collaborator-only, auto-expiring) or a
+   PR comment. Committing to the public branch is only for plainly non-sensitive,
+   static UI.
+5. **Surface images for human review before pushing.** Show the user what was
+   captured and let them confirm; never push blind.
+
+---
+
+## Phase 1 — Analyze the diff (always runs)
+
+```
+git fetch origin --quiet
+git diff --stat origin/...HEAD
+git diff origin/...HEAD
+git log origin/..HEAD --format='%s%n%b'
+```
+
+Draft the description with these sections:
+
+- **Summary** — 1-3 sentences on what changed and why.
+- **Motivation / context** — the problem or request behind it.
+- **Changes** — bulleted, grouped by area (UI, API, CLI, tests, docs…).
+- **Test plan** — what you ran (`pnpm typecheck`, `pnpm lint`, `pnpm test:web`,
+  `pnpm test:e2e`) and the result, plus manual steps if any.
+- **Risks / rollback** — the blast radius and how to undo. Call out anything
+  reviewers should scrutinize (data migrations, auth/permissions, external API
+  or cost impact, breaking changes, shared components touched). State how to
+  revert (usually "revert this PR" — but note it if a migration or deploy step
+  makes rollback non-trivial). If the change is low-risk and self-contained, say
+  so in one line rather than padding.
+- **Screenshots** — filled in by Phase 4 if applicable, else omitted.
+- **Related Issues** — link tickets and related work. Use `Closes #123` for
+  issues this PR resolves (auto-closes them on merge), `Refs #456` for related
+  PRs/issues, plus any relevant docs. Omit the section entirely if there's
+  nothing to link — don't invent issue numbers.
+
+### Worked example
+
+```md
+## Summary
+Tightens the site header on mobile so the nav no longer wraps under 380px.
+
+## Motivation / context
+The logo + nav links overflowed on small screens, pushing the theme toggle
+off-canvas. Reported in #41.
+
+## Changes
+- **UI:** right-align nav links and shrink logo on `sm` breakpoint (`Header.tsx`)
+- **UI:** swap hover underline for opacity to avoid layout shift
+- **Tests:** add an e2e assertion that the header is visible at 360px width
+
+## Test plan
+- `pnpm typecheck` ✅ · `pnpm lint` ✅ · `pnpm test:e2e` ✅ (3 passed)
+- Manually checked /about at 320 / 375 / 768px.
+
+## Risks / rollback
+Low-risk, CSS-only and self-contained. Revert this PR to undo.
+
+## Screenshots
+(before/after table inserted by Phase 4)
+
+## Related Issues
+Closes #41
+```
+
+## Phase 2 — Does the screenshot step apply?
+
+Visual-change heuristic — TRUE if any changed file matches:
+- `src/app/**/(page|layout|template).tsx`
+- `src/components/**/*.tsx`
+- `**/*.css`
+
+If FALSE → skip to Phase 5 (text-only). If TRUE → continue, subject to the
+default scope and give-up policy below.
+
+### Default screenshot scope (practicality gate)
+
+Screenshots only pay off on **static, secret-free routes**. Data-driven routes
+(homepage, `/analysis/*`, `/pairs`, `/latest`, `/model/*`, etc.) render
+empty/mock against the secret-free dev env — an uninformative shot — and cost a
+slow capture. So **by default, only capture known-static routes**:
+
+- **Default static allowlist:** `/about`, `/what-is-an-eval`. (Extend this list
+  as more static pages are confirmed safe + stable.)
+- Any mapped route **not** on the allowlist is **skipped by default** with a note:
+  _"skipped: data-driven route (pass `--routes` to force)"_.
+- The user can **override** with `--routes /foo,/bar` to force specific routes
+  (e.g. against a preview deploy with real data, where they accept the tradeoff).
+- The **security denylist always wins** over any override — `/admin*`, `/api*`,
+  and auth routes are never captured even if explicitly passed.
+
+If, after this gate, there are **no routes left to shoot** → skip to Phase 5
+(text-only) with a one-line note. Don't boot servers for nothing.
+
+## Phase 3 — Capture screenshots (time-boxed, fail-soft)
+
+> **GIVE-UP POLICY — bail to text-only (Phase 5) and add a one-line note if ANY hold:**
+> - Total screenshot phase exceeds **~6 minutes** wall-clock.
+> - A dev server fails to become ready within **120s**.
+> - Zero routes can be mapped (e.g. only shared components / dynamic routes changed
+>   and the user gave no route to shoot).
+> - The screenshot script captures nothing (`scripts/pr-screenshots.mjs` exits non-zero).
+> - Any unexpected error. Never let screenshots block the description.
+
+**3a. Determine routes** (cap at the 3 most relevant; note any you dropped).
+Apply these filters **in order**:
+1. Map changed `src/app/**/page.tsx` to URLs (see `e2e-pr` for the mapping rules).
+2. **Security denylist (always, non-overridable):** drop any `/admin*`, `/api*`,
+   or authenticated route; note as "omitted for safety".
+3. **Default static gate (see Phase 2):** unless the user passed `--routes`, drop
+   anything not on the static allowlist; note as "skipped: data-driven route".
+   If `--routes` was passed, use exactly those (still subject to step 2).
+4. Skip dynamic (`[id]`) routes unless the user supplies a concrete URL.
+5. Shared-component-only change with nothing left → ask the user for 1-2
+   representative static routes, or skip with a note. Don't guess across the app.
+- Confirm the dev server has **no real storage/API secrets** in its env before
+  capturing (guardrail #2). If you can't confirm that, skip screenshots.
+- If no routes survive the filters → skip to Phase 5 (text-only).
+
+Let `SLUG` = sanitized branch name, `ROUTES` = comma-separated list, e.g. `/about,/what-is-an-eval`.
+
+**3b. Capture AFTER (current branch, HEAD).** Reuse a running dev server on
+`:3172` if present, else the config/`pnpm dev` will serve it. Then:
+```
+node scripts/pr-screenshots.mjs --base-url http://localhost:3172 \
+  --routes "$ROUTES" --out .github/pr-media/$SLUG --label after
+```
+
+**3c. Capture BEFORE (base branch) in an isolated worktree** so the working tree
+is untouched. Symlink `node_modules` to avoid a slow reinstall (valid as long as
+the PR didn't change dependencies — if it did, note that the "before" shot may
+be approximate):
+```
+WT=$(mktemp -d)
+git worktree add --detach "$WT" origin/
+ln -s "$PWD/node_modules" "$WT/node_modules"
+( cd "$WT" && pnpm exec next dev -p 3173 ) & # remember the PID
+# poll http://localhost:3173/about until it responds (cap 120s)
+node scripts/pr-screenshots.mjs --base-url http://localhost:3173 \
+  --routes "$ROUTES" --out .github/pr-media/$SLUG --label before
+# then ALWAYS clean up:
+kill ; git worktree remove --force "$WT"
+```
+
+If BEFORE fails but AFTER succeeded, proceed with after-only + a note.
+
+## Phase 4 — Host & embed
+
+> **Before any commit: re-confirm the captures are non-sensitive (guardrails
+> #1–#4) and show them to the user for a quick look (guardrail #5).** Committing
+> to a public branch is permanent and irreversible. If there is any doubt about
+> the contents, use CI-artifact hosting instead (see "Sensitive captures" below)
+> or skip embedding entirely.
+
+GitHub renders images only from URLs, so the PNGs must be committed and pushed
+before they resolve. For plainly non-sensitive, static UI, commit them to the PR
+branch and reference raw URLs:
+
+```
+git add .github/pr-media/$SLUG
+git commit -m "Add PR before/after screenshots"
+git push
+```
+
+Derive `OWNER/REPO` from `git remote get-url origin` and `BRANCH` from
+`git rev-parse --abbrev-ref HEAD`. For each route build a row:
+
+```md
+### Screenshots
+
+#### `/about`
+| Before | After |
+|--------|-------|
+| ![before](https://raw.githubusercontent.com/OWNER/REPO/BRANCH/.github/pr-media/SLUG/about-before.png) | ![after](https://raw.githubusercontent.com/OWNER/REPO/BRANCH/.github/pr-media/SLUG/about-after.png) |
+```
+
+(Omit the "Before" cell for routes where only the after shot exists — new pages.)
+
+> **Tradeoff:** this commits PNGs into the PR branch and they show in the diff.
+> On a public repo this is **permanent and world-readable** — do NOT claim they
+> can be "deleted before merge" to undo exposure. To keep them out of the PR's
+> own diff you can use a dedicated orphan `pr-media` branch, but that is still
+> public; it changes visibility-in-diff, not exposure.
+
+**Sensitive captures → don't commit; use CI artifacts instead.** If a shot could
+contain anything non-public, skip the commit and have the e2e workflow upload the
+images via `actions/upload-artifact` (collaborator-only, auto-expiring). Link the
+run/artifact from the PR body rather than embedding a public raw URL.
+
+## Phase 5 — Finalize
+
+- Assemble the full body (Phase 1 sections + Phase 4 screenshots if any).
+- If a PR exists → update its body via GitHub MCP. Else → print the markdown.
+- If screenshots were skipped, include one honest line, e.g.
+  _"Screenshots skipped: change is API-only"_ or _"…: dev server didn't boot in time"_.
+  Never silently drop them without saying why.
+
+## Args
+
+- Base branch to diff against (default `origin/main`).
+- `--routes /foo,/bar` — **override the default static-only gate** and capture
+  exactly these routes (still subject to the security denylist). Use this for
+  shared-component changes or when shooting a preview deploy with real data.
+
+By default (no `--routes`), only known-static routes (`/about`,
+`/what-is-an-eval`) are captured; data-driven routes are skipped with a note.
+
+Example: `/pr-describe` · `/pr-describe staging` · `/pr-describe --routes /pairs,/latest`
diff --git a/.github/workflows/e2e.yml b/.github/workflows/e2e.yml
new file mode 100644
index 00000000..ab4669cb
--- /dev/null
+++ b/.github/workflows/e2e.yml
@@ -0,0 +1,49 @@
+name: E2E Tests
+
+on:
+  pull_request:
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'
+  push:
+    branches: [main]
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'
+
+concurrency:
+  group: e2e-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  e2e:
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: pnpm/action-setup@v4
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: 18
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Install Playwright Chromium
+        run: pnpm exec playwright install --with-deps chromium
+
+      - name: Run E2E tests
+        run: pnpm test:e2e
+        env:
+          CI: 'true'
+
+      - name: Upload Playwright report
+        if: ${{ !cancelled() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: playwright-report/
+          retention-days: 14
diff --git a/.gitignore b/.gitignore
index aecac04c..0c90d529 100644
--- a/.gitignore
+++ b/.gitignore
@@ -16,7 +16,8 @@ catechism_dump.txt
 
 **/node_modules/**
 
-.claude
+.claude/*
+!.claude/skills
 
 # testing
 /coverage
@@ -24,6 +25,12 @@ catechism_dump.txt
 .swc
 .auth
 
+# playwright
+/test-results/
+/playwright-report/
+/blob-report/
+/playwright/.cache/
+
 # next.js
 /.next/
 /out/
diff --git a/package.json b/package.json
index c30e12b1..d0206fd3 100644
--- a/package.json
+++ b/package.json
@@ -20,6 +20,10 @@
     "test": "pnpm test:web && pnpm test:cli",
     "test:web": "jest --config jest.config.js",
     "test:cli": "node --experimental-vm-modules node_modules/jest/bin/jest.js --config jest.config.cli.js",
+    "test:e2e": "playwright test",
+    "test:e2e:ui": "playwright test --ui",
+    "test:e2e:report": "playwright show-report",
+    "pr:screenshots": "node scripts/pr-screenshots.mjs",
     "test:personality": "tsx src/cli/personality-test.ts",
     "compare:personality": "tsx src/cli/compare-personality.ts",
     "test:preference": "tsx src/cli/preference-test.ts",
@@ -129,6 +133,7 @@
   "devDependencies": {
     "@jest/globals": "^30.0.5",
     "@next/bundle-analyzer": "^15.4.4",
+    "@playwright/test": "1.55.0",
     "@sentry/cli": "^2.57.0",
     "@tailwindcss/postcss": "^4.1.11",
     "@tailwindcss/typography": "^0.5.16",
diff --git a/playwright.config.ts b/playwright.config.ts
new file mode 100644
index 00000000..dfa89006
--- /dev/null
+++ b/playwright.config.ts
@@ -0,0 +1,56 @@
+import { defineConfig, devices } from '@playwright/test';
+import { existsSync } from 'node:fs';
+
+/**
+ * In the hosted agent sandbox, Chromium is pre-installed at this path and
+ * `playwright install` is disabled. When the binary is present we point
+ * Playwright at it directly; otherwise (local dev, CI) we let Playwright use
+ * its own bundled browser installed via `playwright install chromium`.
+ */
+const SANDBOX_CHROMIUM = '/opt/pw-browsers/chromium';
+const executablePath = existsSync(SANDBOX_CHROMIUM) ? SANDBOX_CHROMIUM : undefined;
+
+const PORT = 3172;
+const BASE_URL = process.env.E2E_BASE_URL ?? `http://localhost:${PORT}`;
+
+export default defineConfig({
+  testDir: './tests/e2e',
+  fullyParallel: true,
+  forbidOnly: !!process.env.CI,
+  retries: process.env.CI ? 2 : 0,
+  workers: process.env.CI ? 1 : undefined,
+  reporter: process.env.CI
+    ? [['list'], ['html', { open: 'never' }]]
+    : [['list']],
+  timeout: 60_000,
+  expect: { timeout: 15_000 },
+  use: {
+    baseURL: BASE_URL,
+    navigationTimeout: 45_000,
+    trace: 'on-first-retry',
+    screenshot: 'only-on-failure',
+    video: 'retain-on-failure',
+  },
+  projects: [
+    {
+      name: 'chromium',
+      use: { ...devices['Desktop Chrome'], launchOptions: { executablePath } },
+    },
+  ],
+  /**
+   * When E2E_BASE_URL is set we assume the app is already running (e.g. a
+   * production build or a remote deploy) and skip booting a dev server.
+   * Otherwise boot `pnpm dev`; the readiness probe hits /about — a static,
+   * dependency-free route — which also pre-compiles it so the first test is fast.
+   */
+  webServer: process.env.E2E_BASE_URL
+    ? undefined
+    : {
+        command: 'pnpm dev',
+        url: `http://localhost:${PORT}/about`,
+        reuseExistingServer: !process.env.CI,
+        timeout: 180_000,
+        stdout: 'pipe',
+        stderr: 'pipe',
+      },
+});
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 2794962b..15eb0e19 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -82,7 +82,7 @@ importers:
         version: 1.2.7(@types/react-dom@19.1.6(@types/react@19.1.8))(@types/react@19.1.8)(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
       '@sentry/nextjs':
         specifier: ^10.21.0
-        version: 10.21.0(@opentelemetry/context-async-hooks@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(encoding@0.1.13)(next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0))(react@19.1.0)(webpack@5.102.1)
+        version: 10.21.0(@opentelemetry/context-async-hooks@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(encoding@0.1.13)(next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(@playwright/test@1.55.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0))(react@19.1.0)(webpack@5.102.1)
       '@sentry/node':
         specifier: ^10.21.0
         version: 10.21.0
@@ -166,7 +166,7 @@ importers:
         version: 3.1.0
       next:
         specifier: 15.5.9
-        version: 15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
+        version: 15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(@playwright/test@1.55.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
       next-themes:
         specifier: ^0.4.6
         version: 0.4.6(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
@@ -237,6 +237,9 @@ importers:
       '@next/bundle-analyzer':
         specifier: ^15.4.4
         version: 15.4.4
+      '@playwright/test':
+        specifier: 1.55.0
+        version: 1.55.0
       '@sentry/cli':
         specifier: ^2.57.0
         version: 2.57.0(encoding@0.1.13)
@@ -1825,6 +1828,11 @@ packages:
     resolution: {integrity: sha512-YLT9Zo3oNPJoBjBc4q8G2mjU4tqIbf5CEOORbUUr48dCD9q3umJ3IPlVqOqDakPfd2HuwccBaqlGhN4Gmr5OWg==}
     engines: {node: ^12.20.0 || ^14.18.0 || >=16.0.0}
 
+  '@playwright/test@1.55.0':
+    resolution: {integrity: sha512-04IXzPwHrW69XusN/SIdDdKZBzMfOT9UNT/YiJit/xpy2VuAoB8NHc8Aplb96zsWDddLnbkPL3TsmrS04ZU2xQ==}
+    engines: {node: '>=18'}
+    hasBin: true
+
   '@polka/url@1.0.0-next.29':
     resolution: {integrity: sha512-wwQAWhWSuHaag8c4q/KN/vCoeOJYshAIvMQwD4GpSb3OiZklFfvAgmj0VCBBImRpuF/aFgIRzllXlVX93Jevww==}
 
@@ -4558,6 +4566,11 @@ packages:
   fs.realpath@1.0.0:
     resolution: {integrity: sha512-OO0pH2lK6a0hZnAdau5ItzHPI6pUlvI7jMVnxUQRtw4owF2wk8lOSabtGDCTP4Ggrg2MbGnWO9X8K1t4+fGMDw==}
 
+  fsevents@2.3.2:
+    resolution: {integrity: sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==}
+    engines: {node: ^8.16.0 || ^10.6.0 || >=11.0.0}
+    os: [darwin]
+
   fsevents@2.3.3:
     resolution: {integrity: sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==}
     engines: {node: ^8.16.0 || ^10.6.0 || >=11.0.0}
@@ -5885,6 +5898,16 @@ packages:
     resolution: {integrity: sha512-HRDzbaKjC+AOWVXxAU/x54COGeIv9eb+6CkDSQoNTt4XyWoIJvuPsXizxu/Fr23EiekbtZwmh1IcIG/l/a10GQ==}
     engines: {node: '>=8'}
 
+  playwright-core@1.55.0:
+    resolution: {integrity: sha512-GvZs4vU3U5ro2nZpeiwyb0zuFaqb9sUiAJuyrWpcGouD8y9/HLgGbNRjIph7zU9D3hnPaisMl9zG9CgFi/biIg==}
+    engines: {node: '>=18'}
+    hasBin: true
+
+  playwright@1.55.0:
+    resolution: {integrity: sha512-sdCWStblvV1YU909Xqx0DhOjPZE4/5lJsIS84IfN9dAZfcl/CIZ5O8l3o0j7hPMjDvqoTF8ZUcc+i/GL5erstA==}
+    engines: {node: '>=18'}
+    hasBin: true
+
   postcss-import@15.1.0:
     resolution: {integrity: sha512-hpr+J05B2FVYUAXHeK1YyI267J/dDDhMU6B6civm8hSY1jYJnBXxzKDKDswzJmtLHryrjhnDjqqp/49t8FALew==}
     engines: {node: '>=14.0.0'}
@@ -9217,6 +9240,10 @@ snapshots:
 
   '@pkgr/core@0.2.7': {}
 
+  '@playwright/test@1.55.0':
+    dependencies:
+      playwright: 1.55.0
+
   '@polka/url@1.0.0-next.29': {}
 
   '@prisma/instrumentation@6.15.0(@opentelemetry/api@1.9.0)':
@@ -9997,7 +10024,7 @@ snapshots:
 
   '@sentry/core@10.21.0': {}
 
-  '@sentry/nextjs@10.21.0(@opentelemetry/context-async-hooks@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(encoding@0.1.13)(next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0))(react@19.1.0)(webpack@5.102.1)':
+  '@sentry/nextjs@10.21.0(@opentelemetry/context-async-hooks@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.2.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.2.0(@opentelemetry/api@1.9.0))(encoding@0.1.13)(next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(@playwright/test@1.55.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0))(react@19.1.0)(webpack@5.102.1)':
     dependencies:
       '@opentelemetry/api': 1.9.0
       '@opentelemetry/semantic-conventions': 1.37.0
@@ -10011,7 +10038,7 @@ snapshots:
       '@sentry/vercel-edge': 10.21.0
       '@sentry/webpack-plugin': 4.5.0(encoding@0.1.13)(webpack@5.102.1)
       chalk: 3.0.0
-      next: 15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
+      next: 15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(@playwright/test@1.55.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0)
       resolve: 1.22.8
       rollup: 4.44.2
       stacktrace-parser: 0.1.11
@@ -12331,6 +12358,9 @@ snapshots:
 
   fs.realpath@1.0.0: {}
 
+  fsevents@2.3.2:
+    optional: true
+
   fsevents@2.3.3:
     optional: true
 
@@ -13818,7 +13848,7 @@ snapshots:
       react: 19.1.0
       react-dom: 19.1.0(react@19.1.0)
 
-  next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0):
+  next@15.5.9(@babel/core@7.27.4)(@opentelemetry/api@1.9.0)(@playwright/test@1.55.0)(react-dom@19.1.0(react@19.1.0))(react@19.1.0):
     dependencies:
       '@next/env': 15.5.9
       '@swc/helpers': 0.5.15
@@ -13837,6 +13867,7 @@ snapshots:
       '@next/swc-win32-arm64-msvc': 15.5.7
       '@next/swc-win32-x64-msvc': 15.5.7
       '@opentelemetry/api': 1.9.0
+      '@playwright/test': 1.55.0
       sharp: 0.34.3
     transitivePeerDependencies:
       - '@babel/core'
@@ -14068,6 +14099,14 @@ snapshots:
     dependencies:
       find-up: 4.1.0
 
+  playwright-core@1.55.0: {}
+
+  playwright@1.55.0:
+    dependencies:
+      playwright-core: 1.55.0
+    optionalDependencies:
+      fsevents: 2.3.2
+
   postcss-import@15.1.0(postcss@8.5.6):
     dependencies:
       postcss: 8.5.6
diff --git a/scripts/pr-screenshots.mjs b/scripts/pr-screenshots.mjs
new file mode 100644
index 00000000..8688a48b
--- /dev/null
+++ b/scripts/pr-screenshots.mjs
@@ -0,0 +1,80 @@
+// Capture full-page screenshots of routes, for PR before/after comparisons.
+//
+// Usage:
+//   node scripts/pr-screenshots.mjs \
+//     --base-url http://localhost:3172 \
+//     --routes /about,/what-is-an-eval \
+//     --out .github/pr-media/my-branch \
+//     --label after
+//
+// Designed to FAIL SOFT: a route that errors or times out is skipped (and
+// logged), not fatal, so the orchestrating skill can still produce a partial
+// result. Exits non-zero only if NOTHING was captured.
+//
+// Reuses the @playwright/test Chromium (no extra dependency). In the hosted
+// sandbox it launches the pre-installed browser at /opt/pw-browsers/chromium.
+
+import { chromium } from '@playwright/test';
+import { existsSync, mkdirSync } from 'node:fs';
+import path from 'node:path';
+
+function arg(name, fallback) {
+  const i = process.argv.indexOf(`--${name}`);
+  return i !== -1 && process.argv[i + 1] ? process.argv[i + 1] : fallback;
+}
+
+const baseUrl = arg('base-url', 'http://localhost:3172').replace(/\/$/, '');
+const routes = arg('routes', '/')
+  .split(',')
+  .map((r) => r.trim())
+  .filter(Boolean);
+const outDir = arg('out', '.github/pr-media');
+const label = arg('label', 'after');
+const viewport = {
+  width: Number(arg('width', '1280')),
+  height: Number(arg('height', '800')),
+};
+const perRouteTimeoutMs = Number(arg('timeout', '45000'));
+
+const SANDBOX_CHROMIUM = '/opt/pw-browsers/chromium';
+const executablePath = existsSync(SANDBOX_CHROMIUM) ? SANDBOX_CHROMIUM : undefined;
+
+function slug(route) {
+  const s = route.replace(/^\/+|\/+$/g, '').replace(/[^a-zA-Z0-9._-]+/g, '_');
+  return s || 'home';
+}
+
+mkdirSync(outDir, { recursive: true });
+
+const browser = await chromium.launch({ executablePath });
+const context = await browser.newContext({ viewport });
+const results = [];
+
+for (const route of routes) {
+  const page = await context.newPage();
+  const file = path.join(outDir, `${slug(route)}-${label}.png`);
+  try {
+    // 'load' (not 'networkidle') — Next.js dev keeps an HMR websocket open,
+    // so networkidle would never settle.
+    await page.goto(`${baseUrl}${route}`, {
+      waitUntil: 'load',
+      timeout: perRouteTimeoutMs,
+    });
+    await page.waitForTimeout(750); // let fonts/animations settle
+    await page.screenshot({ path: file, fullPage: true });
+    results.push({ route, file, ok: true });
+    console.log(`OK ${route} -> ${file}`);
+  } catch (err) {
+    results.push({ route, ok: false, error: String(err?.message || err) });
+    console.log(`FAIL ${route}: ${err?.message || err}`);
+  } finally {
+    await page.close();
+  }
+}
+
+await context.close();
+await browser.close();
+
+const ok = results.filter((r) => r.ok).length;
+console.log(`\n${ok}/${routes.length} screenshots captured in ${outDir}`);
+process.exit(ok > 0 ? 0 : 1);
diff --git a/tests/e2e/README.md b/tests/e2e/README.md
new file mode 100644
index 00000000..bd7c1179
--- /dev/null
+++ b/tests/e2e/README.md
@@ -0,0 +1,51 @@
+# End-to-end tests
+
+Playwright e2e tests for the Weval web app.
+
+## Running
+
+```bash
+pnpm test:e2e # run the suite (auto-boots `pnpm dev` on :3172)
+pnpm test:e2e:ui # interactive UI mode
+pnpm test:e2e:report # open the last HTML report
+```
+
+You don't need to start the dev server yourself — `playwright.config.ts` has a
+`webServer` block that boots `pnpm dev` and waits for `/about` to respond. If a
+dev server is already running on `:3172` it is reused (locally).
+
+To run against an already-running app (e.g. a production build or a deployed
+preview) instead of booting dev:
+
+```bash
+E2E_BASE_URL=https://your-preview.example.com pnpm test:e2e
+```
+
+## Browser binary
+
+- **Local / CI:** Playwright uses its own bundled Chromium. Install it once with
+  `pnpm exec playwright install --with-deps chromium`.
+- **Hosted agent sandbox:** Chromium is pre-installed at `/opt/pw-browsers`. The
+  config auto-detects it (`executablePath`) and never downloads.
+
+## What's safe to test here
+
+Smoke tests deliberately target **statically rendered, dependency-free routes**
+(`/about`, `/what-is-an-eval`, …) so they pass in CI without any secrets.
+
+Routes that read from storage (S3) or call external LLM APIs — the homepage,
+`/analysis/*`, `/latest`, etc. — will be slow or error without env/network. To
+cover those, either:
+
+- provide the relevant env vars (see `.env.template`), or
+- intercept network calls with `page.route(...)` and serve fixtures.
+
+Keep flaky, data-dependent assertions out of the default suite.
+
+## Conventions
+
+- Prefer role/text locators (`getByRole`, `getByText`) and `a[href*="…"]` over
+  brittle CSS/nth-child selectors.
+- Never use hard `waitForTimeout` for synchronization — rely on web-first
+  assertions (`await expect(locator).toBeVisible()`), which auto-wait.
+- Add a `data-testid` to a component only when no accessible/role selector works.
diff --git a/tests/e2e/smoke.spec.ts b/tests/e2e/smoke.spec.ts
new file mode 100644
index 00000000..10da7e08
--- /dev/null
+++ b/tests/e2e/smoke.spec.ts
@@ -0,0 +1,25 @@
+import { test, expect } from '@playwright/test';
+
+/**
+ * Smoke tests target dependency-free, statically rendered routes so they pass
+ * in CI without storage/API secrets. Data-driven routes (the homepage,
+ * /analysis, etc.) hit storage and external LLM APIs — to test those, provide
+ * env vars or mock the network first. See tests/e2e/README.md.
+ */
+test.describe('smoke', () => {
+  test('about page renders its title and key content', async ({ page }) => {
+    await page.goto('/about');
+
+    await expect(page).toHaveTitle(/About Weval/i);
+    await expect(
+      page.getByRole('heading', { name: /what are evaluations\?/i }),
+    ).toBeVisible();
+  });
+
+  test('about page links out to the Collective Intelligence Project', async ({ page }) => {
+    await page.goto('/about');
+
+    const cipLink = page.locator('a[href*="cip.org"]').first();
+    await expect(cipLink).toBeVisible();
+  });
+});
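
Note on the route mapping: the `e2e-pr` skill describes its file-to-route mapping only in prose (drop `src/app`, drop `(route-group)` segments, drop the trailing `/page.tsx`, treat layouts and shared components as broad, flag `[id]` segments). A minimal sketch of those rules follows, assuming plain string handling is enough. `mapChangedFile` and `RouteMapping` are hypothetical names that do not exist in this PR, and the sketch ignores Next.js features the skill does not mention (parallel routes, intercepting routes, non-`.tsx` pages).

```typescript
// Hypothetical helper, not part of this PR: classifies one changed file path
// according to the best-effort mapping rules listed in the e2e-pr skill.
type RouteMapping =
  | { kind: "route"; route: string } // concrete URL that can be visited
  | { kind: "dynamic"; route: string } // contains a [param] segment; needs a concrete value
  | { kind: "broad"; reason: string } // layout/template/shared component: run the whole suite
  | { kind: "none" }; // file does not affect a route

function mapChangedFile(file: string): RouteMapping {
  // Shared components cannot be tied to a single route.
  if (file.startsWith("src/components/")) {
    return { kind: "broad", reason: "shared component" };
  }
  if (!file.startsWith("src/app/")) {
    return { kind: "none" };
  }

  const segments = file.slice("src/app/".length).split("/");
  const leaf = segments.pop() ?? "";

  // Layouts and templates affect every route beneath them.
  if (leaf === "layout.tsx" || leaf === "template.tsx") {
    return { kind: "broad", reason: `${leaf} affects every route beneath it` };
  }
  if (leaf !== "page.tsx") {
    return { kind: "none" };
  }

  // Route groups such as (standard) never appear in the URL.
  const urlSegments = segments.filter(
    (s) => !(s.startsWith("(") && s.endsWith(")")),
  );
  const route = "/" + urlSegments.join("/");

  // Dynamic segments are reported but should not be visited blindly.
  const isDynamic = urlSegments.some(
    (s) => s.startsWith("[") && s.endsWith("]"),
  );
  return isDynamic ? { kind: "dynamic", route } : { kind: "route", route };
}
```

Returning a tagged result rather than a bare string keeps the "broad" and "dynamic" cases explicit, so a caller can decide between running targeted specs and the whole suite without re-parsing the path.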