Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions .claude/skills/e2e-pr/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
---
name: e2e-pr
description: Run the Playwright end-to-end suite against the routes a PR changes and report results. Use when verifying UI behavior for a branch or PR, or checking that a change didn't break key pages.
---

# e2e-pr

Run the Playwright end-to-end suite against the routes a PR touches, and report results.

## Prerequisites

- `@playwright/test` is installed and `playwright.config.ts` exists at the repo root.
- The config auto-boots `pnpm dev` (its `webServer` block) and waits for `/about`,
so you do NOT need to start a server yourself unless one is already running.
- In CI the browser comes from `playwright install`; in the hosted sandbox it's
the pre-installed Chromium at `/opt/pw-browsers`. The config handles both.

## Steps

1. **Find changed files** vs the base branch (default `origin/main`):
```
git diff --name-only origin/main...HEAD
```

2. **Map changed files to affected routes** (best-effort):
- `src/app/(group)/foo/page.tsx` → `/foo` (drop `src/app`, drop `(route-group)`
segments, drop the trailing `/page.tsx`)
- `src/app/(standard)/page.tsx` → `/`
- `layout.tsx` / `template.tsx` changes affect every route beneath them.
- Dynamic segments (`[id]`) can't be visited without a concrete value — note
them but don't try to test them blindly.
- Changes under `src/components/**` are shared and can't be mapped to a single
route → treat as "broad" (run the whole suite).

3. **Choose scope:**
- If specific route files changed and there are specs covering them, run those:
```
pnpm test:e2e -g "<route or feature keyword>"
```
or pass specific spec files: `pnpm test:e2e tests/e2e/<file>.spec.ts`
- If changes are broad (shared components, layout, config) OR the suite is
small, just run everything: `pnpm test:e2e`

4. **Run** the chosen command. The first run cold-compiles routes in `next dev`,
so allow time — the config already uses generous timeouts.

5. **Report:**
- Pass/fail, number of tests run, and any failures with their error message.
- On failure, point to the HTML report (`pnpm test:e2e:report`) and the trace
(saved under `test-results/` on first retry).
- Call out any changed routes that have NO e2e coverage as gaps (not failures),
so the author can decide whether to add a spec.

## Notes

- Data-dependent routes (homepage, `/analysis/*`) need env/secrets or network
mocking — see `tests/e2e/README.md`. Don't add flaky assertions on them.
- Do NOT auto-write new specs here; that's authoring. This skill runs existing
tests. (Use `pr-describe` or a dedicated authoring step to add coverage.)

## Args

Optional base branch to diff against (default `origin/main`).

Example: `/e2e-pr` or `/e2e-pr staging`
57 changes: 57 additions & 0 deletions .claude/skills/pr-check/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
---
name: pr-check
description: Run the full quality gate on the current branch — TypeScript typecheck, ESLint, web/CLI Jest tests, and blueprint validation — then summarize pass/fail. Use before opening a PR, or to check whether a branch is CI-ready.
---

# pr-check

Run a full quality gate on the current branch before or after a PR is created. Orchestrates type checking, linting, and tests, then produces a structured summary.

## Steps

1. **Identify scope** — get the diff summary:
```
git diff --stat origin/main...HEAD
```
Note whether changes touch: source code, blueprint files, tests, docs, config.

2. **Run checks in parallel where possible** (report results as each finishes):

| Check | Command | When to run |
|-------|---------|-------------|
| TypeScript | `pnpm typecheck` | Always |
| Lint | `pnpm lint` | Always |
| Web tests | `pnpm test:web` | If `src/app/`, `src/components/`, `src/hooks/`, or `src/point-functions/` changed |
| CLI tests | `pnpm test:cli` | If `src/cli/` or `src/lib/` changed |
| Blueprint validate | see `blueprint-validate` skill | If any `.yaml`/`.yml`/`.json` blueprint files changed |

3. **Collect results** — for each check record:
- Status: ✅ pass / ❌ fail / ⚠️ warnings / ⏭️ skipped (not applicable)
- Error/warning count
- Key details (first 5 errors max per check to keep output readable)

4. **Produce a summary table:**
```
## PR Check Results — <branch-name>

| Check | Status | Details |
|---------------|--------|--------------------------|
| TypeScript | ✅ | 0 errors |
| Lint | ⚠️ | 2 warnings, 0 errors |
| Web tests | ✅ | 47 passed |
| CLI tests | ⏭️ | No CLI files changed |
| Blueprints | ✅ | 1 file validated |

**Overall: READY TO MERGE** / **NEEDS FIXES**
```

5. If any check fails, list the specific errors that need to be addressed.

6. If `--comment` is passed as an arg, post this summary as a GitHub PR comment using the available GitHub MCP tools.

## Args

- `--comment`: Post the results as a GitHub PR comment (requires PR number to be detectable from the branch)
- Base branch (e.g. `main`, `staging`): defaults to `origin/main`

Example: `/pr-check` or `/pr-check --comment` or `/pr-check staging`
266 changes: 266 additions & 0 deletions .claude/skills/pr-describe/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
---
name: pr-describe
description: Generate a structured PR description from the branch diff (Summary, Changes, Test plan, Risks, Related Issues), with optional static-gated before/after screenshots for visual changes. Use when writing or updating a pull request description.
---

# pr-describe

Generate a high-quality PR description from the branch diff. The description
itself is the core deliverable and always runs. Before/after screenshots are a
**bonus that only activates on static, visual PRs** — they're strictly
time-boxed, **fail soft** (any problem → clean text-only description), and are
**off by default for data-driven routes** (see "Default screenshot scope").

## Output contract

- If a PR already exists for the current branch → update its **title and body**
(GitHub MCP `update_pull_request`). Only change the title if it doesn't already
follow the convention below.
- If no PR exists → print the finished **title + markdown body** for the user.
Do **NOT** open a PR unless the user explicitly asked for one.

## PR title convention

Produce a properly tagged title in **Conventional Commits** form:

```
type(scope): imperative summary
```

- **type** — infer from the dominant change: `feat` (new capability), `fix`
(bug fix), `docs`, `test` (tests/test infra), `ci` (CI/workflows), `refactor`,
`perf`, `chore`, `build`, `style`.
- **scope** — optional, a short area derived from the diff (e.g. `e2e`, `auth`,
`header`, `cli`, `pr-eval`). Omit if it spans many areas.
- **summary** — imperative mood, lower-case start, **no trailing period**, aim
for ≤ ~70 chars total.
- If the change mixes types, pick the one that best describes the user-facing
intent (a feature with its tests is still `feat`).

Examples: `feat(header): collapse nav into a menu under 380px` ·
`test(e2e): add Playwright smoke suite` · `fix(cli): handle empty blueprint id`.

## Security guardrails (NON-NEGOTIABLE — this repo is public)

Screenshots publish whatever they render. On a **public** repo, anything you
commit is world-readable via raw URLs **permanently** (git history, forks, CDN
cache) — deleting the branch does NOT undo it. So:

1. **Route denylist — never screenshot these, no exceptions:**
`/admin*`, `/api*`, and any authenticated/session/account route. If a changed
route matches, skip it and note "omitted for safety", do not capture it.
2. **Capture only against a secret-free environment.** Use the local `pnpm dev`
with NO real storage/API secrets wired up, so data-driven pages render
empty/mock and there is nothing sensitive in frame. Do **not** screenshot a
preview deploy that is backed by real data/secrets and then commit it here.
3. **Treat committed images as permanent and public.** Never tell the user they
can "delete before merge" to undo exposure — that is false.
4. **If a capture could contain anything sensitive, do NOT commit it to the
branch.** Prefer CI-artifact hosting (collaborator-only, auto-expiring) or a
PR comment. Committing to the public branch is only for plainly non-sensitive,
static UI.
5. **Surface images for human review before pushing.** Show the user what was
captured and let them confirm; never push blind.

---

## Phase 1 — Analyze the diff (always runs)

```
git fetch origin <base> --quiet
git diff --stat origin/<base>...HEAD
git diff origin/<base>...HEAD
git log origin/<base>..HEAD --format='%s%n%b'
```

Draft the description with these sections:

- **Summary** — 1-3 sentences on what changed and why.
- **Motivation / context** — the problem or request behind it.
- **Changes** — bulleted, grouped by area (UI, API, CLI, tests, docs…).
- **Test plan** — what you ran (`pnpm typecheck`, `pnpm lint`, `pnpm test:web`,
`pnpm test:e2e`) and the result, plus manual steps if any.
- **Risks / rollback** — the blast radius and how to undo. Call out anything
reviewers should scrutinize (data migrations, auth/permissions, external API
or cost impact, breaking changes, shared components touched). State how to
revert (usually "revert this PR" — but note it if a migration or deploy step
makes rollback non-trivial). If the change is low-risk and self-contained, say
so in one line rather than padding.
- **Screenshots** — filled in by Phase 4 if applicable, else omitted.
- **Related Issues** — link tickets and related work. Use `Closes #123` for
issues this PR resolves (auto-closes them on merge), `Refs #456` for related
PRs/issues, plus any relevant docs. Omit the section entirely if there's
nothing to link — don't invent issue numbers.

### Worked example

```md
## Summary
Tightens the site header on mobile so the nav no longer wraps under 380px.

## Motivation / context
The logo + nav links overflowed on small screens, pushing the theme toggle
off-canvas. Reported in #41.

## Changes
- **UI:** right-align nav links and shrink logo on `sm` breakpoint (`Header.tsx`)
- **UI:** swap hover underline for opacity to avoid layout shift
- **Tests:** add an e2e assertion that the header is visible at 360px width

## Test plan
- `pnpm typecheck` ✅ · `pnpm lint` ✅ · `pnpm test:e2e` ✅ (3 passed)
- Manually checked /about at 320 / 375 / 768px.

## Risks / rollback
Low-risk, CSS-only and self-contained. Revert this PR to undo.

## Screenshots
(before/after table inserted by Phase 4)

## Related Issues
Closes #41
```

## Phase 2 — Does the screenshot step apply?

Visual-change heuristic — TRUE if any changed file matches:
- `src/app/**/(page|layout|template).tsx`
- `src/components/**/*.tsx`
- `**/*.css`

If FALSE → skip to Phase 5 (text-only). If TRUE → continue, subject to the
default scope and give-up policy below.

### Default screenshot scope (practicality gate)

Screenshots only pay off on **static, secret-free routes**. Data-driven routes
(homepage, `/analysis/*`, `/pairs`, `/latest`, `/model/*`, etc.) render
empty/mock against the secret-free dev env — an uninformative shot — and cost a
slow capture. So **by default, only capture known-static routes**:

- **Default static allowlist:** `/about`, `/what-is-an-eval`. (Extend this list
as more static pages are confirmed safe + stable.)
- Any mapped route **not** on the allowlist is **skipped by default** with a note:
_"skipped: data-driven route (pass `--routes` to force)"_.
- The user can **override** with `--routes /foo,/bar` to force specific routes
(e.g. against a preview deploy with real data, where they accept the tradeoff).
- The **security denylist always wins** over any override — `/admin*`, `/api*`,
and auth routes are never captured even if explicitly passed.

If, after this gate, there are **no routes left to shoot** → skip to Phase 5
(text-only) with a one-line note. Don't boot servers for nothing.

## Phase 3 — Capture screenshots (time-boxed, fail-soft)

> **GIVE-UP POLICY — bail to text-only (Phase 5) and add a one-line note if ANY hold:**
> - Total screenshot phase exceeds **~6 minutes** wall-clock.
> - A dev server fails to become ready within **120s**.
> - Zero routes can be mapped (e.g. only shared components / dynamic routes changed
> and the user gave no route to shoot).
> - The screenshot script captures nothing (`scripts/pr-screenshots.mjs` exits non-zero).
> - Any unexpected error. Never let screenshots block the description.

**3a. Determine routes** (cap at the 3 most relevant; note any you dropped).
Apply these filters **in order**:
1. Map changed `src/app/**/page.tsx` to URLs (see `e2e-pr` for the mapping rules).
2. **Security denylist (always, non-overridable):** drop any `/admin*`, `/api*`,
or authenticated route; note as "omitted for safety".
3. **Default static gate (see Phase 2):** unless the user passed `--routes`, drop
anything not on the static allowlist; note as "skipped: data-driven route".
If `--routes` was passed, use exactly those (still subject to step 2).
4. Skip dynamic (`[id]`) routes unless the user supplies a concrete URL.
5. Shared-component-only change with nothing left → ask the user for 1-2
representative static routes, or skip with a note. Don't guess across the app.
- Confirm the dev server has **no real storage/API secrets** in its env before
capturing (guardrail #2). If you can't confirm that, skip screenshots.
- If no routes survive the filters → skip to Phase 5 (text-only).

Let `SLUG` = sanitized branch name, `ROUTES` = comma-separated list, e.g. `/about,/what-is-an-eval`.

**3b. Capture AFTER (current branch, HEAD).** Reuse a running dev server on
`:3172` if present, else the config/`pnpm dev` will serve it. Then:
```
node scripts/pr-screenshots.mjs --base-url http://localhost:3172 \
--routes "$ROUTES" --out .github/pr-media/$SLUG --label after
```

**3c. Capture BEFORE (base branch) in an isolated worktree** so the working tree
is untouched. Symlink `node_modules` to avoid a slow reinstall (valid as long as
the PR didn't change dependencies — if it did, note that the "before" shot may
be approximate):
```
WT=$(mktemp -d)
git worktree add --detach "$WT" origin/<base>
ln -s "$PWD/node_modules" "$WT/node_modules"
( cd "$WT" && pnpm exec next dev -p 3173 ) & # remember the PID
# poll http://localhost:3173/about until it responds (cap 120s)
node scripts/pr-screenshots.mjs --base-url http://localhost:3173 \
--routes "$ROUTES" --out .github/pr-media/$SLUG --label before
# then ALWAYS clean up:
kill <pid>; git worktree remove --force "$WT"
```

If BEFORE fails but AFTER succeeded, proceed with after-only + a note.

## Phase 4 — Host & embed

> **Before any commit: re-confirm the captures are non-sensitive (guardrails
> #1–#4) and show them to the user for a quick look (guardrail #5).** Committing
> to a public branch is permanent and irreversible. If there is any doubt about
> the contents, use CI-artifact hosting instead (see "Sensitive captures" below)
> or skip embedding entirely.

GitHub renders images only from URLs, so the PNGs must be committed and pushed
before they resolve. For plainly non-sensitive, static UI, commit them to the PR
branch and reference raw URLs:

```
git add .github/pr-media/$SLUG
git commit -m "Add PR before/after screenshots"
git push
```

Derive `OWNER/REPO` from `git remote get-url origin` and `BRANCH` from
`git rev-parse --abbrev-ref HEAD`. For each route build a row:

```md
### Screenshots

#### `/about`
| Before | After |
|--------|-------|
| ![before](https://raw.githubusercontent.com/OWNER/REPO/BRANCH/.github/pr-media/SLUG/about-before.png) | ![after](https://raw.githubusercontent.com/OWNER/REPO/BRANCH/.github/pr-media/SLUG/about-after.png) |
```

(Omit the "Before" cell for routes where only the after shot exists — new pages.)

> **Tradeoff:** this commits PNGs into the PR branch and they show in the diff.
> On a public repo this is **permanent and world-readable** — do NOT claim they
> can be "deleted before merge" to undo exposure. To keep them out of the PR's
> own diff you can use a dedicated orphan `pr-media` branch, but that is still
> public; it changes visibility-in-diff, not exposure.

**Sensitive captures → don't commit; use CI artifacts instead.** If a shot could
contain anything non-public, skip the commit and have the e2e workflow upload the
images via `actions/upload-artifact` (collaborator-only, auto-expiring). Link the
run/artifact from the PR body rather than embedding a public raw URL.

## Phase 5 — Finalize

- Assemble the full body (Phase 1 sections + Phase 4 screenshots if any).
- If a PR exists → update its body via GitHub MCP. Else → print the markdown.
- If screenshots were skipped, include one honest line, e.g.
_"Screenshots skipped: change is API-only"_ or _"…: dev server didn't boot in time"_.
Never silently drop them without saying why.

## Args

- Base branch to diff against (default `origin/main`).
- `--routes /foo,/bar` — **override the default static-only gate** and capture
exactly these routes (still subject to the security denylist). Use this for
shared-component changes or when shooting a preview deploy with real data.

By default (no `--routes`), only known-static routes (`/about`,
`/what-is-an-eval`) are captured; data-driven routes are skipped with a note.

Example: `/pr-describe` · `/pr-describe staging` · `/pr-describe --routes /pairs,/latest`
Loading
Loading