test(e2e): add Playwright foundation and PR testing/description skills by nojibe · Pull Request #30 · weval-org/app

nojibe · 2026-06-30T20:29:28Z

Summary

Stands up a deterministic, version-controlled end-to-end testing layer (@playwright/test) and three Claude Code skills that help test and document changes as PRs are created. The e2e suite is the high-value backbone; the skills drive it.

Motivation / context

There was no e2e layer and no Claude Code skills in the repo. Agentic browser automation alone isn't a substitute for a regression suite, so this adds committed Playwright specs that run in CI, plus skills that orchestrate the existing pnpm checks and author PR descriptions.

Changes

E2E foundation
- playwright.config.ts — boots pnpm dev, pre-warms /about as the readiness probe, retries + trace + video on CI. Auto-detects the sandbox Chromium at /opt/pw-browsers and falls back to the bundled browser locally/in CI.
- tests/e2e/smoke.spec.ts (+ README.md) — smoke tests on /about, a static, secret-free route, so they pass in CI without storage/API keys.
- .github/workflows/e2e.yml — runs the suite on PRs/pushes (safe pull_request trigger), uploads the HTML report.
- scripts/pr-screenshots.mjs — fail-soft full-page screenshotter reused by pr-describe.
- package.json — adds test:e2e, test:e2e:ui, test:e2e:report, pr:screenshots; @playwright/test@1.55.0 devDep.
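As a rough illustration of the settings the playwright.config.ts bullet describes, the sketch below builds a config-shaped plain object. It is not the file from this PR: `buildE2EConfig` is a hypothetical helper, and the port (3000), the retry count, the specific trace/video modes, and the idea of exposing /opt/pw-browsers as a browsers path are all assumptions. In a real config an object like this would typically be passed to `defineConfig` from `@playwright/test`.

```typescript
// Hypothetical sketch: a Playwright-config-shaped object built from injected inputs,
// so the example has no dependencies and can be exercised without a browser.
type Env = Record<string, string | undefined>;

interface E2EConfigSketch {
  testDir: string;
  retries: number;
  use: { baseURL: string; trace: string; video: string };
  webServer: { command: string; url: string; reuseExistingServer: boolean };
  browsersPath: string | undefined;
}

// Path named in the PR description.
const SANDBOX_BROWSERS = "/opt/pw-browsers";

function buildE2EConfig(env: Env, pathExists: (p: string) => boolean): E2EConfigSketch {
  const onCI = Boolean(env["CI"]);
  const baseURL = "http://localhost:3000"; // assumption: default Next.js dev port
  return {
    testDir: "tests/e2e",
    retries: onCI ? 2 : 0, // assumption: the actual retry count is not stated
    use: {
      baseURL,
      trace: onCI ? "on-first-retry" : "off", // assumption: mode not stated
      video: onCI ? "retain-on-failure" : "off", // assumption: mode not stated
    },
    webServer: {
      command: "pnpm dev",
      url: `${baseURL}/about`, // the server counts as ready once /about responds
      reuseExistingServer: !onCI,
    },
    // Use the sandbox browsers if present, otherwise leave the bundled default.
    browsersPath: pathExists(SANDBOX_BROWSERS) ? SANDBOX_BROWSERS : undefined,
  };
}
```

Pointing the `webServer.url` at /about means the first request both waits for the dev server and triggers compilation of the route the smoke tests visit, which is one plausible reading of "pre-warms".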
Skills (.claude/skills/<name>/SKILL.md)
- pr-check — full quality gate (typecheck, lint, web/CLI tests, blueprint validation) → pass/fail summary.
- e2e-pr — runs the Playwright suite scoped to a PR's changed routes.
- pr-describe — structured PR description from the diff, with static-gated, security-guarded before/after screenshots.
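To make "scoped to a PR's changed routes" concrete, here is a minimal sketch of mapping changed files to routes. It assumes a Next.js App Router layout with pages at `src/app/<segments>/page.tsx`; the PR does not state the directory layout, and `routeForFile` and `changedRoutes` are hypothetical names rather than part of the skill.

```typescript
// Hypothetical mapping from a changed file to the route it renders.
// Assumption: pages live at src/app/<segments>/page.tsx.
function routeForFile(file: string): string | null {
  const match = /^src\/app\/(?:(.*)\/)?page\.tsx$/.exec(file);
  if (!match) return null; // not a page file, so no direct route
  const segments = (match[1] ?? "")
    .split("/")
    // Drop empty segments and route groups such as "(marketing)",
    // which do not appear in the URL.
    .filter((s) => s !== "" && !/^\(.*\)$/.test(s));
  return "/" + segments.join("/");
}

// Collect the distinct routes touched by a list of changed files.
function changedRoutes(files: string[]): string[] {
  const routes = new Set<string>();
  for (const file of files) {
    const route = routeForFile(file);
    if (route !== null) routes.add(route);
  }
  return Array.from(routes).sort();
}
```

Files that map to no route (shared libraries, config) need some fallback, such as running the whole suite; the PR description does not say which fallback the skill uses.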
.gitignore — track .claude/skills, ignore Playwright output dirs.

Test plan

pnpm test:e2e ✅ — 2 smoke tests pass (verified locally; sandbox Chromium, dev-server boot, /about render).
scripts/pr-screenshots.mjs ✅ — captured a full-page PNG of /about.
CI workflow installs Chromium and runs the suite on this PR.

Risks / rollback

Low-risk and additive — no application/runtime code is changed; this only adds test infra, a CI workflow, and skill docs. The new workflow uses pull_request (no secrets, read-only token for forks), not pull_request_target. Revert this PR to undo.

Related Issues

None.

🤖 Generated with Claude Code


Five Claude Code skills for quality-gating PRs in this Next.js/TypeScript project:
- test-pr: run Jest tests scoped to files changed vs main
- typecheck-pr: tsc --noEmit, surfaces errors new to the branch
- lint-pr: Next.js ESLint on changed files, with optional --fix
- blueprint-validate: validates Weval blueprint YAML structure and staging limits
- pr-check: orchestrates all checks and produces a structured summary table

Also updates .gitignore to use .claude/* + !.claude/skills so skill files are tracked while the rest of .claude/ stays ignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Change .claude → .claude/* so the !.claude/skills negation takes effect. Git cannot negate inside an ignored directory; switching to a glob on the directory's contents allows the exception to work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Stand up a deterministic, version-controlled e2e layer (best-practice @playwright/test specs in CI) and two skills that drive it as PRs are created.

Foundation:
- playwright.config.ts: boots `pnpm dev`, pre-warms /about as the readiness probe, retries+trace+video on CI. Auto-detects the sandbox Chromium at /opt/pw-browsers and falls back to bundled browser locally/in CI.
- tests/e2e/smoke.spec.ts: smoke tests on /about (static, secret-free → green in CI without storage/API keys). README documents env caveats + conventions.
- .github/workflows/e2e.yml: runs the suite on PRs/pushes, uploads HTML report.
- scripts/pr-screenshots.mjs: fail-soft full-page screenshotter reused by the PR-description skill.
- package.json scripts: test:e2e, test:e2e:ui, test:e2e:report, pr:screenshots.

Skills:
- e2e-pr: run the suite scoped to a PR's changed routes and report.
- pr-describe: generate a PR description from the diff and, for visual changes, attach before/after screenshots (base captured in an isolated git worktree). Strictly time-boxed and fail-soft: falls back to a text-only description.

Verified locally: both smoke tests pass and the screenshot script captures a full-page PNG via the sandbox Chromium.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g
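The "time-boxed and fail-soft" behavior mentioned for pr-describe can be sketched as a small wrapper that never rejects. This is an illustration only: `failSoft` is a hypothetical helper, not code from scripts/pr-screenshots.mjs, and the time limits are arbitrary.

```typescript
// Hypothetical helper: resolves with `fallback` if `task` throws, rejects,
// or takes longer than `ms` milliseconds. The returned promise never rejects.
function failSoft<T>(task: () => Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    // Time box: whichever of the timer or the task settles first wins;
    // later calls to resolve() are ignored by the Promise machinery.
    const timer = setTimeout(() => resolve(fallback), ms);
    const finish = (value: T): void => {
      clearTimeout(timer);
      resolve(value);
    };
    // Starting the task inside a then() also converts synchronous throws
    // into rejections, which are mapped to the fallback.
    Promise.resolve()
      .then(task)
      .then(finish, () => finish(fallback));
  });
}
```

A caller might wrap the screenshot step as `failSoft(captureScreenshots, 60_000, null)` (with `captureScreenshots` standing in for the real capture function) and emit a text-only description when the result is `null`. The wrapper does not cancel the underlying work, so a real script would still need to shut down any server or browser it started.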

The repo is public, so committed screenshots are permanent and world-readable. Bake in non-negotiable rules: route denylist (/admin*, /api*, auth routes), secret-free capture env only, no false "delete before merge" comfort, CI-artifact hosting for anything potentially sensitive, and human review before pushing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Screenshots only pay off on static, secret-free routes; data-driven pages render empty/mock against the local dev env. Default to a static allowlist (/about, /what-is-an-eval) and skip everything else with a note, overridable via --routes. Filters run in order: route mapping -> security denylist (non-overridable) -> static gate (overridable) -> dynamic-route skip. If nothing survives, fall back to a text-only description without booting servers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g
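The filter order in this commit message can be sketched as a single gating function applied to routes that have already been mapped from the diff. The names here (`gateRoutes`, `Verdict`) are hypothetical, the auth patterns are guesses since the message only says "auth routes", and treating `--routes` as a replacement allowlist is an assumption about how the override works.

```typescript
type Verdict =
  | { route: string; capture: true }
  | { route: string; capture: false; reason: string };

// /admin* and /api* come from the commit messages; the auth pattern is assumed.
const DENYLIST: RegExp[] = [/^\/admin/, /^\/api/, /^\/(login|auth)/];
const STATIC_ALLOWLIST: string[] = ["/about", "/what-is-an-eval"];

// `routesOverride` stands in for the --routes flag (assumed semantics).
function gateRoutes(candidates: string[], routesOverride?: string[]): Verdict[] {
  const allow = routesOverride ?? STATIC_ALLOWLIST;
  return candidates.map((route): Verdict => {
    // 1. Security denylist: checked first and never affected by the override.
    if (DENYLIST.some((re) => re.test(route))) {
      return { route, capture: false, reason: "security denylist" };
    }
    // 2. Static gate: the default allowlist, replaceable via the override.
    if (!allow.includes(route)) {
      return { route, capture: false, reason: "not on static allowlist" };
    }
    // 3. Dynamic-route skip: segments like [id] have no concrete URL to visit.
    if (/\[[^\]]+\]/.test(route)) {
      return { route, capture: false, reason: "dynamic route" };
    }
    return { route, capture: true };
  });
}

// If no route survives, the caller can skip booting servers entirely.
function shouldBootServers(verdicts: Verdict[]): boolean {
  return verdicts.some((v) => v.capture);
}
```

Running the denylist before the override is what makes it non-overridable: passing `/admin/users` through the override still yields a "security denylist" verdict.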

Drop test-pr, typecheck-pr, lint-pr, and blueprint-validate. pr-check already orchestrates typecheck/lint/tests directly via their pnpm commands, so the standalone variants were redundant. Remaining skills: pr-check, e2e-pr, pr-describe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Borrowed from common PR-description skills: an explicit blast-radius + how-to-undo section so reviewers see what to scrutinize. Kept honest: one line for low-risk, self-contained changes rather than padding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Related Issues supports Closes/Refs linking (auto-closes issues on merge). The worked example anchors the model to consistent output and shows all sections filled in for a small UI change. Test plan + Risks/rollback retained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Claude Code only auto-discovers skills at .claude/skills/<name>/SKILL.md with YAML frontmatter (name + description). The flat .md files were not discoverable. Move pr-check, e2e-pr, and pr-describe into that structure and add trigger-oriented descriptions so the harness surfaces them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

pnpm/action-setup@v4 errors when both `version` and package.json's `packageManager` specify a pnpm version. Drop the explicit version: 9 and let the action read pnpm@9.6.0 from packageManager.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g

Produce Conventional Commits titles (type(scope): summary), inferring type/scope from the diff. The skill now sets/updates the PR title as well as the body.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011rZDvWQFXH3m42kYMzEY2g
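For reference, the `type(scope): summary` shape can be checked and produced with a few lines of code. The sketch below is illustrative, not the skill's logic: `parseTitle` and `formatTitle` are hypothetical names, the regex covers only the basic form (it ignores the optional `!` breaking-change marker), and inferring type and scope from a diff is outside its scope.

```typescript
interface ConventionalTitle {
  type: string;
  scope: string | undefined;
  summary: string;
}

// Basic form only: lowercase type, optional (scope), then ": " and a summary.
const TITLE_RE = /^([a-z]+)(?:\(([^()]+)\))?: (.+)$/;

// Returns the parsed parts, or null if the title is not in the basic form.
function parseTitle(title: string): ConventionalTitle | null {
  const m = TITLE_RE.exec(title);
  if (!m) return null;
  return { type: m[1], scope: m[2], summary: m[3] };
}

// Renders the parts back into a title, omitting the parentheses when no scope is set.
function formatTitle(t: ConventionalTitle): string {
  return t.scope ? `${t.type}(${t.scope}): ${t.summary}` : `${t.type}: ${t.summary}`;
}
```

Applied to this PR, the retitled form `test(e2e): add Playwright foundation and PR testing/description skills` parses with type `test` and scope `e2e`, while the original title `Add Playwright e2e foundation and PR testing/description skills` does not match.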

claude added 9 commits June 30, 2026 18:10

railway-app Bot temporarily deployed to weval / app-pr-30 June 30, 2026 20:29 Destroyed

railway-app Bot temporarily deployed to weval / app-pr-30 June 30, 2026 20:52 Destroyed

nojibe changed the title ~~Add Playwright e2e foundation and PR testing/description skills~~ test(e2e): add Playwright foundation and PR testing/description skills Jun 30, 2026

railway-app Bot temporarily deployed to weval / app-pr-30 June 30, 2026 20:57 Destroyed

nojibe merged commit 461550b into main Jun 30, 2026
1 of 2 checks passed

nojibe merged 11 commits into main from claude/agent-skills-pr-testing-cd7abs

2 participants
