test(e2e): broaden full-stack e2e for key/scroll + replay#16
Conversation
Add `serve` script alias to dummy/web/package.json (vite preview). New src/dummy-web.e2e.test.ts builds the Nimbus Store fixture once, serves dist/ via Bun.serve (real 404s for missing assets), and runs four guarded e2e tests with a scripted MockLanguageModelV3 that reports the bugs from BUGS.md (network 404, console error, ReferenceError, invisible text), asserting findings structure and on-disk persistence. Co-Authored-By: Claude <noreply@anthropic.com>
- Add .stylelintrc.json (stylelint-config-standard-scss base; double-slash comment-empty-line-before always/except first-nested enforced; camelCase selectors + full-hex colors + longhand gap + media-range-notation disabled to keep planted bugs intact) - Add stylelint + stylelint-config-standard-scss devDeps; lint:css script - Fix process.env['KEY'] → process.env.KEY (Biome useLiteralKeys) across e2e + desktop integration tests - Guard dummy-web.e2e.test.ts beforeAll with CHROME check so describe.skip doesn't run side-effectful setup in Bun (avoids 5 s build timeout) Co-Authored-By: Claude <noreply@anthropic.com>
- Nested describe 'key/scroll steps and replay evidence' with its own mock driver: Tab key → scroll down → report passed. - Test 1: asserts both a `key …` and `scroll …` step appear in findings.steps after the real CDP adapter executes the acts. - Test 2: asserts replay outcome is visible — findings.evidence is set (ffmpeg present) OR a 'replay video' skip step is recorded (absent). - Suite guarded by the existing CHROME skip; all 579 unit tests pass. Co-Authored-By: Claude <noreply@anthropic.com>
- loop overlays its recorded act-trail as the terminal findings' steps; the report step no longer clobbers it with the driver's (often empty) list, so key/scroll steps survive into findings.steps (e2e fix). - dummy/web is a standalone package (own bun.lock); CI's root install never reaches it. Add a CI step + in-test fallback to install its deps and build the Vite fixture, with a generous beforeAll hook timeout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reachedYou’ve reached a temporary PR review limit under our Fair Usage Limits Policy. Next review available in: 34 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds an exported ChangesTrail-aware terminal findings
dummy/web e2e suite and fixture tooling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/adapters/desktop/desktop-adapter.integration.test.ts`:
- Around line 47-50: The test gate in the desktop integration check is too
X11-specific and skips Wayland-only environments, leaving the Wayland
window-control path untested. Update the guard logic in the desktop adapter
integration test around the environment checks and `xdotool` probe so it can
detect and allow a Wayland-capable setup instead of requiring `DISPLAY` and
`xdotool` unconditionally. Keep the existing skip behavior for missing desktop
prerequisites, but branch on the active session type so both X11 and Wayland
paths in the desktop adapter remain covered.
In `@src/agent/loop.ts`:
- Around line 251-260: The report fallback and returned counts are using
different sources of truth, causing `findings.json` to be rewritten with the
recorded trail while `ReportResult.counts.steps` still reflects the driver’s
original empty `input.steps`. Update
`progress.writeFindings`/`terminalFindingsWithTrail` in `src/agent/loop.ts` and
the `report` flow in `src/agent/belt/report.ts` so both the persisted terminal
findings and the returned `counts` are derived from the same authoritative
`Findings` object, not from separate driver input and trail state.
In `@src/dummy-web.e2e.test.ts`:
- Around line 266-287: The terminal status assertion in the dummy web E2E tests
is too loose: the scripted Nimbus Store fixture with planted bugs should end in
a known failing verdict, not either passed or failed. Update the affected
`get_findings` expectations in `src/dummy-web.e2e.test.ts` to assert the
explicit `failed` status for the `start_debug`/`get_findings` flow, using the
existing `findings.status` check so this regression stays strict.
- Around line 260-263: Guard the teardown in afterEach against partial setup
failures in the dummy-web e2e test by treating the manager and client handles as
optional until setup completes. Update the afterEach logic to check the assigned
handles before calling manager.end(cwd) and client.close(), using the existing
manager and client symbols so teardown no longer throws when beforeEach aborts
early.
In `@src/e2e.test.ts`:
- Around line 438-445: The e2e assertions are too permissive because they accept
either verdict instead of locking to the mock driver’s expected passed status.
Update the status checks in src/e2e.test.ts where findings.status is verified so
they require passed explicitly, and keep the existing schema/step assertions
intact; use the findings.status and FindingsSchema.parse expectations in this
describe block as the places to tighten.
- Around line 413-416: The nested-suite teardown in afterEach should tolerate
partial setup failures, since subManager or subClient may be undefined if
beforeEach aborts early. Update the cleanup logic around subManager.has(subCwd),
subManager.end(subCwd), and subClient.close() to use optional checks/handles so
teardown only runs when those objects were actually assigned, preventing
secondary cleanup errors from masking the original failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8237242b-8b43-42c8-9561-9f6239b69dc2
⛔ Files ignored due to path filters (1)
dummy/web/bun.lockis excluded by!**/*.lock
📒 Files selected for processing (9)
.github/workflows/ci.ymldummy/web/.stylelintrc.jsondummy/web/package.jsonsrc/adapters/desktop/desktop-adapter.integration.test.tssrc/agent/belt/report.tssrc/agent/loop.test.tssrc/agent/loop.tssrc/dummy-web.e2e.test.tssrc/e2e.test.ts
- report tool overlays the shared act-trail and derives the terminal write AND its counts from one terminalFindings object; drop the loop's redundant terminal write so counts.steps can't drift from findings.json - guard manager/client handles in e2e afterEach against partial setup - tighten scripted-verdict assertions to exact passed/failed Co-Authored-By: Claude <noreply@anthropic.com>
Summary
src/e2e.test.tswith a scripted mock driver (key Tab → scroll down → report) distinct from the existingobserve → reportdriver.findings.stepscontains both akey …andscroll …entry after real CDP adapter executes the acts — proves the two newestactverbs are wired end-to-end.findings.evidenceset (ffmpeg present) or a'replay video'skip step recorded (absent) — covers the post-verdict replay path the session-builder always wires.CHROMEdetection).Summary by CodeRabbit
New Features
Bug Fixes
Chores