test(e2e): broaden full-stack e2e for key/scroll + replay#16

Merged: sebyx07 merged 5 commits into main from feat/dummy-web-e2e-fixture on Jun 30, 2026

Conversation

sebyx07 (Contributor) commented Jun 29, 2026

Summary

  • Adds nested describe in src/e2e.test.ts with a scripted mock driver (key Tab → scroll down → report) distinct from the existing observe → report driver.
  • Test 1: asserts findings.steps contains both a key … and scroll … entry after real CDP adapter executes the acts — proves the two newest act verbs are wired end-to-end.
  • Test 2: asserts replay outcome is visible — findings.evidence set (ffmpeg present) or a 'replay video' skip step recorded (absent) — covers the post-verdict replay path the session-builder always wires.
  • All 579 unit tests pass; browser tests remain skip-guarded (CHROME detection).
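The two assertions in the summary could be sketched roughly as below. This is not the code from src/e2e.test.ts: the `Findings` shape is a simplified assumption (steps as plain strings, evidence as an optional path), and `hasActStep` and `replayOutcomeVisible` are hypothetical helper names. Only `findings.steps`, `findings.evidence`, and the 'replay video' step text come from the description above.

```typescript
// Simplified, assumed shape; the real Findings schema is not shown in this PR.
interface Findings {
  status: "passed" | "failed";
  steps: string[]; // assumed: one human-readable line per executed act
  evidence?: string; // assumed: path to the replay video when one was produced
}

// Hypothetical helper: true when some recorded step starts with the given act verb.
function hasActStep(findings: Findings, verb: string): boolean {
  return findings.steps.some(
    (step) => step === verb || step.startsWith(`${verb} `),
  );
}

// Hypothetical helper: the replay outcome counts as visible when evidence is set
// (ffmpeg present) or a 'replay video' skip step was recorded (ffmpeg absent).
function replayOutcomeVisible(findings: Findings): boolean {
  const hasEvidence =
    typeof findings.evidence === "string" && findings.evidence.length > 0;
  const hasSkipStep = findings.steps.some((step) =>
    step.includes("replay video"),
  );
  return hasEvidence || hasSkipStep;
}
```

Accepting either branch in the second check keeps the test meaningful on machines without ffmpeg, while still failing if the replay path leaves no trace at all.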

Summary by CodeRabbit

  • New Features

    • Added a new CSS linting command and preview server for the web fixture.
    • Expanded end-to-end coverage for keyboard, scroll, and replay evidence handling.
    • Added a new browser-based test flow for the web fixture’s findings output.
  • Bug Fixes

    • Improved handling of final report findings so recorded action trails are included in results.
    • Updated test environment behavior to better skip browser or desktop tests when unavailable.
  • Chores

    • CI now pre-builds the web fixture to speed up later test steps.

sebyx07 and others added 3 commits June 29, 2026 18:46
Add `serve` script alias to dummy/web/package.json (vite preview).
New src/dummy-web.e2e.test.ts builds the Nimbus Store fixture once,
serves dist/ via Bun.serve (real 404s for missing assets), and runs
four guarded e2e tests with a scripted MockLanguageModelV3 that reports
the bugs from BUGS.md (network 404, console error, ReferenceError,
invisible text), asserting findings structure and on-disk persistence.

Co-Authored-By: Claude <noreply@anthropic.com>
- Add .stylelintrc.json (stylelint-config-standard-scss base; double-slash
  comment-empty-line-before always/except first-nested enforced; camelCase
  selectors + full-hex colors + longhand gap + media-range-notation disabled
  to keep planted bugs intact)
- Add stylelint + stylelint-config-standard-scss devDeps; lint:css script
- Fix process.env['KEY'] → process.env.KEY (Biome useLiteralKeys) across
  e2e + desktop integration tests
- Guard dummy-web.e2e.test.ts beforeAll with CHROME check so describe.skip
  doesn't run side-effectful setup in Bun (avoids 5 s build timeout)

Co-Authored-By: Claude <noreply@anthropic.com>
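The beforeAll guard from the last bullet could look something like the sketch below. `guardSetup` is a hypothetical name, and passing the Chrome path in as an argument is an assumption; the commit only says the hook is guarded by the CHROME check.

```typescript
// Hypothetical guard: wraps a setup hook so it does nothing when Chrome is missing.
function guardSetup(
  chromePath: string | undefined,
  setup: () => Promise<void>,
): () => Promise<void> {
  return async () => {
    // Suite is skipped: no fixture build, no server, no other side effects.
    if (!chromePath) return;
    await setup();
  };
}
```

A test file would then register the wrapped hook, for example `beforeAll(guardSetup(CHROME, buildAndServeFixture))`, where `buildAndServeFixture` stands in for whatever the real setup does.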
- Nested describe 'key/scroll steps and replay evidence' with its own
  mock driver: Tab key → scroll down → report passed.
- Test 1: asserts both a `key …` and `scroll …` step appear in
  findings.steps after the real CDP adapter executes the acts.
- Test 2: asserts replay outcome is visible — findings.evidence is set
  (ffmpeg present) OR a 'replay video' skip step is recorded (absent).
- Suite guarded by the existing CHROME skip; all 579 unit tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
sebyx07 added the claudetm (Claude Task Master) label Jun 29, 2026
- loop overlays its recorded act-trail as the terminal findings' steps;
  the report step no longer clobbers it with the driver's (often empty)
  list, so key/scroll steps survive into findings.steps (e2e fix).
- dummy/web is a standalone package (own bun.lock); CI's root install
  never reaches it. Add a CI step + in-test fallback to install its deps
  and build the Vite fixture, with a generous beforeAll hook timeout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
coderabbitai Bot commented Jun 30, 2026


Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3316f2bd-e120-43e7-b933-ce8c360e1d15

📥 Commits

Reviewing files that changed from the base of the PR and between fba2ea8 and 0754d77.

📒 Files selected for processing (6)
  • src/agent/belt/report.test.ts
  • src/agent/belt/report.ts
  • src/agent/loop.ts
  • src/dummy-web.e2e.test.ts
  • src/e2e.test.ts
  • src/services/session-builder.ts
Walkthrough

Adds an exported reportFindings helper extracted from runReport, and a new terminalFindingsWithTrail function in the agent loop that overlays the recorded act-trail into terminal findings written by onStepFinish. Introduces a full e2e test suite for the dummy/web fixture (static server, MCP wiring, four bug-coverage tests), extends the existing e2e suite with key/scroll step and replay evidence tests, and adds Stylelint config plus a CI pre-build step for the fixture.

Changes

Trail-aware terminal findings

| Layer / File(s) | Summary |
| --- | --- |
| reportFindings extraction from runReport (src/agent/belt/report.ts) | Exports a new reportFindings(input) helper that maps ReportInput to Findings; runReport is refactored to delegate to it. |
| terminalFindingsWithTrail in the agent loop (src/agent/loop.ts) | Imports reportFindings/ReportInputSchema, implements terminalFindingsWithTrail to parse the report tool input and overlay act-trail steps, and updates onStepFinish to write terminal findings when no running progress is produced. |
| Unit tests for terminalFindingsWithTrail (src/agent/loop.test.ts) | Adds tests covering trail overlay, empty-trail fallback, non-report tool call returning null, and invalid status returning null. |
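
To make the four tested behaviours concrete, here is a minimal sketch of a trail-overlay function. It is not the implementation in src/agent/loop.ts. The `Findings` and `ToolCall` shapes are assumptions, the hand-rolled validation stands in for ReportInputSchema, and the "passed"/"failed" status set is assumed from the verdicts mentioned elsewhere in this PR.

```typescript
// Assumed shapes; the real types live in the repository and are not shown here.
interface Findings {
  status: "passed" | "failed";
  summary: string;
  steps: string[];
}
interface ToolCall {
  toolName: string;
  input: unknown;
}

function isStatus(value: unknown): value is Findings["status"] {
  return value === "passed" || value === "failed";
}

// Hypothetical sketch: returns terminal findings with the recorded act-trail
// overlaid as steps, or null when the call is not a valid report.
function terminalFindingsWithTrailSketch(
  call: ToolCall,
  trail: readonly string[],
): Findings | null {
  if (call.toolName !== "report") return null; // non-report tool call
  if (typeof call.input !== "object" || call.input === null) return null;
  const raw = call.input as Record<string, unknown>;

  const status = raw["status"];
  if (!isStatus(status)) return null; // invalid status

  const rawSteps = raw["steps"];
  const driverSteps = Array.isArray(rawSteps)
    ? rawSteps.filter((s): s is string => typeof s === "string")
    : [];
  const summary = raw["summary"];

  return {
    status,
    summary: typeof summary === "string" ? summary : "",
    // Trail overlay, with a fallback to the driver's steps when the trail is empty.
    steps: trail.length > 0 ? [...trail] : driverSteps,
  };
}
```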

dummy/web e2e suite and fixture tooling

| Layer / File(s) | Summary |
| --- | --- |
| dummy/web fixture tooling and CI pre-build (dummy/web/.stylelintrc.json, dummy/web/package.json, .github/workflows/ci.yml) | Adds Stylelint config extending stylelint-config-standard-scss, serve/lint:css scripts, stylelint dev dependencies, and a CI step that installs and builds dummy/web before the e2e suite runs. |
| dummy/web e2e test suite (src/dummy-web.e2e.test.ts) | Full Bun-based e2e suite: Chromium discovery, Bun.serve static server for dist/, scripted MockLanguageModelV3 MCP wiring, per-test workspace lifecycle, and four tests asserting terminal status, bug kinds (network/console/flow/visual), 404 network details, and findings.json structure. |
| Key/scroll steps and replay evidence in existing e2e suite (src/e2e.test.ts, src/adapters/desktop/desktop-adapter.integration.test.ts) | Adds a describe block verifying key/Tab and scroll/down act steps appear in findings.steps, asserts replay evidence on disk or as a replay video step, and updates dot-notation env-var checks. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • developerz-ai/ui-debugger-mcp#6: Modifies src/agent/loop.ts around terminal findings and report tool handling, directly overlapping with the terminalFindingsWithTrail and onStepFinish changes in this PR.

Poem

🐇 A trail of hops left in the code,
Each key and scroll along the road.
The findings now remember where we've been,
With act-trail steps woven in between.
dummy/web serves its bugs with pride—
Four e2e tests stand satisfied! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: expanding end-to-end coverage for key/scroll actions and replay evidence. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/adapters/desktop/desktop-adapter.integration.test.ts`:
- Around line 47-50: The test gate in the desktop integration check is too
X11-specific and skips Wayland-only environments, leaving the Wayland
window-control path untested. Update the guard logic in the desktop adapter
integration test around the environment checks and `xdotool` probe so it can
detect and allow a Wayland-capable setup instead of requiring `DISPLAY` and
`xdotool` unconditionally. Keep the existing skip behavior for missing desktop
prerequisites, but branch on the active session type so both X11 and Wayland
paths in the desktop adapter remain covered.

In `@src/agent/loop.ts`:
- Around line 251-260: The report fallback and returned counts are using
different sources of truth, causing `findings.json` to be rewritten with the
recorded trail while `ReportResult.counts.steps` still reflects the driver’s
original empty `input.steps`. Update
`progress.writeFindings`/`terminalFindingsWithTrail` in `src/agent/loop.ts` and
the `report` flow in `src/agent/belt/report.ts` so both the persisted terminal
findings and the returned `counts` are derived from the same authoritative
`Findings` object, not from separate driver input and trail state.

In `@src/dummy-web.e2e.test.ts`:
- Around line 266-287: The terminal status assertion in the dummy web E2E tests
is too loose: the scripted Nimbus Store fixture with planted bugs should end in
a known failing verdict, not either passed or failed. Update the affected
`get_findings` expectations in `src/dummy-web.e2e.test.ts` to assert the
explicit `failed` status for the `start_debug`/`get_findings` flow, using the
existing `findings.status` check so this regression stays strict.
- Around line 260-263: Guard the teardown in afterEach against partial setup
failures in the dummy-web e2e test by treating the manager and client handles as
optional until setup completes. Update the afterEach logic to check the assigned
handles before calling manager.end(cwd) and client.close(), using the existing
manager and client symbols so teardown no longer throws when beforeEach aborts
early.

In `@src/e2e.test.ts`:
- Around line 438-445: The e2e assertions are too permissive because they accept
either verdict instead of locking to the mock driver’s expected passed status.
Update the status checks in src/e2e.test.ts where findings.status is verified so
they require passed explicitly, and keep the existing schema/step assertions
intact; use the findings.status and FindingsSchema.parse expectations in this
describe block as the places to tighten.
- Around line 413-416: The nested-suite teardown in afterEach should tolerate
partial setup failures, since subManager or subClient may be undefined if
beforeEach aborts early. Update the cleanup logic around subManager.has(subCwd),
subManager.end(subCwd), and subClient.close() to use optional checks/handles so
teardown only runs when those objects were actually assigned, preventing
secondary cleanup errors from masking the original failure.
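
For the two afterEach findings, a teardown that tolerates partial setup might look like the sketch below. The `SessionManager` and `Closable` interfaces are assumptions, modelled only on the `has`, `end`, and `close` calls named in the comments; `safeTeardown` is a hypothetical name.

```typescript
// Assumed minimal interfaces, based on the method names in the review comments.
interface SessionManager {
  has(cwd: string): boolean;
  end(cwd: string): Promise<void>;
}
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: every handle is optional, so a beforeEach that aborted
// early cannot make teardown throw and mask the original failure.
async function safeTeardown(
  manager: SessionManager | undefined,
  client: Closable | undefined,
  cwd: string | undefined,
): Promise<void> {
  if (manager && cwd !== undefined && manager.has(cwd)) {
    await manager.end(cwd);
  }
  await client?.close();
}
```

In a suite, the handles would be declared as `let manager: SessionManager | undefined` and assigned in beforeEach, so afterEach can pass them through unchanged.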
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8237242b-8b43-42c8-9561-9f6239b69dc2

📥 Commits

Reviewing files that changed from the base of the PR and between d896c73 and fba2ea8.

⛔ Files ignored due to path filters (1)
  • dummy/web/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .github/workflows/ci.yml
  • dummy/web/.stylelintrc.json
  • dummy/web/package.json
  • src/adapters/desktop/desktop-adapter.integration.test.ts
  • src/agent/belt/report.ts
  • src/agent/loop.test.ts
  • src/agent/loop.ts
  • src/dummy-web.e2e.test.ts
  • src/e2e.test.ts

Comment thread src/adapters/desktop/desktop-adapter.integration.test.ts
Comment thread src/agent/loop.ts Outdated
Comment thread src/dummy-web.e2e.test.ts
Comment thread src/dummy-web.e2e.test.ts
Comment thread src/e2e.test.ts
Comment thread src/e2e.test.ts Outdated
- report tool overlays the shared act-trail and derives the terminal
  write AND its counts from one terminalFindings object; drop the loop's
  redundant terminal write so counts.steps can't drift from findings.json
- guard manager/client handles in e2e afterEach against partial setup
- tighten scripted-verdict assertions to exact passed/failed

Co-Authored-By: Claude <noreply@anthropic.com>
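
The first bullet of this commit, deriving both the terminal write and the counts from one object, can be illustrated as below. The names `buildReportResult` and `bugs`, and the exact shape of `ReportResult`, are assumptions for illustration; only `counts.steps` and the idea of a single terminalFindings object come from the commit message.

```typescript
// Assumed shapes for illustration only.
interface Findings {
  status: "passed" | "failed";
  steps: string[];
  bugs: string[];
}
interface ReportResult {
  findings: Findings; // what would be persisted to findings.json
  counts: { steps: number; bugs: number };
}

// Hypothetical sketch: overlay the trail once, then read counts from the
// same object that gets persisted, so the two cannot drift apart.
function buildReportResult(
  driverFindings: Findings,
  trail: readonly string[],
): ReportResult {
  const terminalFindings: Findings = {
    ...driverFindings,
    steps: trail.length > 0 ? [...trail] : driverFindings.steps,
  };
  return {
    findings: terminalFindings,
    counts: {
      steps: terminalFindings.steps.length,
      bugs: terminalFindings.bugs.length,
    },
  };
}
```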
@sebyx07 sebyx07 merged commit 533a05c into main Jun 30, 2026
2 checks passed
@sebyx07 sebyx07 deleted the feat/dummy-web-e2e-fixture branch June 30, 2026 00:41