test(e2e): broaden full-stack e2e for key/scroll + replay by sebyx07 · Pull Request #16 · developerz-ai/ui-debugger-mcp

sebyx07 · 2026-06-29T23:59:58Z

Summary

Adds a nested describe in src/e2e.test.ts with its own scripted mock driver (key Tab → scroll down → report), separate from the existing observe → report driver.
Test 1: asserts that findings.steps contains both a key … and a scroll … entry after the real CDP adapter executes the acts, proving the two newest act verbs are wired end-to-end.
Test 2: asserts that the replay outcome is visible: either findings.evidence is set (ffmpeg present) or a 'replay video' skip step is recorded (ffmpeg absent). This covers the post-verdict replay path that the session-builder always wires.
All 579 unit tests pass; browser tests remain skip-guarded (CHROME detection).
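The two assertions above can be sketched as plain predicates. This is a minimal illustration, not the PR's test code: the `Findings` shape (`steps: string[]` plus an optional `evidence` string) and the step strings `key Tab`, `scroll down` and a step containing `replay video` are assumptions inferred from this description. The repo's real `FindingsSchema` may differ.

```typescript
// Assumed, simplified shape of persisted findings; the real FindingsSchema may differ.
interface Findings {
  status: "passed" | "failed";
  steps: string[];
  evidence?: string; // e.g. path to a replay video when ffmpeg was available
}

// Test 1: the real CDP adapter must have recorded a step for each new act verb.
function hasKeyAndScrollSteps(findings: Findings): boolean {
  const hasKey = findings.steps.some((s) => s.startsWith("key "));
  const hasScroll = findings.steps.some((s) => s.startsWith("scroll "));
  return hasKey && hasScroll;
}

// Test 2: the replay outcome must be observable either way:
// evidence set (ffmpeg present) OR a skip step mentioning "replay video" (ffmpeg absent).
function replayOutcomeVisible(findings: Findings): boolean {
  const hasEvidence = typeof findings.evidence === "string" && findings.evidence.length > 0;
  const hasSkipStep = findings.steps.some((s) => s.includes("replay video"));
  return hasEvidence || hasSkipStep;
}
```

Inside a `bun:test` case, each predicate would be checked with `expect(...).toBe(true)` once the findings have been fetched.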

Summary by CodeRabbit

New Features
- Added a new CSS linting command and preview server for the web fixture.
- Expanded end-to-end coverage for keyboard, scroll, and replay evidence handling.
- Added a new browser-based test flow for the web fixture’s findings output.
Bug Fixes
- Improved handling of final report findings so recorded action trails are included in results.
- Updated test environment behavior to better skip browser or desktop tests when unavailable.
Chores
- CI now pre-builds the web fixture to speed up later test steps.

Add `serve` script alias to dummy/web/package.json (vite preview). New src/dummy-web.e2e.test.ts builds the Nimbus Store fixture once, serves dist/ via Bun.serve (real 404s for missing assets), and runs four guarded e2e tests with a scripted MockLanguageModelV3 that reports the bugs from BUGS.md (network 404, console error, ReferenceError, invisible text), asserting findings structure and on-disk persistence.

Co-Authored-By: Claude <noreply@anthropic.com>
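A static server with real 404s can be sketched as follows, assuming the Bun runtime. `resolveAssetPath` and `serveDist` are hypothetical names, not the test file's actual helpers, and the check that keeps requests inside `dist/` is an added precaution the PR text does not mention.

```typescript
import { join, normalize, sep } from "node:path";

// Map a URL pathname to a file under distDir; null if it is malformed or escapes distDir.
function resolveAssetPath(distDir: string, pathname: string): string | null {
  let rel: string;
  try {
    rel = pathname === "/" ? "index.html" : decodeURIComponent(pathname).replace(/^\/+/, "");
  } catch {
    return null; // malformed percent-encoding
  }
  const root = normalize(distDir).replace(/[\\/]+$/, "") + sep;
  const full = normalize(join(root, rel));
  return full.startsWith(root) ? full : null;
}

// Serve distDir on an ephemeral port. Missing files get a genuine 404 rather
// than any fallback page. Requires the Bun runtime (Bun.serve / Bun.file).
function serveDist(distDir: string) {
  const bun = (globalThis as any).Bun;
  if (!bun) throw new Error("serveDist requires the Bun runtime");
  return bun.serve({
    port: 0, // let the OS pick a free port
    async fetch(req: Request): Promise<Response> {
      const path = resolveAssetPath(distDir, new URL(req.url).pathname);
      if (path === null) return new Response("Bad Request", { status: 400 });
      const file = bun.file(path);
      return (await file.exists()) ? new Response(file) : new Response("Not Found", { status: 404 });
    },
  });
}
```

`serveDist(distDir)` returns Bun's server object; a test would read its `port` to build the fixture URL and call `stop()` during teardown.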

- Add .stylelintrc.json (stylelint-config-standard-scss base; double-slash comment-empty-line-before always/except first-nested enforced; camelCase selectors + full-hex colors + longhand gap + media-range-notation disabled to keep planted bugs intact)
- Add stylelint + stylelint-config-standard-scss devDeps; lint:css script
- Fix process.env['KEY'] → process.env.KEY (Biome useLiteralKeys) across e2e + desktop integration tests
- Guard dummy-web.e2e.test.ts beforeAll with CHROME check so describe.skip doesn't run side-effectful setup in Bun (avoids 5 s build timeout)

Co-Authored-By: Claude <noreply@anthropic.com>
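The beforeAll guard in the last bullet can be sketched as a wrapper that turns setup into a no-op when no browser was found. `chromeFromEnv`, `guardedSetup` and reading the browser path from a `CHROME` environment variable are illustrative assumptions; the repo's actual Chromium discovery may work differently.

```typescript
type Setup = () => void | Promise<void>;

// Read the browser path from the environment (dot access, as Biome's
// useLiteralKeys rule prefers over env['CHROME']). Empty strings count as unset.
function chromeFromEnv(env: Record<string, string | undefined>): string | undefined {
  const chrome = env.CHROME;
  return chrome ? chrome : undefined;
}

// Wrap a suite's setup so it does nothing when no browser is available,
// even if the runner still invokes the hook for a skipped describe.
function guardedSetup(chrome: string | undefined, setup: Setup): Setup {
  return () => {
    if (!chrome) return; // no build, no server, no browser launch
    return setup();
  };
}
```

In a test file this would read roughly `beforeAll(guardedSetup(chrome, buildAndServeFixture))`, where `buildAndServeFixture` is a hypothetical stand-in for the suite's real setup.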

- Nested describe 'key/scroll steps and replay evidence' with its own mock driver: Tab key → scroll down → report passed.
- Test 1: asserts both a `key …` and `scroll …` step appear in findings.steps after the real CDP adapter executes the acts.
- Test 2: asserts replay outcome is visible: findings.evidence is set (ffmpeg present) OR a 'replay video' skip step is recorded (absent).
- Suite guarded by the existing CHROME skip; all 579 unit tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>

- loop overlays its recorded act-trail as the terminal findings' steps; the report step no longer clobbers it with the driver's (often empty) list, so key/scroll steps survive into findings.steps (e2e fix).
- dummy/web is a standalone package (own bun.lock); CI's root install never reaches it. Add a CI step + in-test fallback to install its deps and build the Vite fixture, with a generous beforeAll hook timeout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
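The in-test fallback from the second bullet can be sketched as a synchronous helper that only installs and builds when `dist/` is missing. The helper name, the `bun run build` script name, and the `dist/index.html` and `node_modules` existence checks are assumptions; the runner and existence check are injectable so the logic can be exercised without invoking Bun.

```typescript
import { spawnSync } from "node:child_process";
import { existsSync } from "node:fs";
import { join } from "node:path";

// Runs a command in cwd and returns its exit code (1 if it could not be spawned or was killed).
type Runner = (cmd: string, args: string[], cwd: string) => number;

const defaultRunner: Runner = (cmd, args, cwd) =>
  spawnSync(cmd, args, { cwd, stdio: "inherit" }).status ?? 1;

// Build the standalone fixture only if CI has not already produced dist/.
function ensureFixtureBuilt(
  fixtureDir: string,
  run: Runner = defaultRunner,
  exists: (p: string) => boolean = existsSync,
): void {
  if (exists(join(fixtureDir, "dist", "index.html"))) return; // already built
  // The fixture has its own bun.lock, so the root install does not cover it.
  if (!exists(join(fixtureDir, "node_modules"))) {
    if (run("bun", ["install"], fixtureDir) !== 0) throw new Error(`bun install failed in ${fixtureDir}`);
  }
  if (run("bun", ["run", "build"], fixtureDir) !== 0) throw new Error(`fixture build failed in ${fixtureDir}`);
}
```

Skipping the build whenever `dist/index.html` exists lets the test reuse the CI pre-build cheaply, at the cost of possibly testing a stale build locally.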

coderabbitai · 2026-06-30T00:14:26Z

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 34 minutes


How can I continue?

After more reviews become available, a review can be triggered by posting the `@coderabbitai review` command as a PR comment. Alternatively, push new commits to this PR.


Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3316f2bd-e120-43e7-b933-ce8c360e1d15

📥 Commits

Reviewing files that changed from the base of the PR and between fba2ea8 and 0754d77.

📒 Files selected for processing (6)

src/agent/belt/report.test.ts
src/agent/belt/report.ts
src/agent/loop.ts
src/dummy-web.e2e.test.ts
src/e2e.test.ts
src/services/session-builder.ts

📝 Walkthrough

Adds an exported reportFindings helper extracted from runReport, and a new terminalFindingsWithTrail function in the agent loop that overlays the recorded act-trail into terminal findings written by onStepFinish. Introduces a full e2e test suite for the dummy/web fixture (static server, MCP wiring, four bug-coverage tests), extends the existing e2e suite with key/scroll step and replay evidence tests, and adds Stylelint config plus a CI pre-build step for the fixture.

Changes

Trail-aware terminal findings

Layer / File(s)	Summary
`reportFindings` extraction from `runReport` `src/agent/belt/report.ts`	Exports a new `reportFindings(input)` helper that maps `ReportInput` to `Findings`; `runReport` is refactored to delegate to it.
`terminalFindingsWithTrail` in the agent loop `src/agent/loop.ts`	Imports `reportFindings`/`ReportInputSchema`, implements `terminalFindingsWithTrail` to parse the report tool input and overlay act-trail steps, and updates `onStepFinish` to write terminal findings when no running progress is produced.
Unit tests for `terminalFindingsWithTrail` `src/agent/loop.test.ts`	Adds tests covering trail overlay, empty-trail fallback, non-report tool call returning `null`, and invalid status returning `null`.
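The behaviour those unit tests pin down can be sketched as follows. `terminalFindingsSketch` is a hypothetical stand-in, not the repo's `terminalFindingsWithTrail`: it validates by hand instead of through `ReportInputSchema`, and the tool name `report` plus the `status`/`summary`/`steps` fields are assumed.

```typescript
type Status = "passed" | "failed";

interface Findings {
  status: Status;
  summary: string;
  steps: string[];
}

const isStatus = (v: unknown): v is Status => v === "passed" || v === "failed";

// Hypothetical stand-in mirroring the tested behaviour:
//  - a non-report tool call yields null
//  - a report input with an invalid status yields null
//  - otherwise the recorded act-trail becomes findings.steps,
//    falling back to the driver's own steps when the trail is empty
function terminalFindingsSketch(toolName: string, input: unknown, trail: readonly string[]): Findings | null {
  if (toolName !== "report") return null;
  if (typeof input !== "object" || input === null) return null;
  const { status, summary, steps } = input as Record<string, unknown>;
  if (!isStatus(status)) return null;
  const driverSteps = Array.isArray(steps) ? steps.filter((s): s is string => typeof s === "string") : [];
  return {
    status,
    summary: typeof summary === "string" ? summary : "",
    steps: trail.length > 0 ? [...trail] : driverSteps,
  };
}
```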

dummy/web e2e suite and fixture tooling

Layer / File(s)	Summary
dummy/web fixture tooling and CI pre-build `dummy/web/.stylelintrc.json`, `dummy/web/package.json`, `.github/workflows/ci.yml`	Adds Stylelint config extending `stylelint-config-standard-scss`, `serve`/`lint:css` scripts, `stylelint` dev dependencies, and a CI step that installs and builds `dummy/web` before the e2e suite runs.
dummy/web e2e test suite `src/dummy-web.e2e.test.ts`	Full Bun-based e2e suite: Chromium discovery, `Bun.serve` static server for `dist/`, scripted `MockLanguageModelV3` MCP wiring, per-test workspace lifecycle, and four tests asserting terminal status, bug kinds (`network`/`console`/`flow`/`visual`), 404 network details, and `findings.json` structure.
Key/scroll steps and replay evidence in existing e2e suite `src/e2e.test.ts`, `src/adapters/desktop/desktop-adapter.integration.test.ts`	Adds a `describe` block verifying `key`/Tab and `scroll`/down act steps appear in `findings.steps`, asserts replay evidence on disk or as a `replay video` step, and updates dot-notation env-var checks.
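The bug-kind assertions in the dummy/web suite reduce to a set-difference check. The four kinds come from the walkthrough above; the `Bug` shape (a `kind` plus optional `details.status`) is an assumed simplification of the real findings schema, and the helper names are hypothetical.

```typescript
type BugKind = "network" | "console" | "flow" | "visual";

interface Bug {
  kind: BugKind;
  title: string;
  details?: { url?: string; status?: number };
}

// The four kinds the dummy/web suite expects the scripted driver to report.
const EXPECTED_KINDS: readonly BugKind[] = ["network", "console", "flow", "visual"];

// Kinds that no reported bug covers (an empty array means full coverage).
function missingKinds(bugs: readonly Bug[]): BugKind[] {
  const seen = new Set(bugs.map((b) => b.kind));
  return EXPECTED_KINDS.filter((k) => !seen.has(k));
}

// True if some network bug carries an HTTP 404 status in its details.
function hasNetwork404(bugs: readonly Bug[]): boolean {
  return bugs.some((b) => b.kind === "network" && b.details?.status === 404);
}
```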

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

developerz-ai/ui-debugger-mcp#6: Modifies src/agent/loop.ts around terminal findings and report tool handling, directly overlapping with the terminalFindingsWithTrail and onStepFinish changes in this PR.

Poem

🐇 A trail of hops left in the code,
Each key and scroll along the road.
The findings now remember where we've been,
With act-trail steps woven in between.
dummy/web serves its bugs with pride—
Four e2e tests stand satisfied! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main change: expanding end-to-end coverage for key/scroll actions and replay evidence.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/adapters/desktop/desktop-adapter.integration.test.ts`:
- Around lines 47-50: The test gate in the desktop integration check is too
X11-specific and skips Wayland-only environments, leaving the Wayland
window-control path untested. Update the guard logic in the desktop adapter
integration test around the environment checks and `xdotool` probe so it can
detect and allow a Wayland-capable setup instead of requiring `DISPLAY` and
`xdotool` unconditionally. Keep the existing skip behavior for missing desktop
prerequisites, but branch on the active session type so both X11 and Wayland
paths in the desktop adapter remain covered.

In `@src/agent/loop.ts`:
- Around lines 251-260: The report fallback and returned counts are using
different sources of truth, causing `findings.json` to be rewritten with the
recorded trail while `ReportResult.counts.steps` still reflects the driver’s
original empty `input.steps`. Update
`progress.writeFindings`/`terminalFindingsWithTrail` in `src/agent/loop.ts` and
the `report` flow in `src/agent/belt/report.ts` so both the persisted terminal
findings and the returned `counts` are derived from the same authoritative
`Findings` object, not from separate driver input and trail state.

In `@src/dummy-web.e2e.test.ts`:
- Around lines 266-287: The terminal status assertion in the dummy web E2E tests
is too loose: the scripted Nimbus Store fixture with planted bugs should end in
a known failing verdict, not either passed or failed. Update the affected
`get_findings` expectations in `src/dummy-web.e2e.test.ts` to assert the
explicit `failed` status for the `start_debug`/`get_findings` flow, using the
existing `findings.status` check so this regression stays strict.
- Around lines 260-263: Guard the teardown in afterEach against partial setup
failures in the dummy-web e2e test by treating the manager and client handles as
optional until setup completes. Update the afterEach logic to check the assigned
handles before calling manager.end(cwd) and client.close(), using the existing
manager and client symbols so teardown no longer throws when beforeEach aborts
early.

In `@src/e2e.test.ts`:
- Around lines 438-445: The e2e assertions are too permissive because they accept
either verdict instead of locking to the mock driver’s expected passed status.
Update the status checks in src/e2e.test.ts where findings.status is verified so
they require passed explicitly, and keep the existing schema/step assertions
intact; use the findings.status and FindingsSchema.parse expectations in this
describe block as the places to tighten.
- Around lines 413-416: The nested-suite teardown in afterEach should tolerate
partial setup failures, since subManager or subClient may be undefined if
beforeEach aborts early. Update the cleanup logic around subManager.has(subCwd),
subManager.end(subCwd), and subClient.close() to use optional checks/handles so
teardown only runs when those objects were actually assigned, preventing
secondary cleanup errors from masking the original failure.
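For the desktop-adapter comment above, a session-aware gate might look like the sketch below. `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY` and `DISPLAY` are standard Linux session variables; the helper names and the Wayland tool (`ydotool` here) are placeholders, since the PR does not say what the adapter's Wayland path depends on.

```typescript
type Session = "x11" | "wayland";
type Env = Record<string, string | undefined>;

// Placeholder tool per session: xdotool matches the existing X11 probe;
// the Wayland entry is an assumption, not taken from the adapter.
const REQUIRED_TOOL: Record<Session, string> = { x11: "xdotool", wayland: "ydotool" };

function detectSession(env: Env): Session | null {
  const type = env.XDG_SESSION_TYPE?.toLowerCase();
  const hasWayland = Boolean(env.WAYLAND_DISPLAY);
  const hasX11 = Boolean(env.DISPLAY);
  if (type === "wayland" && hasWayland) return "wayland"; // prefer native Wayland over XWayland
  if (hasX11) return "x11";
  if (hasWayland) return "wayland";
  return null;
}

// Returns a human-readable skip reason, or null when the desktop tests can run.
function desktopSkipReason(env: Env, hasTool: (name: string) => boolean): string | null {
  const session = detectSession(env);
  if (session === null) return "no DISPLAY or WAYLAND_DISPLAY";
  const tool = REQUIRED_TOOL[session];
  return hasTool(tool) ? null : `${session} session but ${tool} is not installed`;
}
```

A test file could compute `desktopSkipReason(process.env, (t) => Bun.which(t) !== null)` once and skip the suite whenever it returns a reason.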


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8237242b-8b43-42c8-9561-9f6239b69dc2

📥 Commits

Reviewing files that changed from the base of the PR and between d896c73 and fba2ea8.

⛔ Files ignored due to path filters (1)

dummy/web/bun.lock is excluded by !**/*.lock

📒 Files selected for processing (9)

.github/workflows/ci.yml
dummy/web/.stylelintrc.json
dummy/web/package.json
src/adapters/desktop/desktop-adapter.integration.test.ts
src/agent/belt/report.ts
src/agent/loop.test.ts
src/agent/loop.ts
src/dummy-web.e2e.test.ts
src/e2e.test.ts

- report tool overlays the shared act-trail and derives the terminal write AND its counts from one terminalFindings object; drop the loop's redundant terminal write so counts.steps can't drift from findings.json
- guard manager/client handles in e2e afterEach against partial setup
- tighten scripted-verdict assertions to exact passed/failed

Co-Authored-By: Claude <noreply@anthropic.com>
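The single-source-of-truth fix in this commit can be illustrated by building the terminal findings once and deriving both the write and the counts from that object. `finishReport`, its input shape and the `write` callback are hypothetical; the real change spans `src/agent/belt/report.ts` and `src/agent/loop.ts`.

```typescript
type Status = "passed" | "failed";

interface Findings {
  status: Status;
  steps: string[];
  bugs: unknown[];
}

interface ReportResult {
  findings: Findings;
  counts: { steps: number; bugs: number };
}

// Build the terminal findings once, persist that object, and compute the
// returned counts from the same object so they cannot disagree.
function finishReport(
  input: { status: Status; steps: string[]; bugs: unknown[] },
  trail: readonly string[],
  write: (findings: Findings) => void,
): ReportResult {
  const findings: Findings = {
    status: input.status,
    bugs: [...input.bugs],
    // Recorded act-trail wins; the driver's (often empty) list is only a fallback.
    steps: trail.length > 0 ? [...trail] : [...input.steps],
  };
  write(findings);
  return { findings, counts: { steps: findings.steps.length, bugs: findings.bugs.length } };
}
```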

sebyx07 and others added 3 commits June 29, 2026 18:46

sebyx07 added the claudetm Claude Task Master label Jun 29, 2026

coderabbitai Bot reviewed Jun 30, 2026


Comment thread src/adapters/desktop/desktop-adapter.integration.test.ts

Comment thread src/agent/loop.ts Outdated

Comment thread src/dummy-web.e2e.test.ts

Comment thread src/dummy-web.e2e.test.ts

Comment thread src/e2e.test.ts

Comment thread src/e2e.test.ts Outdated

sebyx07 merged commit 533a05c into main Jun 30, 2026
2 checks passed

sebyx07 deleted the feat/dummy-web-e2e-fixture branch June 30, 2026 00:41

sebyx07 merged 5 commits into main from feat/dummy-web-e2e-fixture
