Skip to content

🤖 feat: coalesce consecutive file_read/file_edit tool calls in transcript#3379

Open
ammar-agent wants to merge 9 commits into
mainfrom
feat/transcript-tool-coalescing
Open

🤖 feat: coalesce consecutive file_read/file_edit tool calls in transcript#3379
ammar-agent wants to merge 9 commits into
mainfrom
feat/transcript-tool-coalescing

Conversation

@ammar-agent
Copy link
Copy Markdown
Collaborator

Summary

Add automatic coalescing for consecutive file_read and file_edit_* tool calls in the transcript. The first call in a run renders normally; when a second arrives, the slot converts in-place to a Coalesce row that summarizes the burst (e.g. Read files src/App.tsx, src/main.ts). Clicking the row expands the group and reveals each underlying tool call exactly as before.

Background

Long agent turns frequently emit bursts of file reads/edits that push more useful content (assistant prose, terminal output) off-screen. The existing bash_output collapsing pattern already does this for repeated polls of the same process; this PR extends the same idea to file IO bursts behind a rule-based registry so additional kinds (web fetches, advisor, …) can opt in later.

Implementation

  • src/browser/utils/messages/toolCoalescing.tscomputeToolCoalesceInfos(messages) walks the transcript in one O(n) pass and returns per-index { kind, position: "head" | "member", filePaths, headIndex, totalCount }. Groups of size ≥ 2 with the same ToolCoalesceKind (file_read or file_edit) qualify.
  • src/browser/features/Tools/CoalescedToolCall.tsx — presentational summary row that mirrors a collapsed FileReadToolCall's dimensions (same ToolContainer padding, same row height). That keeps the 1 → N transition a content swap rather than a row insertion, so users do not see a layout flash when the second member streams in mid-turn.
  • src/browser/components/ChatPane/ChatPane.tsx — wired alongside the existing bashOutputGroupInfos flow:
    • precomputes toolCoalesceInfos once per snapshot,
    • hides members when the group is collapsed,
    • replaces the head's MessageRenderer with CoalescedToolCall when collapsed,
    • renders both the summary row and the head's normal tool call when expanded so the summary doubles as the collapse toggle.
    • owns the expandedToolCoalesceGroups Set<headId> and resets it on workspace switch (mirroring expandedBashGroups).

The summary uses past-tense copy matching the request (Read files … / Wrote files …). All file_edit_* variants (replace_string, replace_lines, insert) collapse under a single file_edit kind so a mixed burst still forms one block; the active registry only routes two of those today but the legacy variant is included for historical transcripts.

Validation

  • bun test src/browser/utils/messages/toolCoalescing.test.ts — 10 cases covering head/member layout, mixed file_edit variants, kind boundaries, intervening tool/user messages, multi-group transcripts, and unknown path fallback.
  • bun test src/browser/features/Tools/CoalescedToolCall.test.tsx — 4 happy-dom cases for collapsed/expanded copy, onToggle wiring, and aria-expanded state.
  • bun test src/browser/features/Tools src/browser/features/Messages src/browser/utils/messages — 612 pass, 0 fail.
  • make static-check — clean.
  • Two new Storybook stories (TwoReads, ManyReads, TwoEdits, ExpandedReads) under App/Chat/Tools/CoalescedToolCall for visual review.

Risks

Render-path change in ChatPane.tsx map block — the same hot path that handles bash_output collapsing. Mitigations:

  • The new branch is additive (early-returns null for hidden members, swaps the head's element when collapsed) and short-circuits to the existing render when no coalesce info is present, so non-file_read/file_edit traffic is unaffected.
  • Expansion state is keyed by the head message ID, so it survives later additions to the same group without changing identity.
  • findTranscriptMessageElement selects on [data-message-id]. The summary row deliberately omits that attribute so scroll-into-view targeting still resolves to real messages (head MessageRenderer when expanded; the user/assistant rows in all other cases).
  • Existing tests that count [data-testid="chat-message"] (truncation, fork, task report relocation) all use bash/task/assistant fixtures, not file_read/file_edit, so they are unaffected.

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: xhigh

…ript

Introduce a rule-based coalescing pass that hides bursts of consecutive
file_read or file_edit_* tool calls behind a single 'Read files a, b'
or 'Wrote files a, b' summary row. The summary replaces the head call's
slot so the 1 -> N transition is a content swap rather than a row
insertion, eliminating layout flash when the second member streams in.
Clicking the summary expands the group and shows each individual tool
call exactly as before.

Layered on top of the existing bash_output coalescing pattern:
- computeToolCoalesceInfos returns aligned head/member metadata once per
  message snapshot (O(n)).
- CoalescedToolCall is a presentational summary row sized to match a
  collapsed FileReadToolCall so the swap is dimensionally stable.
- ChatPane owns the expansion-state Set, keyed by the head's message id,
  and resets it on workspace switch (mirrors expandedBashGroups).

Why only file_read and file_edit for now: those are the two highest-
volume read/write tools, and they already share extractToolFilePath as
a path source. Future kinds can plug in by extending ToolCoalesceKind.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f5d6b30f6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/components/ChatPane/ChatPane.tsx
Bursts often touch the same file several times (e.g. multi-hunk edits).
Showing 'a.ts, a.ts, b.ts' adds noise without information, so dedupe in
the renderer while preserving first-occurrence order. The underlying
filePaths array (and totalCount) still reflect every coalesced tool
call, so future surfaces that need the raw history still have it.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Pushed a follow-up that dedupes repeated file paths in the coalesced summary so a burst that touches a.ts, a.ts, b.ts displays as a.ts, b.ts. Dedupe is display-only; the underlying filePaths array still reflects every coalesced tool call.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 201bf0e254

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/utils/messages/toolCoalescing.ts
Addresses Codex P1: hiding a partial tool row eats the InterruptedBarrier
the transcript renders for interruption signals. When the stream stops
mid-burst we want every member of the burst to keep its uncoalesced row
so the user still sees "interrupted" exactly where it happened.

Implemented as a single pass over each candidate group's members: if any
has isPartial=true, the whole group falls back to normal rendering.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Addressed the P1 by refusing to coalesce a group that contains an interrupted (isPartial) member, so the InterruptedBarrier still renders exactly where it would have without coalescing. Added a regression test in toolCoalescing.test.ts.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c50c781d3c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/utils/messages/toolCoalescing.ts Outdated
…t partial

Addresses Codex P2: pending/executing/failed/interrupted/redacted rows
all carry status or error UI the summary row deliberately omits.
Hiding any such row in a burst would mask the actionable signal until
the user manually expands.

Generalises the previous partial-only guard to a single
'isCoalesceableToolMember' check: a member must be type=tool,
!isPartial, and status=completed. Anything else flips the whole group
back to normal rendering.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Addressed the P2 by generalising the per-member check: a group only coalesces when every member is !isPartial and status === "completed". Pending/executing/failed/interrupted/redacted all force normal rendering so the per-row status/error UI stays visible. New regression tests cover failed and still-running members.

@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

The previous review was against the partial-only commit. The current head (c48ea47fa) generalises the per-member check to !isPartial && status === "completed", so failed/executing/pending/interrupted/redacted members all force the group back to normal rendering. Regression tests for failed and executing members were added in the same commit.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The previous version supplied a custom decorators array on the story
meta, which shadowed lightweightMeta's StoryUiShell decorator and
therefore stripped TooltipProvider. Storybook's runner picked that up
as 'Tooltip must be used within TooltipProvider' for every story.

Render via an in-story StoryLayout wrapper instead so the meta-level
decorator stack stays intact. Opt out of Chromatic snapshots for these
visual-reference stories to keep the global snapshot budget happy.
…y interactions

The previous stories rendered the component but had no play functions, so
test-storybook treated them as smoke-render-only. Convert each story into a
real interaction test that exercises the new pattern's surfaces:

- TwoReadsClickToExpand drives the click toggle through both branches
  (collapsed -> expanded -> collapsed) and asserts aria-expanded.
- ManyReads asserts chronological-order joined paths.
- TwoEdits asserts the past-tense 'Wrote files' verb for the file_edit kind.
- DeduplicatedPaths asserts the display-only dedupe of repeated file names
  (5 raw calls -> 3 unique paths in first-occurrence order).
- InitiallyExpanded asserts aria-expanded reflects the initial prop.

Stories run under test-storybook in CI (the dedicated 'Test / Storybook'
job in pr.yml), which exercises play functions in a real browser via the
@storybook/test-runner. Chromatic visual snapshots stay disabled to
respect the global snapshot budget.
… rebalance)

Reviewers asked for at least one Chromatic visual snapshot of the new
coalesced-tool-call row so the new UX has a pinned visual baseline, not just
play-driven behavior coverage.

Changes:

- Add `CoalescedToolCall.visual.stories.tsx` containing one Chromatic-enabled
  story (`CollapsedReadFiles`) that renders the canonical new behavior: the
  collapsed "Read files src/App.tsx, src/main.ts" row. The sibling
  `CoalescedToolCall.stories.tsx` keeps its meta-level `chromatic:
  CHROMATIC_DISABLED` and continues to own the interaction (play) coverage
  for click-to-expand, dedupe, the `file_edit` verb, and the initial-expanded
  state.
- Opt `ProvidersSection.stories.tsx` out of Chromatic snapshots at the meta
  level. All four stories in that file already drive `play` functions that
  assert the relevant settings-panel behavior (empty state, configured
  providers, env-sourced indicators, expanded provider config), and the
  layout has no animation/scroll-fade/state nuances worth a pixel baseline.
  This frees four snapshots from the global Chromatic budget.
- Net delta on `tests/ui/storybook/budget.test.ts`: +1 (new visual story)
  -4 (ProvidersSection) = -3, taking the estimated snapshot count from 302
  (pre-existing breach on main) to 299, comfortably under the 300 cap.

Validation:

- `bun test ./tests/ui/storybook/budget.test.ts` -> both budget assertions
  pass (files ≤ 70, snapshots ≤ 300).
- `make typecheck`, `make lint`, `make fmt-check`, `make storybook-build`
  all clean.
- Targeted unit + storybook tests: `CoalescedToolCall.test.tsx`,
  `toolCoalescing.test.ts`, and `tests/ui/storybook/` all pass.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `xhigh` • Cost: `$29.08`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh costs=29.08 -->
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

Added a single Chromatic baseline for the new coalesced-tool-call row in CoalescedToolCall.visual.stories.tsx (CollapsedReadFiles). The original CoalescedToolCall.stories.tsx stays Chromatic-disabled and continues to own all the play-driven interaction coverage (click-to-expand, dedupe, file_edit verb, initial-expanded state).

To keep the global snapshot budget (tests/ui/storybook/budget.test.ts, ≤ 300) green, I opted ProvidersSection.stories.tsx out of Chromatic at the meta level. Every story in that file already drives a play function that asserts the relevant settings-panel behavior, and the layout has no animation/scroll-fade/state nuances worth a pixel baseline — so disabling visual snapshots there costs nothing in real coverage while freeing four snapshots.

Net delta: +1 (new visual story) − 4 (ProvidersSection) = −3 ⇒ estimated count goes from 302 (pre-existing main overage) to 299, comfortably under the 300 cap.

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant