
feat(record): capture MCP call streams to NDJSON and replay deterministically #175

Open. mvanhorn wants to merge 2 commits into openclaw:main from mvanhorn:feat/mcporter-record-replay


@mvanhorn
Contributor

Summary

  • Adds mcporter record <session> and mcporter replay <session>. Record wraps the runtime transport and appends every JSON-RPC request, response, and notification to a per-session NDJSON file under ~/.mcporter/recordings/<session>.ndjson. Replay reconstructs an in-memory transport from the recording and matches requests by method + deep-equal params, returning the recorded response without contacting the live server.
  • Plain JSON-RPC over NDJSON with a small _meta field (direction, server name, ISO timestamp). No proprietary blob, no streaming archive library, no new runtime deps.
  • Env-var passthrough: MCPORTER_RECORD=<name> and MCPORTER_REPLAY=<name> let any existing mcporter invocation participate (the runtime constructor wraps each server's transport when set).
  • Replay matching is strict by design. A request that has no matching recv in the recording fails with a clear error naming the request and the next expected recv. No fuzzy matching, no auto-fallback to live — replay is for reproducing exact runs.
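A minimal sketch of what one recorded line could look like, assuming the `_meta` object sits at the top level of each JSON-RPC message. Only `_meta.dir` and `_meta.server` are named in this PR; the `ts` field name, the `RecordedLine` type, and both helper functions are hypothetical and exist only for illustration.

```typescript
// Hypothetical envelope shape. `_meta.dir` and `_meta.server` come from the PR
// text; the `ts` field name is an assumption for the ISO timestamp.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
};

type RecordedLine = JsonRpcMessage & {
  _meta: { dir: "send" | "recv"; server: string; ts: string };
};

// Serialize one message as a single NDJSON line: one JSON object, then "\n".
function toNdjsonLine(
  message: JsonRpcMessage,
  dir: "send" | "recv",
  server: string,
  now: Date = new Date(),
): string {
  const line: RecordedLine = {
    ...message,
    _meta: { dir, server, ts: now.toISOString() },
  };
  return JSON.stringify(line) + "\n";
}

// Parse a whole recording back into envelopes, skipping blank lines.
function parseRecording(text: string): RecordedLine[] {
  return text
    .split("\n")
    .filter((raw) => raw.trim().length > 0)
    .map((raw) => JSON.parse(raw) as RecordedLine);
}
```

Because every line is an independent JSON object, a recorder can append to the file without rewriting it, and a reader can process it line by line.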

Why this matters

When an MCP-backed workflow breaks in production, reproducing the bug means re-running the live MCP server with the same inputs — which is often expensive (Linear quota, Vercel API rate limits) or impossible (the server's state has changed). Today mcporter exposes MCP servers as TypeScript APIs and CLIs but has no way to capture what an agent actually called and what came back.

Three concrete use cases this unlocks:

  • Offline bug reproduction. Record once when the bug happens. Replay it on a laptop without the agent or the live server.
  • Test fixtures from real call sequences. mcporter record session-foo → commit session-foo.ndjson to your test suite. Replay it in CI without network.
  • Postmortem sharing without credentials. Share a recording, not the OAuth tokens that produced it.

Sibling project openclaw/acpx already has a flow trace replay (docs/2026-03-26-acpx-flow-trace-replay.md); this PR brings the same shape to MCP transport.

Demo

Simulated demo:

[GIF: record/replay demo]

The demo shows the full loop: record a Linear MCP call, then replay it deterministically even after the live server becomes unreachable. The NDJSON envelopes carry the _meta direction + server fields the replay transport matches on.

Testing

  • corepack pnpm typecheck
  • corepack pnpm lint (oxlint clean, oxfmt clean)
  • corepack pnpm test — 646 tests pass; new tests/record-replay.test.ts covers:
    • recording writes one NDJSON line per send/recv with _meta.dir and _meta.server populated
    • replaying matches requests by method + params and returns the recorded response
    • mismatch requests throw with a clear error naming the request and the next expected recv
    • multi-server sessions keep streams separated by _meta.server
    • lifecycle events (start, close) are recorded for completeness but ignored on replay
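The matching behavior these tests describe can be sketched roughly as follows. This is not the PR's implementation: `Exchange`, `StrictReplayQueue`, and `deepEqual` are hypothetical names, and the sketch assumes requests are matched in recorded order, one queue per server.

```typescript
// Hypothetical pairing of one recorded request with its recorded response.
interface Exchange {
  server: string;
  method: string;
  params: unknown;
  response: unknown;
}

// Structural equality for JSON-like values; object key order is ignored.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const objA = a as Record<string, unknown>;
  const objB = b as Record<string, unknown>;
  const keysA = Object.keys(objA);
  if (keysA.length !== Object.keys(objB).length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(objB, key) && deepEqual(objA[key], objB[key]),
  );
}

class StrictReplayQueue {
  // One ordered queue per server keeps multi-server streams separated.
  private readonly queues = new Map<string, Exchange[]>();

  constructor(exchanges: Exchange[]) {
    for (const exchange of exchanges) {
      const queue = this.queues.get(exchange.server) ?? [];
      queue.push(exchange);
      this.queues.set(exchange.server, queue);
    }
  }

  // Return the recorded response for the next expected request, or throw.
  // There is no fuzzy matching and no fallback to a live server.
  next(server: string, method: string, params: unknown): unknown {
    const queue = this.queues.get(server) ?? [];
    const expected = queue[0];
    if (!expected || expected.method !== method || !deepEqual(expected.params, params)) {
      const want = expected
        ? `${expected.method} ${JSON.stringify(expected.params)}`
        : "nothing (recording exhausted)";
      throw new Error(
        `replay mismatch on "${server}": got ${method} ${JSON.stringify(params)}, next expected ${want}`,
      );
    }
    queue.shift();
    return expected.response;
  }
}
```

A failed match leaves the queue untouched, so the error always names the entry the recording was still waiting for.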

…stically

mcporter record <session> wraps the runtime transport and appends every
JSON-RPC request, response, and notification to a per-session NDJSON file
under ~/.mcporter/recordings/. mcporter replay <session> reconstructs an
in-memory transport from the recording and matches requests by method +
deep-equal params, returning the recorded response without contacting
the live server.

Use cases:
- Reproduce MCP-backed agent bugs offline (no live Linear quota, no
  Vercel API rate limits)
- Build test fixtures from real call sequences
- Share a session for a postmortem without sharing credentials

The format is plain JSON-RPC over NDJSON with a small _meta field
(direction, server, timestamp). No proprietary blob. Env-var passthrough
(MCPORTER_RECORD=<name>, MCPORTER_REPLAY=<name>) lets the existing
runtime constructor wrap any transport when set.
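
As an illustration of the env-var passthrough, a runtime could resolve its mode along these lines. The `Mode` type and `resolveMode` function are hypothetical, and treating the two variables as mutually exclusive is an assumption; the PR does not say what happens when both are set.

```typescript
// Hypothetical mode descriptor derived from the two environment variables.
type Mode =
  | { kind: "live" }
  | { kind: "record"; session: string }
  | { kind: "replay"; session: string };

// Takes the environment as a plain object so it can be tested without a process.
function resolveMode(env: Record<string, string | undefined>): Mode {
  const record = env["MCPORTER_RECORD"]?.trim();
  const replay = env["MCPORTER_REPLAY"]?.trim();
  // Assumption: recording and replaying at once is rejected rather than guessed at.
  if (record && replay) {
    throw new Error("MCPORTER_RECORD and MCPORTER_REPLAY are mutually exclusive");
  }
  if (record) return { kind: "record", session: record };
  if (replay) return { kind: "replay", session: replay };
  return { kind: "live" };
}
```

In Node this would be called as `resolveMode(process.env)` before deciding whether to wrap each server's transport.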

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2460611af



this.expectedSends.shift();
if (expected.response) {
queueMicrotask(() => this.onmessage?.(expected.response as JSONRPCMessage));

[P1] Rewrite replayed response ids to the active request

When the replayed client’s JSON-RPC id counter differs from the original recording (for example, a fixture captured after earlier daemon traffic or replayed from a different call sequence), this emits the recorded response id instead of the id from the request that just matched. JSON-RPC clients correlate responses by id, so the client will ignore the message and the call will hang or time out even though method/params matched; clone the recorded response with the current request id before delivering it.
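
A minimal sketch of the suggested fix, assuming the replay transport has both the matched recorded response and the incoming request in hand. The types and the `withActiveId` helper are hypothetical; they are not taken from the PR's source.

```typescript
type JsonRpcId = number | string;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: JsonRpcId;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

// Return a shallow copy of the recorded response carrying the id of the request
// that just matched. The recorded object itself is left unmodified.
function withActiveId(recorded: JsonRpcResponse, request: JsonRpcRequest): JsonRpcResponse {
  return { ...recorded, id: request.id };
}
```

In terms of the snippet under review, the microtask callback would presumably pass this rewritten copy to `onmessage` instead of `expected.response`, so the client finds a pending request under that id.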


@steipete
Collaborator

Hmmm - not fully sold. What will you use that for? As a tool to build/debug MCP servers?

@mvanhorn
Contributor Author

Three things, in order of value:

  1. Build/debug MCP servers (yes, your guess) -- record a real session, slow-replay it locally with mcporter replay --speed 0.5 to inspect individual calls. Way faster feedback loop than re-running the live agent against the server every time.
  2. Regression repros -- ship the NDJSON in a bug report so the maintainer can reproduce the exact call sequence without setting up the user's environment. The MCP protocol is stateful and "set up your env to match mine" is a high cost.
  3. Cross-version diffs -- replay the same recording against MCP server v0 and v1 to spot behavior drift. Useful when an MCP server bumps its protocol layer.
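
To make use case 3 concrete, here is a rough sketch of comparing the responses collected from two runs of the same request sequence. `RunResult` and `findDrift` are hypothetical names, and how the two runs are produced is left open, since that part is not specified here.

```typescript
// Hypothetical record of one call made during a run against some server version.
interface RunResult {
  method: string;
  params: unknown;
  response: unknown;
}

// Compare two runs position by position and describe where responses differ.
// Comparison uses JSON.stringify, so it is sensitive to object key order.
function findDrift(baseline: RunResult[], candidate: RunResult[]): string[] {
  const drift: string[] = [];
  const length = Math.max(baseline.length, candidate.length);
  for (let i = 0; i < length; i++) {
    const before = baseline[i];
    const after = candidate[i];
    if (!before || !after) {
      drift.push(`#${i}: present in only one run`);
      continue;
    }
    if (JSON.stringify(before.response) !== JSON.stringify(after.response)) {
      drift.push(`#${i} ${before.method}: response changed`);
    }
  }
  return drift;
}
```

An empty result means the two runs returned identical responses for every call in the sequence.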

Use cases #2 and #3 are the ones I personally lean on; #1 is the one most maintainers will reach for first. Happy to drop any of these if the scope feels off for mcporter.

@steipete
Collaborator

Hmmmm. It would shift mcporter from something that "helps call MCPs for agents" more towards a test/debugger tool. I haven't had that need yet, so I'm not fully sold. Will think about it.

@clawsweeper clawsweeper Bot added the following labels on May 22, 2026:

  • proof: 🎥 video: Contributor real behavior proof includes video or recording evidence.
  • rating: 🧂 unranked krab: Not merge-ready due to missing proof or serious correctness/safety concerns.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • P2: Normal priority bug or improvement with limited blast radius.
  • merge-risk: 🚨 security-boundary 🚨: Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.
  • merge-risk: 🚨 availability 🚨: Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages.

@clawsweeper

clawsweeper Bot commented May 22, 2026

Codex review: needs real behavior proof before merge.

Latest ClawSweeper review: 2026-05-22 06:12 UTC / May 22, 2026, 2:12 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds mcporter record and mcporter replay commands, docs, tests, and runtime transport wrappers for NDJSON MCP traffic capture and deterministic replay.

Reproducibility: yes, for the blocking replay bug: the PR source and test show a request with id 99 receiving a recorded response with id 1. Product fit is not a reproduction question and remains a maintainer decision.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🦪 silver shellfish
Summary: The feature is coherent, but the PR is not quality-ready because real behavior proof is insufficient and replay currently has a blocking id-correlation bug.

Rank-up moves:

  • Add redacted real CLI proof for a successful record and replay run.
  • Rewrite replayed response ids to match the active request id and update the regression test.
  • Add clear warning/redaction guidance for raw recording files and get maintainer confirmation on product fit.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The linked GIF was downloaded and inspected, but it is labeled and appears simulated rather than a real run of the changed CLI; the contributor should add redacted terminal output, a terminal screenshot, or a recording from an actual record/replay run, then update the PR body to trigger a fresh review or ask a maintainer for @clawsweeper re-review.

Risk before merge

  • Merging this before maintainer approval would commit MCPorter to a broader test/debugger-style feature that the current discussion has not accepted.
  • The current replay transport can leave real MCP calls waiting until timeout when the active client request id differs from the recorded response id.
  • Recordings store raw MCP request params and response payloads, so sharing or committing them can expose secrets, private data, or sensitive customer content unless the docs and workflow make redaction explicit.
  • The linked demo is labeled simulated and does not satisfy the real-behavior proof gate for an external non-docs PR.

Maintainer options:

  1. Repair replay and disclosure before merge (recommended)
    Fix JSON-RPC id rewriting and add explicit redaction guidance for raw recordings before any maintainer-approved merge.
  2. Approve the feature direction explicitly
    Maintainers can accept the added debugger/repro surface after the repair work if they decide it fits MCPorter's core vision.
  3. Pause or close as out of scope
    If record/replay is too far toward standalone testing/debugging tooling, keep MCPorter focused on calling, generating, typing, hosting, and diagnosing MCP servers.

Next step before merge
This needs contributor real-behavior proof and maintainer product approval; the code repair is narrow, but automation cannot satisfy the proof or product-decision blockers.

Security
Needs attention: The diff introduces raw local MCP traffic capture without redaction guidance, which is security-sensitive because recordings may be shared as repro artifacts.

Review findings

  • [P1] Rewrite replayed response ids to the active request — src/runtime/replay-transport.ts:47
  • [P2] Warn before presenting raw recordings as shareable — README.md:24
Review details

Best possible solution:

If maintainers want record/replay in core, land a narrower version that rewrites response ids, warns about raw payload sensitivity, and includes redacted real CLI proof; otherwise pause or close it as outside the current product direction.

Do we have a high-confidence way to reproduce the issue?

Yes for the blocking replay bug: the PR source and test show a request with id 99 receiving a recorded response with id 1. Product fit is not a reproduction question and remains a maintainer decision.

Is this the best way to solve the issue?

No: the direction may be useful, but this implementation should not merge until replay rewrites response ids, raw-recording sensitivity is documented, and a maintainer accepts the new record/replay product surface.

Label changes:

  • add P2: This is a normal-priority feature PR with a concrete replay correctness blocker and unresolved product-fit discussion.
  • add merge-risk: 🚨 security-boundary: The feature persists raw MCP params and results while describing recordings as shareable repros, which can expose sensitive data without redaction guidance.
  • add merge-risk: 🚨 availability: The replay transport can deliver a response id the MCP client is not waiting for, causing replayed calls to hang or time out.
  • add proof: 🎥 video: Contributor real behavior proof includes video or recording evidence. The linked GIF was downloaded and inspected, but it is labeled and appears simulated rather than a real run of the changed CLI; the contributor should add redacted terminal output, a terminal screenshot, or a recording from an actual record/replay run, then update the PR body to trigger a fresh review or ask a maintainer for @clawsweeper re-review.
  • add rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🦪 silver shellfish, patch quality is 🦪 silver shellfish, and the feature is coherent but the PR is not quality-ready: real behavior proof is insufficient and replay currently has a blocking id-correlation bug.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The linked GIF was downloaded and inspected, but it is labeled and appears simulated rather than a real run of the changed CLI; the contributor should add redacted terminal output, a terminal screenshot, or a recording from an actual record/replay run, then update the PR body to trigger a fresh review or ask a maintainer for @clawsweeper re-review.


Full review comments:

  • [P1] Rewrite replayed response ids to the active request — src/runtime/replay-transport.ts:47
    When a matched replay request uses a different JSON-RPC id than the original recording, this emits the recorded response unchanged. The MCP client is waiting for the active request id, so the call can hang until timeout; clone the response and set its id to the current request id before calling onmessage.
    Confidence: 0.94
  • [P2] Warn before presenting raw recordings as shareable — README.md:24
    The new README bullet describes recordings as shareable repros, but the recorder writes raw JSON-RPC params and results. Those payloads can contain secrets or private user data, so add prominent redaction/sensitivity guidance before recommending sharing or committing recordings.
    Confidence: 0.84

Overall correctness: patch is incorrect
Overall confidence: 0.92

Security concerns:

  • [medium] Raw MCP payloads can expose sensitive data — README.md:24
    Recordings include JSON-RPC request params and response results, and the README calls them shareable repros. Tool arguments and results can contain credentials, tokens, customer data, or private content, so the PR should add clear redaction guidance and avoid implying recordings are safe to share unreviewed.
    Confidence: 0.84
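
One possible shape for the redaction step this concern asks for, offered as a sketch rather than a recommendation for the PR. The key pattern, the `[REDACTED]` placeholder, and both function names are assumptions; key-based matching also cannot catch secrets embedded in free-text values, so a manual review would still be needed before sharing.

```typescript
// Hypothetical list of key names treated as sensitive; real recordings may need more.
const SENSITIVE_KEY = /(token|secret|password|authorization|api[-_]?key|cookie)/i;

// Recursively replace the values of sensitive-looking keys in a JSON-like value.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(source[key]);
    }
    return out;
  }
  return value;
}

// Apply redaction to every line of an NDJSON recording, skipping blank lines.
function redactRecording(ndjson: string): string {
  const lines = ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.stringify(redact(JSON.parse(line))));
  return lines.join("\n") + "\n";
}
```

Note the trade-off with strict replay: if redaction changes request params, a client sending the original values would no longer deep-equal match the redacted recording.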

Acceptance criteria:

  • ./runner pnpm exec vitest run tests/record-replay.test.ts
  • ./runner pnpm check
  • ./runner pnpm test

What I checked:

  • No equivalent on current main: Current main has no MCPORTER_RECORD, MCPORTER_REPLAY, or recordings implementation surface; the existing diagnostics cover daemon logging and generator metadata replay, not deterministic MCP JSON-RPC session replay. (0c36a6d3f833)
  • Existing diagnostics are not a duplicate: The current logging docs describe daemon stdout/stderr and per-call trace lines, which help troubleshooting but do not create offline request/response fixtures. (docs/logging.md:11, 0c36a6d3f833)
  • Product fit is unresolved: A collaborator asked what the feature would be used for, the author replied with debugging/repro/diff use cases, and the collaborator then said it may shift MCPorter toward a test/debugger tool and they were not fully sold. (c50060b63f8d)
  • Replay id blocker: ReplayTransport.send() emits the recorded response object unchanged after a method/params match, so the response id remains the recorded id instead of the active request id. (src/runtime/replay-transport.ts:47, c50060b63f8d)
  • Test currently locks in the bad id: The new test sends a replay request with id 99 but expects the received response to keep id 1, confirming the mismatch is in the proposed behavior rather than only a theoretical edge case. (tests/record-replay.test.ts:64, c50060b63f8d)
  • Raw recording sensitivity: The README markets recordings as shareable repros while the docs show raw JSON-RPC params and results being stored; that needs a redaction/sensitivity warning before merge. (README.md:24, c50060b63f8d)

Likely related people:

  • steipete: Authored most current runtime and CLI history in the affected paths and raised the product-fit concern in the PR discussion. (role: runtime/CLI feature owner and current reviewer; confidence: high; commits: 0c36a6d3f833, f2249eb5fb13, 83cc3b9a4cbc; files: src/runtime.ts, src/runtime/transport.ts, src/cli.ts)
  • Sebastian Otaegui: Authored recent no-browser OAuth runtime changes touching the same runtime connection area. (role: recent adjacent runtime transport contributor; confidence: medium; commits: 033abb4358e6; files: src/runtime.ts, src/runtime/transport.ts)
  • Lil Z: Authored managed runtime and keep-alive work that shares the runtime lifecycle and connection-management surface. (role: adjacent daemon/runtime contributor; confidence: medium; commits: 8c66f1c49a68; files: src/runtime.ts, src/runtime/transport.ts, src/cli.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0c36a6d3f833.

@clawsweeper

clawsweeper Bot commented May 22, 2026

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.
