VCR WIP [do not merge] #297

Draft
MOZGIII wants to merge 8 commits into main from mzg/2026-03-26/vcr


@MOZGIII MOZGIII commented Mar 30, 2026

Collaborator

This PR makes visible the work I've been doing on replay feature support on the workers side.
The development here is on hold while we're investigating the prod issues.


The basic idea of the implementation is as follows (and some notes):

  1. add two new big subsystems: the vcr recorder and vcr playback; the recorder captures the workloads off of waymark-start-workers, and playback injects the workloads into waymark-start-workers, replacing the real workers pool but using the real postgres backend (it just queues instances from the pre-recorded file);
  2. use traits and proper abstraction layers rather than adding the code directly into the runloop or the postgres backend; this is for architecture hygiene reasons, but also because the actual goal is to be able to test the underlying code behind those abstraction layers - so that code should stay as-is, now and in perpetuity, for this feature to make sense; if we cut corners and add special-casing for it at the backend layer, it would defeat the purpose of this work;
  3. there are some supporting crates - like waymark-vcr-file, which provides the shared implementation of reading and writing the vcr files, as well as the vcr file format;
  4. the worker pool and backend trait implementations are provided in crates separate from the waymark-vcr-recorder and waymark-vcr-playback crates, in order to keep the integration layer with other major subsystems explicitly lightweight, with a small and limited scope that is easy to test;
  5. during recording we try to keep track of instances and their actions - we want to group the actions by instance so that when we replay, we can just loop through the instances and load the actions for each corresponding instance locally, thus avoiding big jumps; there is an unresolved issue with capturing the workflow versions and dags - we don't want to keep a copy of the dag for each of the instances;
  6. another issue is correlating the recorded actions (executions) with the replay executions; we want to be able to match the actions we have recorded with the actions for each executor, and the easiest way of doing this would be to correlate by execution ID - but those are generated on-the-fly when the node is added for execution, so this is unsolved for now; it might require some more work to make those IDs be computed deterministically (i.e. from an instance ID, graph node ID, iteration of the loop unwinding, and an attempt number), and to have more type separation to distinguish between the node ID and the execution ID;
  7. a general refactor of the QueuedInstance type and the associated backend/executor operations would simplify building this feature, but it is not a blocker.
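
To make item 5 more concrete, here is a rough sketch of grouping recorded actions by instance while storing each dag only once per workflow version. Everything in it is an assumption for illustration: `Recording`, `RecordedInstance`, `RecordedAction`, the `u64` instance IDs, the string workflow versions, and the byte-blob dags are hypothetical and do not reflect the actual types in this branch. Keying dags by workflow version is just one possible direction for the unresolved dag-copy issue, not a decision.

```rust
use std::collections::BTreeMap;

/// Hypothetical recorded action; the real payload type is not known here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecordedAction {
    pub node: String,
    pub payload: Vec<u8>,
}

/// Hypothetical per-instance record: a reference to the workflow version
/// (instead of a dag copy) plus the actions in the order they were seen.
#[derive(Debug)]
pub struct RecordedInstance {
    pub workflow_version: String,
    pub actions: Vec<RecordedAction>,
}

/// Hypothetical in-memory recording, grouped by instance.
#[derive(Debug, Default)]
pub struct Recording {
    /// workflow version -> serialized dag, stored once per version.
    dags: BTreeMap<String, Vec<u8>>,
    /// instance id -> recorded instance, ordered by id.
    instances: BTreeMap<u64, RecordedInstance>,
}

impl Recording {
    /// Registers an instance; the dag is stored only if its version is new.
    pub fn record_instance(&mut self, instance_id: u64, workflow_version: &str, dag: &[u8]) {
        self.dags
            .entry(workflow_version.to_owned())
            .or_insert_with(|| dag.to_vec());
        self.instances
            .entry(instance_id)
            .or_insert_with(|| RecordedInstance {
                workflow_version: workflow_version.to_owned(),
                actions: Vec::new(),
            });
    }

    /// Appends an action to its instance; returns false for unknown instances.
    pub fn record_action(&mut self, instance_id: u64, action: RecordedAction) -> bool {
        match self.instances.get_mut(&instance_id) {
            Some(instance) => {
                instance.actions.push(action);
                true
            }
            None => false,
        }
    }

    /// Number of distinct dags stored.
    pub fn dag_count(&self) -> usize {
        self.dags.len()
    }

    /// Iterates instances in id order, each with its own actions together.
    pub fn replay_order(&self) -> impl Iterator<Item = (u64, &RecordedInstance)> + '_ {
        self.instances.iter().map(|(id, instance)| (*id, instance))
    }
}
```

With this shape, actions that arrive interleaved across instances during recording still come back grouped per instance from `replay_order`, and two instances on the same workflow version share a single stored dag.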

UPD: added numbering so it's easier to reference items in discussion.
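
For item 6, a minimal sketch of what a deterministically computed execution ID could look like, with separate newtypes for the node ID and the execution ID. The names `InstanceId`, `NodeId`, `ExecutionId`, and `derive_execution_id`, the integer widths, and the choice of FNV-1a as the hash are all assumptions made for illustration, not what the codebase does today. A hand-rolled hash is used because the output of `std`'s `DefaultHasher` is not guaranteed to stay stable across Rust releases, which matters when a recording made by one build is replayed by another.

```rust
/// Hypothetical newtypes; the real ID types may be wider or structured.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct InstanceId(pub u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId(pub u64);

/// Distinct from `NodeId`, so the two cannot be mixed up by accident.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ExecutionId(pub u64);

/// 64-bit FNV-1a over a byte slice.
fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Derives an execution ID from the four inputs named in item 6.
/// The same inputs always give the same ID, in recording and in replay.
pub fn derive_execution_id(
    instance: InstanceId,
    node: NodeId,
    iteration: u32,
    attempt: u32,
) -> ExecutionId {
    // Fixed-width big-endian encoding keeps the byte layout unambiguous.
    let mut buf = [0u8; 32];
    buf[..16].copy_from_slice(&instance.0.to_be_bytes());
    buf[16..24].copy_from_slice(&node.0.to_be_bytes());
    buf[24..28].copy_from_slice(&iteration.to_be_bytes());
    buf[28..32].copy_from_slice(&attempt.to_be_bytes());
    ExecutionId(fnv1a64(&buf))
}
```

A 64-bit hash can collide in principle, so a real implementation might prefer a wider ID or keep the four components as a structured key; the point here is only that the ID no longer depends on when the node was queued.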

@MOZGIII MOZGIII force-pushed the mzg/2026-03-26/vcr branch 7 times, most recently from 6e2bde6 to 97c110d Compare April 10, 2026 11:42
@MOZGIII MOZGIII changed the title Replay support in workers WIP [do not merge] VCR WIP [do not merge] Apr 10, 2026
@MOZGIII MOZGIII force-pushed the mzg/2026-03-26/vcr branch 5 times, most recently from 02b709a to 53f4216 Compare April 13, 2026 06:36
@github-actions


Coverage Report

Python Coverage

Metric Coverage
Lines 72.0%
Branches 58.0%


Rust Coverage

Metric Coverage
Lines 64.5% 🔴 (-1.6%)
Branches N/A


Compared to main branch

@MOZGIII MOZGIII force-pushed the mzg/2026-03-26/vcr branch 6 times, most recently from 59c13e0 to 8731b74 Compare April 14, 2026 11:03
@MOZGIII MOZGIII force-pushed the mzg/2026-03-26/vcr branch 4 times, most recently from 6b16136 to bc0ce87 Compare April 15, 2026 17:28
@MOZGIII MOZGIII force-pushed the mzg/2026-03-26/vcr branch from bc0ce87 to 328a072 Compare April 15, 2026 19:21
