LLT-7243: collect Windows crash dumps in nat-lab #1773
Open
gytsto wants to merge 3 commits into
Adds Windows crash-dump collection to the libtelio nat-lab pipeline.
Pairs with dockur_windows scripts/enable_crash_dumps.ps1 (MR !161),
which configures WER to write full user-mode dumps to C:\CrashDumps.
* nat-lab/tests/log_collector.py - pulls C:\CrashDumps off the
  Windows VM via qemu-guest-agent on test failure, alongside the
  existing log capture.
* nat-lab/tests/telio.py - tiny wiring change so the collector is
  invoked for Windows clients.
* ci/env.py - pin LIBTELIO_ENV_NAT_LAB_WINDOWS_VM_TAG to the
  crash-dumps-enabled dockur_windows image (digest pinned to avoid
  cache reuse of older builds). Revert to a tagged dockur_windows
  release once one ships with the crash dump provisioning merged.
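To make the log_collector.py item above concrete, here is a minimal sketch of reading a single dump file through qemu-guest-agent. It is not the code from this PR. It relies only on the standard QGA commands `guest-file-open`, `guest-file-read` and `guest-file-close`; the `send` callable is a hypothetical transport (for example a wrapper around the agent socket) and `read_guest_file` is an illustrative name.

```python
import base64
from typing import Any, Callable, Dict

# Hypothetical transport (assumption): sends one qemu-guest-agent command
# and returns the value of the "return" field of the reply.
QgaSend = Callable[[str, Dict[str, Any]], Any]


def read_guest_file(send: QgaSend, guest_path: str, chunk_size: int = 1 << 20) -> bytes:
    """Read a whole guest file with guest-file-open/read/close."""
    handle = send("guest-file-open", {"path": guest_path, "mode": "rb"})
    data = bytearray()
    try:
        while True:
            reply = send("guest-file-read", {"handle": handle, "count": chunk_size})
            # File contents travel base64-encoded in the "buf-b64" field.
            data += base64.b64decode(reply["buf-b64"])
            if reply["eof"]:
                break
    finally:
        # Always release the guest-side handle, even if a read fails.
        send("guest-file-close", {"handle": handle})
    return bytes(data)
```

Full user-mode dumps can be large, so a real collector would probably stream each chunk to a local file rather than hold the whole dump in memory as this sketch does.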
Switch from the pinned enable-crash-dumps digest to the released v0.0.12 tag. v0.0.12 includes the post-mortem fixes from the windows-installed flake debugging (split across dockur_windows !174 and the merged enable_crash_dumps work), so the branch-built digest no longer needs to be pinned. Use the tag rather than a digest so that we follow whatever the v0.0.12 build produces; pin to v0.0.12@sha256:... later, once the build pipeline publishes and we want a fully reproducible reference.
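The tag-versus-digest trade-off can be illustrated with a small helper that tells the two forms apart. This is a sketch only: it assumes the value follows the usual `tag@sha256:<hex>` convention for digest-pinned images, and `parse_image_ref` and `is_digest_pinned` are hypothetical names that do not exist in ci/env.py.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    tag: str
    digest: Optional[str]  # "sha256:<hex>" when pinned, otherwise None


def parse_image_ref(value: str) -> ImageRef:
    """Split 'tag' or 'tag@sha256:<hex>' into its parts."""
    tag, sep, digest = value.partition("@")
    return ImageRef(tag=tag, digest=digest if sep else None)


def is_digest_pinned(value: str) -> bool:
    """True only when the reference carries an explicit sha256 digest."""
    digest = parse_image_ref(value).digest
    return digest is not None and digest.startswith("sha256:")
```

A bare tag such as `v0.0.12` follows whatever image that tag currently points to, while a digest-pinned reference always resolves to the same image. A check like this could guard against an accidental unpinned value once a reproducible reference is wanted.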
Was v5.15.5; bumping to v5.15.8 for the latest ci-helper-scripts and runner image fixes.
@@ -0,0 +1 @@
Collect Windows Error Reporting crash dumps from the Windows nat-lab VM on test teardown
Contributor
Nit: we can leave the changelog empty here since it doesn't concern the apps teams
Problem
Windows nat-lab tests don't capture crash dumps when a process inside the Windows VM (e.g. tcli, daemon) crashes. The result is a test failure with no post-mortem artifact, forcing manual re-runs with added instrumentation.
Solution
Collect WER local dumps off the Windows VM on test teardown.
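Running the collection on teardown means it has to happen whether or not the test body raised, and a collection failure should not hide the original test error. A minimal sketch of that pattern, assuming an async test harness, is shown below; `collect_dumps_on_teardown` and the `collect` callback are hypothetical names, not the ones used in this PR.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable


@asynccontextmanager
async def collect_dumps_on_teardown(
    collect: Callable[[], Awaitable[None]],
) -> AsyncIterator[None]:
    """Run `collect` when the block exits, even if the body raised."""
    try:
        yield
    finally:
        try:
            await collect()
        except Exception as exc:
            # Best effort: report the problem but let the original
            # test failure (if any) propagate unchanged.
            print(f"crash dump collection failed: {exc}")
```

A test would wrap its body in `async with collect_dumps_on_teardown(collector):`, so the dumps are pulled before the VM is torn down.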
Changes
* `nat-lab/tests/log_collector.py` - pulls `C:\CrashDumps\*.dmp` from the Windows VM via qemu-guest-agent alongside the existing log capture.
* `nat-lab/tests/telio.py` - minor wiring so the collector runs for Windows clients.
* `ci/env.py` - pins `LIBTELIO_ENV_NAT_LAB_WINDOWS_VM_TAG` to a dockur_windows image digest that has WER crash dump collection enabled.

☑️ Definition of Done checklist