
LLT-7243: collect Windows crash dumps in nat-lab#1773

Open
gytsto wants to merge 3 commits into main from LLT-7243_enable_windows_crashdump_collection


Contributor

@gytsto gytsto commented May 18, 2026

Problem

Windows nat-lab tests don't capture crash dumps when a process inside the Windows VM (e.g. tcli, daemon) crashes. The result is a test failure with no post-mortem artifact, forcing manual re-runs with added instrumentation.

Solution

Collect WER local dumps off the Windows VM on test teardown.

Changes

  • nat-lab/tests/log_collector.py — pulls C:\CrashDumps\*.dmp from the Windows VM via qemu-guest-agent alongside the existing log capture.
  • nat-lab/tests/telio.py — minor wiring so the collector runs for Windows clients.
  • ci/env.py — pins LIBTELIO_ENV_NAT_LAB_WINDOWS_VM_TAG to a dockur_windows image digest that has WER crash dump collection enabled.
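For readers unfamiliar with the guest-agent file API, here is a minimal sketch of how one dump file could be read over qemu-guest-agent. It is not the code in log_collector.py: the `send` transport, `read_guest_file`, `FakeAgent`, and the example dump file name are all hypothetical. Only the `guest-file-open`, `guest-file-read`, and `guest-file-close` commands and the `buf-b64`/`eof` reply fields are taken from the QEMU guest agent protocol.

```python
import base64
from typing import Any, Callable, Dict

# Hypothetical transport: sends one QGA command and returns its "return" value.
QgaSend = Callable[[Dict[str, Any]], Any]


def read_guest_file(send: QgaSend, path: str, chunk_size: int = 64 * 1024) -> bytes:
    """Read one guest file via guest-file-open / guest-file-read / guest-file-close."""
    handle = send({"execute": "guest-file-open",
                   "arguments": {"path": path, "mode": "rb"}})  # fopen-style mode
    data = bytearray()
    try:
        while True:
            reply = send({"execute": "guest-file-read",
                          "arguments": {"handle": handle, "count": chunk_size}})
            data += base64.b64decode(reply["buf-b64"])
            if reply["eof"]:
                break
    finally:
        # Always release the guest-side handle, even if a read fails.
        send({"execute": "guest-file-close", "arguments": {"handle": handle}})
    return bytes(data)


class FakeAgent:
    """In-memory stand-in for the guest agent, for demonstration only."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files
        self.open_handles: Dict[int, list] = {}
        self._next_handle = 1

    def __call__(self, cmd: Dict[str, Any]) -> Any:
        name, args = cmd["execute"], cmd.get("arguments", {})
        if name == "guest-file-open":
            handle = self._next_handle
            self._next_handle += 1
            self.open_handles[handle] = [self._files[args["path"]], 0]
            return handle
        if name == "guest-file-read":
            content, pos = self.open_handles[args["handle"]]
            piece = content[pos:pos + args["count"]]
            end = pos + len(piece)
            self.open_handles[args["handle"]][1] = end
            return {"count": len(piece),
                    "buf-b64": base64.b64encode(piece).decode("ascii"),
                    "eof": end >= len(content)}
        if name == "guest-file-close":
            del self.open_handles[args["handle"]]
            return {}
        raise ValueError(f"unsupported command: {name}")


# Usage against the fake agent; the file name is made up for the example.
dump_path = r"C:\CrashDumps\tcli.exe.1234.dmp"
agent = FakeAgent({dump_path: b"MDMP" + bytes(100)})
dump = read_guest_file(agent, dump_path, chunk_size=16)
```

Reading in bounded chunks keeps each base64-encoded reply small, which matters for full user-mode dumps that can be large.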

☑️ Definition of Done checklist

  • Commit history is clean (requirements)
  • README.md is updated
  • Functionality is covered by unit or integration tests

@gytsto gytsto self-assigned this May 18, 2026
@gytsto gytsto requested a review from a team as a code owner May 18, 2026 11:43
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from 4d12770 to 8e74fee Compare May 22, 2026 10:11
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from 42056c4 to 215033e Compare May 23, 2026 16:09
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from f7e6102 to f615f01 Compare May 24, 2026 11:27
@gytsto gytsto changed the title from "enable windows crash dumps" to "LLT-7243: collect Windows crash dumps in nat-lab" May 24, 2026
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from f615f01 to 7cb0684 Compare May 24, 2026 11:35
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from 7cb0684 to 81ec26f Compare May 24, 2026 15:12
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from 81ec26f to e41a792 Compare May 24, 2026 19:29
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from e41a792 to d684433 Compare May 24, 2026 21:40
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from d684433 to d9dc91f Compare May 28, 2026 11:18
gytsto added 2 commits May 28, 2026 14:23
Adds Windows crash-dump collection to the libtelio nat-lab pipeline.
Pairs with dockur_windows scripts/enable_crash_dumps.ps1 (MR !161),
which configures WER to write full user-mode dumps to C:\CrashDumps.
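As an illustration of what "full user-mode dumps to C:\CrashDumps" means in WER terms, the sketch below builds the `reg add` commands for the documented `LocalDumps` registry values. The contents of enable_crash_dumps.ps1 are not shown in this PR, so this is an assumption about the kind of settings involved, not a copy of that script; `wer_local_dump_commands` and the `count` default are hypothetical.

```python
from typing import List

# Documented WER registry key for user-mode local dumps.
LOCAL_DUMPS_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def wer_local_dump_commands(folder: str = r"C:\CrashDumps",
                            count: int = 10) -> List[List[str]]:
    """Build `reg add` argument lists that enable full local dumps (illustrative only)."""

    def reg_add(name: str, reg_type: str, value: str) -> List[str]:
        return ["reg", "add", LOCAL_DUMPS_KEY,
                "/v", name, "/t", reg_type, "/d", value, "/f"]

    return [
        reg_add("DumpFolder", "REG_EXPAND_SZ", folder),  # where .dmp files are written
        reg_add("DumpType", "REG_DWORD", "2"),           # 2 = full dump, 1 = mini dump
        reg_add("DumpCount", "REG_DWORD", str(count)),   # max dumps kept in the folder
    ]


commands = wer_local_dump_commands()
```

Each inner list can be passed to a process runner inside the guest; the function only builds the arguments and does not touch the registry itself.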

  * nat-lab/tests/log_collector.py - pulls C:\CrashDumps off the
    Windows VM via qemu-guest-agent on test failure, alongside the
    existing log capture.
  * nat-lab/tests/telio.py - tiny wiring change so the collector is
    invoked for Windows clients.
  * ci/env.py - pin LIBTELIO_ENV_NAT_LAB_WINDOWS_VM_TAG to the
    crash-dumps-enabled dockur_windows image (digest pinned to avoid
    cache reuse of older builds). Revert to a tagged dockur_windows
    release once one ships with the crash dump provisioning merged.

Switching from the pinned enable-crash-dumps digest to the released
v0.0.12 tag. v0.0.12 includes the post-mortem fixes from the
windows-installed flake debug (split across dockur_windows !174 and
the merged enable_crash_dumps work), so we no longer need to pin
the branch-built digest.

Tag rather than digest so we follow whatever the v0.0.12 build
produces; pin to v0.0.12@sha256:... later once the build pipeline
publishes and we want a fully-reproducible reference.
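The tag-versus-digest trade-off above can be made explicit with a small check. This is a sketch under the assumption that the VM tag value uses the usual `tag` or `tag@sha256:<hex>` form; how ci/env.py actually stores or validates LIBTELIO_ENV_NAT_LAB_WINDOWS_VM_TAG is not shown here, and `VmTag`, `parse_vm_tag`, and `is_digest_pinned` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class VmTag(NamedTuple):
    tag: str
    digest: Optional[str]  # None when the reference follows a mutable tag


_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def parse_vm_tag(value: str) -> VmTag:
    """Split 'tag' or 'tag@sha256:<64 hex chars>' into its parts."""
    tag, sep, digest = value.partition("@")
    if not tag:
        raise ValueError(f"missing tag in {value!r}")
    if sep and not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"malformed digest in {value!r}")
    return VmTag(tag, digest if sep else None)


def is_digest_pinned(value: str) -> bool:
    """True when the reference names one exact image build."""
    return parse_vm_tag(value).digest is not None


# The digest below is a placeholder, not a real image digest.
floating = parse_vm_tag("v0.0.12")
pinned = parse_vm_tag("v0.0.12@sha256:" + "a" * 64)
```

A tag-only reference picks up whatever the tag currently points at, while a digest-pinned reference keeps resolving to the same image even if the tag is rebuilt.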

Was v5.15.5; bumping to v5.15.8 for the latest ci-helper-scripts and
runner image fixes.
@gytsto gytsto force-pushed the LLT-7243_enable_windows_crashdump_collection branch from d9dc91f to 64352c6 Compare May 28, 2026 11:24
Contributor

@mathiaspeters mathiaspeters left a comment


+1

I'm wondering if we (as a separate ticket) should create a dump_collector.py file where we put the dump-related code, since the log collector is becoming a bit bigger with mixed log/dump code

@@ -0,0 +1 @@
Collect Windows Error Reporting crash dumps from the Windows nat-lab VM on test teardown
Contributor


Nit: we can leave the changelog empty here since it doesn't concern the apps teams
