
[WIP] Validate and collect npd status in e2e test#197

Closed
Copilot wants to merge 12 commits into main from copilot/validate-npd-status-in-e2e-test

Conversation

Copilot AI commented Jun 16, 2026

Contributor

Adds joined-node E2E validation for node-problem-detector status and collects node-problem-detector logs during E2E log dumps.

Also updates E2E documentation to mention NPD diagnostics and runs targeted validation/security checks.

bcho left a comment

Member

@copilot in e2e test, can we also dump logs from npd?

Copilot AI commented Jun 17, 2026

Contributor Author

> @copilot in e2e test, can we also dump logs from npd?

Added NPD journal collection to the E2E log dump. Each VM now writes ${prefix}-npd.log with node-problem-detector.service logs from the active machine, with host fallback diagnostics. Addressed in d89a631.
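
A minimal sketch of what such a collection step could look like, assuming the logs are read with `journalctl -u node-problem-detector.service`. The function names, the `machine` parameter, and the exact fallback behaviour are hypothetical illustrations and are not taken from d89a631.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

NPD_UNIT = "node-problem-detector.service"

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def journal_command(machine: Optional[str] = None) -> List[str]:
    """Build a journalctl invocation for the NPD unit.

    `machine` is a hypothetical machine name; when set, journalctl reads
    that machine's journal (-M) instead of the host's.
    """
    cmd = ["journalctl", "--no-pager", "-u", NPD_UNIT]
    if machine:
        cmd += ["-M", machine]
    return cmd


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raise on a non-zero exit."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout


def collect_npd_journal(machine: Optional[str], runner: Runner = run) -> str:
    """Return NPD journal text, falling back to the host journal on failure."""
    try:
        return runner(journal_command(machine))
    except (subprocess.CalledProcessError, OSError) as err:
        note = f"# journal for machine {machine!r} unavailable: {err}\n"
    try:
        return note + runner(journal_command(None))
    except (subprocess.CalledProcessError, OSError) as err:
        return note + f"# host journal unavailable: {err}\n"


def dump_npd_log(prefix: str, machine: Optional[str], out_dir: Path,
                 runner: Runner = run) -> Path:
    """Write the collected journal to <out_dir>/<prefix>-npd.log."""
    out_path = out_dir / f"{prefix}-npd.log"
    out_path.write_text(collect_npd_journal(machine, runner))
    return out_path
```

Passing the runner as a parameter keeps the fallback logic testable without systemd. A failed collection still produces a log file that says why it is empty.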

Copilot AI changed the title from "[WIP] Validate npd status in e2e test" to "[WIP] Validate and collect npd status in e2e test" on Jun 17, 2026
Copilot finished work on behalf of bcho June 17, 2026 00:23
Copilot AI requested a review from bcho June 17, 2026 00:23

bcho commented Jun 17, 2026

Member

Confirmed from the failed E2E run logs that this validation is flushing out the issue mentioned in #193. All Flex nodes joined and reached Ready, and node-problem-detector.service briefly reported active, but NPD was crash-looping before it could publish node conditions. The collected NPD journals repeatedly show:

panic: stat /var/lib/kubelet/kubelet/kubeconfig: no such file or directory

So the failure is caused by the bad kubeconfig path currently passed to NPD, not by node join or the detector rules themselves. Once #193 is combined with this PR, we expect the NPD condition check to pass and validate the fix.
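
A crash-looping unit can report active between restarts, whereas node conditions only appear once NPD has reached the API server, which is why the condition check catches this failure. The distinction can be sketched as below, assuming the node object comes from `kubectl get node <name> -o json`. The helper name and the list of expected condition types are assumptions: `KernelDeadlock` and `ReadonlyFilesystem` come from upstream NPD's default kernel monitor configuration, and the actual test may check different ones.

```python
from typing import Iterable, List

# Assumed condition types (upstream NPD kernel monitor defaults);
# the E2E check in this PR may use a different set.
EXPECTED_NPD_CONDITIONS = ("KernelDeadlock", "ReadonlyFilesystem")


def missing_npd_conditions(
    node: dict,
    expected: Iterable[str] = EXPECTED_NPD_CONDITIONS,
) -> List[str]:
    """Return the expected condition types absent from node.status.conditions."""
    conditions = node.get("status", {}).get("conditions", [])
    present = {c.get("type") for c in conditions}
    return [t for t in expected if t not in present]


# A node that joined and is Ready, but whose NPD never published anything.
ready_only = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
print(missing_npd_conditions(ready_only))
# prints ['KernelDeadlock', 'ReadonlyFilesystem']
```

An empty result means NPD stayed up long enough to publish its conditions. A non-empty result on a Ready node points at the detector itself rather than at the node join.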

bcho commented Jun 17, 2026

Member

included in #198

bcho closed this on Jun 17, 2026