Add Longhorn storage health probe for read-only mount failures and disk pressure by Copilot · Pull Request #2706 · jeffborg/home-network

Copilot · 2026-05-26T00:36:15Z

Pods can keep running after a Longhorn-backed mount turns read-only following a filesystem error. Restarting the workload may recover it, but until now nothing signaled the problem proactively. This change adds cluster-side detection for read-only mount symptoms and Longhorn disk-capacity pressure.

What changed
- Added cluster/core/longhorn-system/storage-health-check.yaml with:
  - ServiceAccount + scoped ClusterRole/ClusterRoleBinding
  - CronJob (longhorn-storage-health-check) on a 15-minute cadence
- Wired the new manifest into cluster/core/longhorn-system/kustomization.yaml.

Detection coverage
- Warning events containing read-only filesystem indicators (read-only file system, erofs, remount-ro)
- Pod container waiting/termination messages containing read-only filesystem errors
- Longhorn volumes in unhealthy states (e.g., faulted, or attached but not healthy)
- Longhorn disk usage above threshold (currently >=85% used)
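The read-only indicator matching listed above can be sketched roughly as follows. This is a minimal illustration, not the script in the PR: the three substrings come from the description, while the function names, the case-insensitive substring matching, and the sample messages are assumptions made for the example.

```python
# Substrings taken from the PR description; how the real script matches
# them (case sensitivity, regex vs substring) is an assumption here.
READ_ONLY_INDICATORS = ("read-only file system", "erofs", "remount-ro")


def has_read_only_indicator(message: str) -> bool:
    """Return True if the message mentions any read-only filesystem indicator."""
    lowered = message.lower()
    return any(indicator in lowered for indicator in READ_ONLY_INDICATORS)


def find_read_only_messages(messages):
    """Keep only the messages that look like read-only filesystem errors."""
    return [m for m in messages if has_read_only_indicator(m)]


if __name__ == "__main__":
    # Hypothetical event and container messages, for illustration only.
    samples = [
        "open /data/db.lock: read-only file system",
        "filesystem error, mounted with errors=remount-ro",
        "write failed: EROFS",
        "Back-off restarting failed container",
    ]
    for hit in find_read_only_messages(samples):
        print("FINDING:", hit)
```

Plain substring matching is simple but can over-match (any message that happens to contain "erofs" would be flagged), so a real check may want stricter patterns.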

Operational behavior
- Emits targeted findings to job logs.
- Exits non-zero when any condition is detected, creating an explicit failure signal for visibility/alerting pipelines.
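The log-then-fail behavior described above can be sketched like this. It is an illustrative outline under assumptions: `run_checks`, `exit_code`, and the stand-in checks are hypothetical names, and only the "non-zero exit when any issue is found" rule is taken from the PR.

```python
def run_checks(checks):
    """Run each (name, check) pair, print every finding, and count them."""
    issues = 0
    for name, check in checks:
        for finding in check():
            print(f"[{name}] {finding}")
            issues += 1
    return issues


def exit_code(issues: int) -> int:
    """Any finding maps to a failing exit status; no findings maps to 0."""
    return 1 if issues != 0 else 0


if __name__ == "__main__":
    # Hypothetical stand-ins for the real checks.
    demo_checks = [
        ("read-only", lambda: []),
        ("disk-usage", lambda: ["example-node/example-disk above threshold"]),
    ]
    print("exit status would be", exit_code(run_checks(demo_checks)))
```

A real entry point would pass the result to `sys.exit(...)`, so that the Job is marked failed and shows up in whatever alerting watches Job status.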

spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: check
              command:
                - /bin/sh
                - -ec
                - |
                  # flags read-only indicators + Longhorn unhealthy volumes + >=85% disk usage
                  ...
                  if [ "${issues}" -ne 0 ]; then exit 1; fi

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

charts.fairwinds.com
- Triggering command: /usr/local/bin/helm helm repo add fairwinds-stable REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

- Configure Actions setup steps to set up my environment, which run before the firewall is enabled
- Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Agent-Logs-Url: https://github.com/jeffborg/home-network/sessions/44b758de-6369-4fa8-a65e-4042780c8d01 Co-authored-by: jeffborg <1595430+jeffborg@users.noreply.github.com>

jeffborg · 2026-05-26T01:44:50Z

@copilot is this disk usage within the longhorn volumes themselves?
also where do the notifications go to?

Copilot · 2026-05-26T01:46:04Z

No, not currently: the check does not look inside each mounted volume's filesystem. It reads diskStatus from nodes.longhorn.io (storageMaximum vs storageAvailable), so it measures backing-disk capacity on each Longhorn node and disk.
This CronJob does not send external notifications by itself; it only fails the Job and logs its findings. Visibility therefore comes from Job status and logs, plus any cluster-wide alerting you already have on failed Jobs/CronJobs. The existing Flux Slack alert in this repo covers Flux resources in flux-system, not this longhorn-system CronJob.
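To make the capacity calculation concrete, here is a rough sketch of deriving a used-percentage from storageMaximum and storageAvailable and comparing it against the 85% threshold. The nesting under `status.diskStatus`, the function names, and the sample values are assumptions for illustration; the actual script in the PR may compute this differently.

```python
THRESHOLD_PERCENT = 85  # from the PR description: flag disks at >=85% used


def used_percent(storage_maximum: int, storage_available: int) -> float:
    """Percentage of the backing disk that is in use."""
    if storage_maximum <= 0:
        raise ValueError("storageMaximum must be positive")
    return (storage_maximum - storage_available) * 100 / storage_maximum


def disks_over_threshold(node: dict, threshold: float = THRESHOLD_PERCENT):
    """Return (disk name, percent used) pairs at or above the threshold."""
    findings = []
    # Assumed layout: status.diskStatus is a map of disk name -> disk info.
    disk_status = node.get("status", {}).get("diskStatus") or {}
    for disk_name, disk in disk_status.items():
        maximum = disk.get("storageMaximum", 0)
        if maximum <= 0:
            continue  # skip disks that report no capacity
        percent = used_percent(maximum, disk.get("storageAvailable", 0))
        if percent >= threshold:
            findings.append((disk_name, percent))
    return findings


if __name__ == "__main__":
    # Hypothetical node object with made-up numbers.
    node = {
        "status": {
            "diskStatus": {
                "disk-1": {"storageMaximum": 1000, "storageAvailable": 100},
                "disk-2": {"storageMaximum": 1000, "storageAvailable": 500},
            }
        }
    }
    for name, percent in disks_over_threshold(node):
        print(f"{name}: {percent:.0f}% used")
```

Because this measures the backing disk rather than the filesystem inside each volume, a volume can be full from the workload's point of view without tripping this check.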

Initial plan

d6f50ad

Copilot AI assigned Copilot and jeffborg May 26, 2026

Copilot started work on behalf of jeffborg May 26, 2026 00:36

Copilot AI linked an issue May 26, 2026 that may be closed by this pull request

need to detect pods where the mounted storage becomes read only due to error #2705

Open

Add Longhorn storage health check CronJob

47f2697


Copilot AI changed the title ~~[WIP] Add detection for read-only mounted storage in pods~~ Add Longhorn storage health probe for read-only mount failures and disk pressure May 26, 2026

Copilot AI requested a review from jeffborg May 26, 2026 00:41

Copilot finished work on behalf of jeffborg May 26, 2026 00:41

Copilot started work on behalf of jeffborg May 26, 2026 01:44

Copilot finished work on behalf of jeffborg May 26, 2026 01:46

Copilot wants to merge 2 commits into master from copilot/detect-readonly-mounted-storage
