Fix nvme instance storage discovery#407
Merged
neddp merged 13 commits intoApr 30, 2026
Merged
Conversation
957b389 to
6087318
Compare
e7d00b4 to
dcd857a
Compare
6087318 to
b79e5c1
Compare
b79e5c1 to
6ab6cd7
Compare
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
neddp
approved these changes
Apr 30, 2026
bb98869
into
cloudfoundry:fix-nvme-instance-storage-discovery
8 checks passed
neddp
added a commit
that referenced
this pull request
Jun 12, 2026
…nks (#396) * Implement runtime NVMe instance storage discovery using EBS symlinks * Fix nvme instance storage discovery (#400) * Refactor instance storage discovery into configurable component Implement auto-detection for instance storage disk type * Fix windows tests * Fix windows tests (but for real this time) * Remove leftover path normalization (#401) * Fix nvme instance storage discovery (#407) * Make implementation iaas-agnostic * Rename storage resolver files * Fix tests * Remove instance storage resolver * Don't use the aws pattern as default * Refactor NVMe instance storage discovery and remove unused symlink patterns * Enhance NVMe instance storage discovery with managed volume pattern support * Fix unit tests * Don't run windows unit tests when not supported * Simplify FakeDevicePathResolver by removing unused fields and methods * Wait for udev to settle before resolving EBS symlinks * Add debug logs * Import udev and add comment about why it's needed * Fix missing closing bracket * Update infrastructure/devicepathresolver/symlink_device_resolver.go Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Update infrastructure/devicepathresolver/symlink_device_resolver_test.go Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Fix lint identation * Fix: skip unresolvable symlinks instead of returning error ResolveSymlinksToDevices now logs a warning and continues when a symlink cannot be resolved (e.g. stale/broken symlinks in /dev/disk/by-id/). This prevents unnecessary deploy failures while the count validation in discoverNVMeInstanceStorage still catches any real mismatches. Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Use the already constructed udev instance Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: use %+v instead of %s for DiskSettings in error message DiskSettings does not implement fmt.Stringer, so using %s produced malformed %!s(...) output. Switch to %+v for actionable error messages. * fix: handle timedOut in discoverIdentityInstanceStorage * resolver: fail hard on unresolvable symlinks in ResolveSymlinksToDevices Silently skipping a symlink that cannot be resolved leaves the managed device exclusion set incomplete. If an EBS volume's by-id symlink is broken, FilterDevices would not exclude it and the device could be misidentified as instance storage, potentially causing data loss. Return a wrapped error instead of continuing, so callers can propagate the failure rather than proceeding with a partial exclusion set. * test: update linux_platform test for hard-fail on broken symlinks The platform-level test still expected the old skip-and-continue behavior. Updated to assert that a broken managed volume symlink returns an error. * fix: prioritize timedOut over err; assert nil result on resolver error - discoverIdentityInstanceStorage: check timedOut before err so the explicit timeout error is never shadowed when both are set - symlink_device_resolver_test: assert result is nil on failure to verify the fail-closed contract --------- Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
neddp
added a commit
that referenced
this pull request
Jun 14, 2026
…nks (#396) * Implement runtime NVMe instance storage discovery using EBS symlinks * Fix nvme instance storage discovery (#400) * Refactor instance storage discovery into configurable component Implement auto-detection for instance storage disk type * Fix windows tests * Fix windows tests (but for real this time) * Remove leftover path normalization (#401) * Fix nvme instance storage discovery (#407) * Make implementation iaas-agnostic * Rename storage resolver files * Fix tests * Remove instance storage resolver * Don't use the aws pattern as default * Refactor NVMe instance storage discovery and remove unused symlink patterns * Enhance NVMe instance storage discovery with managed volume pattern support * Fix unit tests * Don't run windows unit tests when not supported * Simplify FakeDevicePathResolver by removing unused fields and methods * Wait for udev to settle before resolving EBS symlinks * Add debug logs * Import udev and add comment about why it's needed * Fix missing closing bracket * Update infrastructure/devicepathresolver/symlink_device_resolver.go Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Update infrastructure/devicepathresolver/symlink_device_resolver_test.go Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Fix lint identation * Fix: skip unresolvable symlinks instead of returning error ResolveSymlinksToDevices now logs a warning and continues when a symlink cannot be resolved (e.g. stale/broken symlinks in /dev/disk/by-id/). This prevents unnecessary deploy failures while the count validation in discoverNVMeInstanceStorage still catches any real mismatches. Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> * Use the already constructed udev instance Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: use %+v instead of %s for DiskSettings in error message DiskSettings does not implement fmt.Stringer, so using %s produced malformed %!s(...) output. Switch to %+v for actionable error messages. * fix: handle timedOut in discoverIdentityInstanceStorage * resolver: fail hard on unresolvable symlinks in ResolveSymlinksToDevices Silently skipping a symlink that cannot be resolved leaves the managed device exclusion set incomplete. If an EBS volume's by-id symlink is broken, FilterDevices would not exclude it and the device could be misidentified as instance storage, potentially causing data loss. Return a wrapped error instead of continuing, so callers can propagate the failure rather than proceeding with a partial exclusion set. * test: update linux_platform test for hard-fail on broken symlinks The platform-level test still expected the old skip-and-continue behavior. Updated to assert that a broken managed volume symlink returns an error. * fix: prioritize timedOut over err; assert nil result on resolver error - discoverIdentityInstanceStorage: check timedOut before err so the explicit timeout error is never shadowed when both are set - symlink_device_resolver_test: assert result is nil on failure to verify the fail-closed contract --------- Co-authored-by: Ivaylo Ivanov <ivaylogi98@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refactor NVMe instance storage discovery to be IaaS-agnostic
Refactors #396 to address review feedback from @rkoster.
What changed from #396
IaaS-agnostic design — the AWS-specific awsNVMeInstanceStorageResolver and its hardcoded EBS pattern are replaced by a generic SymlinkDeviceResolver in infrastructure/devicepathresolver. The managed volume
pattern (/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_) and device pattern (/dev/nvmen1) are passed in entirely via agent config (InstanceStorageManagedVolumePattern, InstanceStorageDevicePattern), set by
the stemcell builder's bosh_aws_agent_settings stage — not hardcoded in the resolver.
udev race condition fixed — the original PR globbed /dev/disk/by-id/ symlinks without waiting for udev to finish creating them. On fast boots, this returned 0 EBS symlinks, causing all NVMe devices (including
the root EBS volume) to be misidentified as instance storage. udevadm trigger + udevadm settle are now called before globbing.
Unit tests fixed — all compilation errors and test failures from #396 are resolved.
Algorithm
Verified using m6id.large
Enabled
raw_instance_storage: true3 NVMe devices present:
Used Claude Code