feat(component-manager): wire compute tray management through RMS bac…#2418
feat(component-manager): wire compute tray management through RMS bac…#2418vinodchitraliNVIDIA wants to merge 1 commit into
Conversation
Summary by CodeRabbit
WalkthroughAdds RMS compute-tray support: DB query to map BMC IPs to machine identities, enables RmsBackend as a compute-tray backend, implements ComputeTrayManager (power and firmware flows with job tracking), and provides test fixtures and async SQLx tests. ChangesCompute-Tray RMS Backend
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/component-manager/src/test_support.rs (1)
43-52: 💤 Low valueConsider extracting common hash generation logic (optional).
The hash padding logic is duplicated across
test_power_shelf_id,test_machine_id, andtest_switch_id. A small helper could reduce repetition:fn label_to_hash(label: &str) -> [u8; 32] { let mut hash = [0u8; 32]; let bytes = label.as_bytes(); hash[..bytes.len().min(32)].copy_from_slice(&bytes[..bytes.len().min(32)]); hash }That said, given the simplicity and isolated test context, the current explicit approach is defensible per the project's preference for simple, explicit code over abstractions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/component-manager/src/test_support.rs` around lines 43 - 52, Duplicate hash-padding logic appears in test_machine_id, test_power_shelf_id, and test_switch_id; extract it into a small helper e.g. label_to_hash(label: &str) -> [u8; 32] that creates the zeroed 32-byte array, copies up to 32 bytes of label.as_bytes(), and returns the array, then update test_machine_id, test_power_shelf_id and test_switch_id to call this helper instead of repeating the same code.crates/component-manager/src/component_manager.rs (1)
147-164: ⚡ Quick winAdd explicit unit coverage for the new compute-tray RMS branch.
Please add tests that validate: (1)
Backend::Rmsfor compute tray errors whenrms_clientis missing, (2) errors whendbis missing, and (3) succeeds when both are present. This path is newly enabled and worth locking down.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/component-manager/src/component_manager.rs` around lines 147 - 164, Add unit tests for the compute_tray RMS branch that exercise the match arm creating compute_tray: write three tests targeting ComponentManager::new (or the constructor that contains the snippet) with config.compute_tray_backend = Backend::Rms — (1) build the inputs with rms_client = None and a valid db to assert it returns Err(ComponentManagerError::InvalidArgument) and the message mentions "RMS client is not configured", (2) rms_client = Some(...) but db = None to assert Err(ComponentManagerError::InvalidArgument) with message mentioning "database pool is not configured", and (3) provide both rms_client and db to assert success (Ok) and that compute_tray is an Arc containing a RmsBackend (or that construction succeeds). Use the same types referenced in the code (Backend::Rms, ComponentManagerError::InvalidArgument, crate::rms::RmsBackend::new, compute_tray) and keep tests isolated/mocked as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/api-db/src/machine.rs`:
- Around line 2599-2620: find_rms_identities_by_bmc_ips currently can return
multiple MachineRmsIdentity rows for the same BMC IP (mia.address); change the
function so after executing the query (sql which selects mia.address AS bmc_ip)
you detect duplicate bmc_ip values and return a DatabaseError when any
duplicates exist instead of returning a Vec with ambiguous entries. Locate the
async fn find_rms_identities_by_bmc_ips and after
sqlx::query_as(...).fetch_all(db).await, validate the resulting
Vec<MachineRmsIdentity> for unique bmc_ip (or host(mia.address)) values; if any
IP appears more than once, map that condition to
DatabaseError::new("machine::find_rms_identities_by_bmc_ips", ...) with a clear
message about duplicate BMC IPs, otherwise return the vector as before.
---
Nitpick comments:
In `@crates/component-manager/src/component_manager.rs`:
- Around line 147-164: Add unit tests for the compute_tray RMS branch that
exercise the match arm creating compute_tray: write three tests targeting
ComponentManager::new (or the constructor that contains the snippet) with
config.compute_tray_backend = Backend::Rms — (1) build the inputs with
rms_client = None and a valid db to assert it returns
Err(ComponentManagerError::InvalidArgument) and the message mentions "RMS client
is not configured", (2) rms_client = Some(...) but db = None to assert
Err(ComponentManagerError::InvalidArgument) with message mentioning "database
pool is not configured", and (3) provide both rms_client and db to assert
success (Ok) and that compute_tray is an Arc containing a RmsBackend (or that
construction succeeds). Use the same types referenced in the code (Backend::Rms,
ComponentManagerError::InvalidArgument, crate::rms::RmsBackend::new,
compute_tray) and keep tests isolated/mocked as needed.
In `@crates/component-manager/src/test_support.rs`:
- Around line 43-52: Duplicate hash-padding logic appears in test_machine_id,
test_power_shelf_id, and test_switch_id; extract it into a small helper e.g.
label_to_hash(label: &str) -> [u8; 32] that creates the zeroed 32-byte array,
copies up to 32 bytes of label.as_bytes(), and returns the array, then update
test_machine_id, test_power_shelf_id and test_switch_id to call this helper
instead of repeating the same code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: adb3a3c8-47e6-40ed-b402-aa03ed62f76f
📒 Files selected for processing (4)
crates/api-db/src/machine.rscrates/component-manager/src/component_manager.rscrates/component-manager/src/rms.rscrates/component-manager/src/test_support.rs
…kend Enable `compute_tray_backend = "rms"` so compute tray power and firmware operations use the same direct RMS path as switches and power shelves, instead of returning a hard error from the component manager factory. Add `MachineRmsIdentity` and `find_rms_identities_by_bmc_ips()` to resolve machine node_id, rack_id, and BMC MAC from BMC IP via machine_interfaces. Implement `ComputeTrayManager for RmsBackend` with BatchSetPowerState, ApplyFirmwareObject (NodeType::Compute), firmware job tracking, and bundle listing. Disambiguate existing RMS tests that called trait methods directly on `RmsBackend` now that multiple managers share the same type. Signed-off-by: Vinod Chitrali <vchitrali@nvidia.com>
8a81e86 to
b960196
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/component-manager/src/component_manager.rs (1)
189-269: ⚡ Quick winConsider adding factory validation tests for RMS compute-tray backend.
The test suite does not exercise the new validation logic introduced for
Backend::Rms(lines 149-158). Adding tests for missingrms_client, missingdb, and successful RMS backend construction would improve coverage and document expected behavior. The same gap exists for nv_switch and power_shelf RMS validation—recommend covering all three for consistency.♻️ Example test structure
#[tokio::test] async fn build_rms_compute_tray_without_client_returns_error() { let config = ComponentManagerConfig { nv_switch_backend: "mock".into(), power_shelf_backend: "mock".into(), compute_tray_backend: Backend::Rms, ..Default::default() }; // Pass db but no rms_client let err = build_component_manager(&config, None, None, Some(mock_pool()), None) .await .unwrap_err(); assert!(matches!(err, ComponentManagerError::InvalidArgument(msg) if msg.contains("RMS client is not configured"))); } #[tokio::test] async fn build_rms_compute_tray_without_db_returns_error() { let config = ComponentManagerConfig { nv_switch_backend: "mock".into(), power_shelf_backend: "mock".into(), compute_tray_backend: Backend::Rms, ..Default::default() }; // Pass rms_client but no db let err = build_component_manager(&config, Some(mock_rms_client()), None, None, None) .await .unwrap_err(); assert!(matches!(err, ComponentManagerError::InvalidArgument(msg) if msg.contains("database pool is not configured"))); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/component-manager/src/component_manager.rs` around lines 189 - 269, The tests don't cover the new RMS factory validation for Backend::Rms in build_component_manager; add unit tests that exercise compute_tray (and similarly nv_switch and power_shelf) when Backend::Rms is selected: one test passing a db but no rms_client to assert build_component_manager returns ComponentManagerError::InvalidArgument with a message about the missing RMS client, one test passing an rms_client but no db to assert an InvalidArgument about the missing database pool, and one test providing both mock_rms_client and mock_db to assert successful construction and correct component names for compute_tray (and mirror these three cases for nv_switch and power_shelf as appropriate). Ensure tests construct ComponentManagerConfig with compute_tray_backend: Backend::Rms and call build_component_manager(...).await.unwrap()/unwrap_err() as needed and match error messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/component-manager/src/component_manager.rs`:
- Around line 189-269: The tests don't cover the new RMS factory validation for
Backend::Rms in build_component_manager; add unit tests that exercise
compute_tray (and similarly nv_switch and power_shelf) when Backend::Rms is
selected: one test passing a db but no rms_client to assert
build_component_manager returns ComponentManagerError::InvalidArgument with a
message about the missing RMS client, one test passing an rms_client but no db
to assert an InvalidArgument about the missing database pool, and one test
providing both mock_rms_client and mock_db to assert successful construction and
correct component names for compute_tray (and mirror these three cases for
nv_switch and power_shelf as appropriate). Ensure tests construct
ComponentManagerConfig with compute_tray_backend: Backend::Rms and call
build_component_manager(...).await.unwrap()/unwrap_err() as needed and match
error messages.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: aab7f3e8-e559-4abe-8344-808d469394d2
📒 Files selected for processing (4)
crates/api-db/src/machine.rscrates/component-manager/src/component_manager.rscrates/component-manager/src/rms.rscrates/component-manager/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/api-db/src/machine.rs
- crates/component-manager/src/test_support.rs
- crates/component-manager/src/rms.rs
…kend
Enable
compute_tray_backend = "rms"so compute tray power and firmware operations use the same direct RMS path as switches and power shelves, instead of returning a hard error from the component manager factory.Add
MachineRmsIdentityandfind_rms_identities_by_bmc_ips()to resolve machine node_id, rack_id, and BMC MAC from BMC IP via machine_interfaces. ImplementComputeTrayManager for RmsBackendwith BatchSetPowerState, ApplyFirmwareObject (NodeType::Compute), firmware job tracking, and bundle listing. Disambiguate existing RMS tests that called trait methods directly onRmsBackendnow that multiple managers share the same type.Description
Type of Change
Related Issues (Optional)
#2393
Breaking Changes
Testing
Additional Notes