Skip to content

feat(component-manager): wire compute tray management through RMS bac…#2418

Open
vinodchitraliNVIDIA wants to merge 1 commit into
NVIDIA:mainfrom
vinodchitraliNVIDIA:vc/cm_wiring
Open

feat(component-manager): wire compute tray management through RMS bac…#2418
vinodchitraliNVIDIA wants to merge 1 commit into
NVIDIA:mainfrom
vinodchitraliNVIDIA:vc/cm_wiring

Conversation

@vinodchitraliNVIDIA

Copy link
Copy Markdown
Contributor

…kend

Enable compute_tray_backend = "rms" so compute tray power and firmware operations use the same direct RMS path as switches and power shelves, instead of returning a hard error from the component manager factory.

Add MachineRmsIdentity and find_rms_identities_by_bmc_ips() to resolve machine node_id, rack_id, and BMC MAC from BMC IP via machine_interfaces. Implement ComputeTrayManager for RmsBackend with BatchSetPowerState, ApplyFirmwareObject (NodeType::Compute), firmware job tracking, and bundle listing. Disambiguate existing RMS tests that called trait methods directly on RmsBackend now that multiple managers share the same type.

Description

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

#2393

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@vinodchitraliNVIDIA vinodchitraliNVIDIA requested a review from a team as a code owner June 10, 2026 20:22
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

  • New Features

    • RMS backend now supports compute-tray management, enabling power control and firmware updates for compute trays alongside existing power-shelf and switch support.
  • Tests

    • Added test fixtures and database helpers for compute-tray backend validation.

Walkthrough

Adds RMS compute-tray support: DB query to map BMC IPs to machine identities, enables RmsBackend as a compute-tray backend, implements ComputeTrayManager (power and firmware flows with job tracking), and provides test fixtures and async SQLx tests.

Changes

Compute-Tray RMS Backend

Layer / File(s) Summary
Database contract and compute-tray identity resolution
crates/api-db/src/machine.rs
MachineRmsIdentity captures machine id, BMC IP, BMC MAC, and optional rack id. find_rms_identities_by_bmc_ips resolves identities via joins and enforces unique BMC IP mappings.
RMS backend wiring and compute-tray enablement
crates/component-manager/src/component_manager.rs, crates/component-manager/src/rms.rs
build_component_manager now constructs RmsBackend for Backend::Rms compute-tray option. Adds compute-tray RMS constants, DB resolution path (resolve_compute_tray_identities), and BMC port config.
ComputeTrayManager trait implementation for RmsBackend
crates/component-manager/src/rms.rs
Implements ComputeTrayManager on RmsBackend: power_control via BatchSetPowerState, update_firmware via apply_firmware_object flow with tracked job insertion keyed by BMC MAC, get_firmware_status polling tracked jobs, component filter mapping, and build_compute_tray_node_info producing BMC-only NodeInfo.
Test fixture infrastructure and helpers
crates/component-manager/src/test_support.rs
Adds test_machine_id for deterministic MachineId generation, CT MAC/IP constants, and seed_machine to create expected machine rows, admin segment selection, static BMC interface, and return MachineId for tests.
Compute-tray and test updates
crates/component-manager/src/rms.rs, crates/component-manager/src/test_support.rs
Adds unit test for component-filter mapping, helpers make_compute_tray_backend/make_ct_endpoint, async SQLx tests for compute-tray power control, firmware update/job tracking, and firmware status polling; updates power-shelf firmware tests to call PowerShelfManager::update_firmware directly.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective: wiring compute tray management through the RMS backend, which is the central focus of the changeset across all modified files.
Description check ✅ Passed The description is directly related to the changeset, detailing the implementation of compute tray RMS support, machine identity resolution, and the ComputeTrayManager trait implementation with specific operational details.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@vinodchitraliNVIDIA

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/component-manager/src/test_support.rs (1)

43-52: 💤 Low value

Consider extracting common hash generation logic (optional).

The hash padding logic is duplicated across test_power_shelf_id, test_machine_id, and test_switch_id. A small helper could reduce repetition:

fn label_to_hash(label: &str) -> [u8; 32] {
    let mut hash = [0u8; 32];
    let bytes = label.as_bytes();
    hash[..bytes.len().min(32)].copy_from_slice(&bytes[..bytes.len().min(32)]);
    hash
}

That said, given the simplicity and isolated test context, the current explicit approach is defensible per the project's preference for simple, explicit code over abstractions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/component-manager/src/test_support.rs` around lines 43 - 52, Duplicate
hash-padding logic appears in test_machine_id, test_power_shelf_id, and
test_switch_id; extract it into a small helper e.g. label_to_hash(label: &str)
-> [u8; 32] that creates the zeroed 32-byte array, copies up to 32 bytes of
label.as_bytes(), and returns the array, then update test_machine_id,
test_power_shelf_id and test_switch_id to call this helper instead of repeating
the same code.
crates/component-manager/src/component_manager.rs (1)

147-164: ⚡ Quick win

Add explicit unit coverage for the new compute-tray RMS branch.

Please add tests that validate: (1) Backend::Rms for compute tray errors when rms_client is missing, (2) errors when db is missing, and (3) succeeds when both are present. This path is newly enabled and worth locking down.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/component-manager/src/component_manager.rs` around lines 147 - 164,
Add unit tests for the compute_tray RMS branch that exercise the match arm
creating compute_tray: write three tests targeting ComponentManager::new (or the
constructor that contains the snippet) with config.compute_tray_backend =
Backend::Rms — (1) build the inputs with rms_client = None and a valid db to
assert it returns Err(ComponentManagerError::InvalidArgument) and the message
mentions "RMS client is not configured", (2) rms_client = Some(...) but db =
None to assert Err(ComponentManagerError::InvalidArgument) with message
mentioning "database pool is not configured", and (3) provide both rms_client
and db to assert success (Ok) and that compute_tray is an Arc containing a
RmsBackend (or that construction succeeds). Use the same types referenced in the
code (Backend::Rms, ComponentManagerError::InvalidArgument,
crate::rms::RmsBackend::new, compute_tray) and keep tests isolated/mocked as
needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-db/src/machine.rs`:
- Around line 2599-2620: find_rms_identities_by_bmc_ips currently can return
multiple MachineRmsIdentity rows for the same BMC IP (mia.address); change the
function so after executing the query (sql which selects mia.address AS bmc_ip)
you detect duplicate bmc_ip values and return a DatabaseError when any
duplicates exist instead of returning a Vec with ambiguous entries. Locate the
async fn find_rms_identities_by_bmc_ips and after
sqlx::query_as(...).fetch_all(db).await, validate the resulting
Vec<MachineRmsIdentity> for unique bmc_ip (or host(mia.address)) values; if any
IP appears more than once, map that condition to
DatabaseError::new("machine::find_rms_identities_by_bmc_ips", ...) with a clear
message about duplicate BMC IPs, otherwise return the vector as before.

---

Nitpick comments:
In `@crates/component-manager/src/component_manager.rs`:
- Around line 147-164: Add unit tests for the compute_tray RMS branch that
exercise the match arm creating compute_tray: write three tests targeting
ComponentManager::new (or the constructor that contains the snippet) with
config.compute_tray_backend = Backend::Rms — (1) build the inputs with
rms_client = None and a valid db to assert it returns
Err(ComponentManagerError::InvalidArgument) and the message mentions "RMS client
is not configured", (2) rms_client = Some(...) but db = None to assert
Err(ComponentManagerError::InvalidArgument) with message mentioning "database
pool is not configured", and (3) provide both rms_client and db to assert
success (Ok) and that compute_tray is an Arc containing a RmsBackend (or that
construction succeeds). Use the same types referenced in the code (Backend::Rms,
ComponentManagerError::InvalidArgument, crate::rms::RmsBackend::new,
compute_tray) and keep tests isolated/mocked as needed.

In `@crates/component-manager/src/test_support.rs`:
- Around line 43-52: Duplicate hash-padding logic appears in test_machine_id,
test_power_shelf_id, and test_switch_id; extract it into a small helper e.g.
label_to_hash(label: &str) -> [u8; 32] that creates the zeroed 32-byte array,
copies up to 32 bytes of label.as_bytes(), and returns the array, then update
test_machine_id, test_power_shelf_id and test_switch_id to call this helper
instead of repeating the same code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: adb3a3c8-47e6-40ed-b402-aa03ed62f76f

📥 Commits

Reviewing files that changed from the base of the PR and between a7803f8 and 8a81e86.

📒 Files selected for processing (4)
  • crates/api-db/src/machine.rs
  • crates/component-manager/src/component_manager.rs
  • crates/component-manager/src/rms.rs
  • crates/component-manager/src/test_support.rs

Comment thread crates/api-db/src/machine.rs
…kend

Enable `compute_tray_backend = "rms"` so compute tray power and firmware
operations use the same direct RMS path as switches and power shelves,
instead of returning a hard error from the component manager factory.

Add `MachineRmsIdentity` and `find_rms_identities_by_bmc_ips()` to resolve
machine node_id, rack_id, and BMC MAC from BMC IP via machine_interfaces.
Implement `ComputeTrayManager for RmsBackend` with BatchSetPowerState,
ApplyFirmwareObject (NodeType::Compute), firmware job tracking, and bundle
listing. Disambiguate existing RMS tests that called trait methods directly
on `RmsBackend` now that multiple managers share the same type.

Signed-off-by: Vinod Chitrali <vchitrali@nvidia.com>
@vinodchitraliNVIDIA

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/component-manager/src/component_manager.rs (1)

189-269: ⚡ Quick win

Consider adding factory validation tests for RMS compute-tray backend.

The test suite does not exercise the new validation logic introduced for Backend::Rms (lines 149-158). Adding tests for missing rms_client, missing db, and successful RMS backend construction would improve coverage and document expected behavior. The same gap exists for nv_switch and power_shelf RMS validation—recommend covering all three for consistency.

♻️ Example test structure
#[tokio::test]
async fn build_rms_compute_tray_without_client_returns_error() {
    let config = ComponentManagerConfig {
        nv_switch_backend: "mock".into(),
        power_shelf_backend: "mock".into(),
        compute_tray_backend: Backend::Rms,
        ..Default::default()
    };
    // Pass db but no rms_client
    let err = build_component_manager(&config, None, None, Some(mock_pool()), None)
        .await
        .unwrap_err();
    assert!(matches!(err, ComponentManagerError::InvalidArgument(msg) 
        if msg.contains("RMS client is not configured")));
}

#[tokio::test]
async fn build_rms_compute_tray_without_db_returns_error() {
    let config = ComponentManagerConfig {
        nv_switch_backend: "mock".into(),
        power_shelf_backend: "mock".into(),
        compute_tray_backend: Backend::Rms,
        ..Default::default()
    };
    // Pass rms_client but no db
    let err = build_component_manager(&config, Some(mock_rms_client()), None, None, None)
        .await
        .unwrap_err();
    assert!(matches!(err, ComponentManagerError::InvalidArgument(msg) 
        if msg.contains("database pool is not configured")));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/component-manager/src/component_manager.rs` around lines 189 - 269,
The tests don't cover the new RMS factory validation for Backend::Rms in
build_component_manager; add unit tests that exercise compute_tray (and
similarly nv_switch and power_shelf) when Backend::Rms is selected: one test
passing a db but no rms_client to assert build_component_manager returns
ComponentManagerError::InvalidArgument with a message about the missing RMS
client, one test passing an rms_client but no db to assert an InvalidArgument
about the missing database pool, and one test providing both mock_rms_client and
mock_db to assert successful construction and correct component names for
compute_tray (and mirror these three cases for nv_switch and power_shelf as
appropriate). Ensure tests construct ComponentManagerConfig with
compute_tray_backend: Backend::Rms and call
build_component_manager(...).await.unwrap()/unwrap_err() as needed and match
error messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@crates/component-manager/src/component_manager.rs`:
- Around line 189-269: The tests don't cover the new RMS factory validation for
Backend::Rms in build_component_manager; add unit tests that exercise
compute_tray (and similarly nv_switch and power_shelf) when Backend::Rms is
selected: one test passing a db but no rms_client to assert
build_component_manager returns ComponentManagerError::InvalidArgument with a
message about the missing RMS client, one test passing an rms_client but no db
to assert an InvalidArgument about the missing database pool, and one test
providing both mock_rms_client and mock_db to assert successful construction and
correct component names for compute_tray (and mirror these three cases for
nv_switch and power_shelf as appropriate). Ensure tests construct
ComponentManagerConfig with compute_tray_backend: Backend::Rms and call
build_component_manager(...).await.unwrap()/unwrap_err() as needed and match
error messages.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aab7f3e8-e559-4abe-8344-808d469394d2

📥 Commits

Reviewing files that changed from the base of the PR and between 8a81e86 and b960196.

📒 Files selected for processing (4)
  • crates/api-db/src/machine.rs
  • crates/component-manager/src/component_manager.rs
  • crates/component-manager/src/rms.rs
  • crates/component-manager/src/test_support.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/api-db/src/machine.rs
  • crates/component-manager/src/test_support.rs
  • crates/component-manager/src/rms.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant