feat: route tensor discovery through AcceleratorBackend predicate by yafshar · Pull Request #454 · ai-dynamo/modelexpress

yafshar · 2026-06-25T22:15:42Z

Summary

Makes tensor discovery backend-aware so future non-CUDA workers can correctly discover publishable tensors instead of emitting empty sets. Builds on the AcceleratorBackend boundary (#438). Behavior remains unchanged for CUDA.

Changes

- Replace hardcoded `tensor.is_cuda` checks with `AcceleratorBackend.is_accel_tensor()`.
- Thread the active backend from engine adapters into `tensor_utils` helpers (`capture_tensor_attrs`, `adopt_hidden_tensors`, `iter_module_tensors`, `collect_module_tensors`) and the SGLang `collect_sglang_tensors` path.
- Ensure both vLLM `capture_tensor_attrs` call sites (post-load and model-specific finalizers) pass `self.accelerator_backend`.
- Rename `_find_hidden_cuda_tensors` → `_find_hidden_accel_tensors`; update docstrings and logs to use backend-agnostic terminology.
- Default to the CUDA backend when none is provided, preserving existing runtime behavior.
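For readers who want a picture of the change, here is a minimal sketch of routing a device check through a backend predicate with a CUDA default. It is not the code from this PR: `FakeTensor`, `DeviceTypeBackend`, `resolve_backend`, and `discover` are hypothetical names, and `FakeTensor` stands in for `torch.Tensor` so the sketch runs without PyTorch. Only `AcceleratorBackend`, `is_accel_tensor`, and `torch_device_type` come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor (assumption): only the device type is modelled."""
    name: str
    device_type: str  # e.g. "cuda", "cpu", "xpu"


class AcceleratorBackend(Protocol):
    """Shape of the predicate named in the PR; the real interface may differ."""
    torch_device_type: str

    def is_accel_tensor(self, tensor: FakeTensor) -> bool: ...


@dataclass
class DeviceTypeBackend:
    """Hypothetical backend: a tensor counts if its device type matches."""
    torch_device_type: str

    def is_accel_tensor(self, tensor: FakeTensor) -> bool:
        return tensor.device_type == self.torch_device_type


def resolve_backend(backend: Optional[AcceleratorBackend]) -> AcceleratorBackend:
    # Assumed default: fall back to a CUDA backend when the caller passes none.
    return backend if backend is not None else DeviceTypeBackend("cuda")


def discover(tensors: Iterable[FakeTensor],
             backend: Optional[AcceleratorBackend] = None) -> List[str]:
    active = resolve_backend(backend)
    # Previously a hardcoded `tensor.is_cuda` check; now the backend decides.
    return [t.name for t in tensors if active.is_accel_tensor(t)]


tensors = [
    FakeTensor("w_cuda", "cuda"),
    FakeTensor("w_xpu", "xpu"),
    FakeTensor("w_cpu", "cpu"),
]
default_names = discover(tensors)                       # CUDA default
xpu_names = discover(tensors, DeviceTypeBackend("xpu"))  # explicit backend
```

With the default, only the CUDA-resident entry is selected. Passing a non-CUDA backend selects that backend's tensors instead of producing an empty result.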

Testing

- Discovery helpers now execute under a CPU-mock backend in CI (hidden-tensor adoption and collection no longer skip in GPU-less environments).
- Existing no-backend tests confirm default CUDA behavior is unchanged (CPU tensors are still ignored).
- Python test suite:
  - Local: `python -m pytest modelexpress_client/python/tests -v` (405 passed)
  - GPU: `python -m pytest modelexpress_client/python/tests -v` (426 passed, 1 skipped)
- `pre-commit run --all-files` passes cleanly (cargo fmt/clippy/check).
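The two testing claims above can be illustrated with a pytest-style sketch. It assumes a test double whose `torch_device_type` is `"cpu"`; `CpuMockBackend`, `DefaultCudaBackend`, `fake_tensor`, and `collect` are hypothetical stand-ins written for this example, not the helpers or fixtures in the repository's test suite.

```python
from types import SimpleNamespace


class CpuMockBackend:
    """Hypothetical test double: treats CPU tensors as accelerator tensors."""
    torch_device_type = "cpu"

    def is_accel_tensor(self, tensor) -> bool:
        return tensor.device.type == self.torch_device_type


class DefaultCudaBackend:
    """Hypothetical stand-in for the default CUDA backend."""
    torch_device_type = "cuda"

    def is_accel_tensor(self, tensor) -> bool:
        return tensor.device.type == self.torch_device_type


def fake_tensor(device_type):
    # Stand-in for torch.Tensor (assumption): only `.device.type` is modelled.
    return SimpleNamespace(device=SimpleNamespace(type=device_type))


def collect(named_tensors, backend=None):
    # Simplified collection helper: keep what the active backend accepts.
    active = backend if backend is not None else DefaultCudaBackend()
    return {n: t for n, t in named_tensors.items() if active.is_accel_tensor(t)}


def test_cpu_mock_backend_collects_cpu_tensors():
    # Runs on a GPU-less machine because the mock accepts CPU tensors.
    tensors = {"weight": fake_tensor("cpu")}
    assert set(collect(tensors, CpuMockBackend())) == {"weight"}


def test_default_backend_ignores_cpu_tensors():
    # No backend passed: the CUDA default should still skip CPU tensors.
    tensors = {"weight": fake_tensor("cpu")}
    assert collect(tensors) == {}
```

Injecting the backend, rather than patching a CUDA availability check, is what lets the same discovery logic run in CPU-only CI.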

Summary by CodeRabbit

New Features
- Tensor discovery and collection are now accelerator-backend aware, including filtering by the active backend.
- Hidden tensors can be detected and registered across broader module/object graphs for supported backends.
Bug Fixes
- Improved accuracy by excluding non-accelerator-resident tensors from results.
- Enhanced handling of non-contiguous storage views and tied-weight deduplication during registration.
Documentation
- Updated architecture documentation to use accelerator-generic terminology.
Tests
- Added and updated coverage for backend-aware tensor discovery and hidden-tensor adoption.
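As an illustration of what hidden-tensor detection and registration could look like, the sketch below scans a module-like object for tensors held as plain attributes and moves the accelerator-resident ones into a buffer registry. Everything here is an assumption made for illustration: `FakeModule`, `fake_tensor`, `sketch_find_hidden`, `sketch_adopt_hidden`, and the attribute names are hypothetical, and the scan in the PR may traverse objects differently.

```python
from types import SimpleNamespace


def fake_tensor(device_type):
    # Stand-in for torch.Tensor (assumption): only `.device.type` is modelled.
    return SimpleNamespace(device=SimpleNamespace(type=device_type),
                           is_fake_tensor=True)


class FakeModule:
    """Stand-in for a module with a buffer registry (hypothetical)."""

    def __init__(self):
        self._buffers = {}                  # registered tensors
        self.k_scale = fake_tensor("xpu")   # hidden: plain attribute on device
        self.cache = fake_tensor("cpu")     # plain attribute on the host
        self.name = "attn"                  # not a tensor


def sketch_find_hidden(module, is_accel_tensor):
    """Return unregistered tensor attributes that the predicate accepts."""
    hidden = {}
    for attr, value in vars(module).items():
        if attr.startswith("_"):
            continue  # skip private registries such as `_buffers`
        if getattr(value, "is_fake_tensor", False) and is_accel_tensor(value):
            hidden[attr] = value
    return hidden


def sketch_adopt_hidden(module, is_accel_tensor):
    """Move hidden accelerator tensors into the buffer registry."""
    found = sketch_find_hidden(module, is_accel_tensor)
    for attr, tensor in found.items():
        delattr(module, attr)           # drop the plain attribute first
        module._buffers[attr] = tensor  # then register it
    return sorted(found)


module = FakeModule()
adopted = sketch_adopt_hidden(module, lambda t: t.device.type == "xpu")
```

The predicate is passed in as a callable, so the same scan works for any backend. The host-resident `cache` attribute is left untouched.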

Replace hardcoded CUDA tensor checks with `AcceleratorBackend.is_accel_tensor()` so non-CUDA workers can discover their publishable tensors instead of emitting empty sets.

Propagate the backend from engine adapters into tensor_utils free functions (`capture_tensor_attrs`, `adopt_hidden_tensors`, `iter_module_tensors`, `collect_module_tensors`) and the SGLang `collect_sglang_tensors` path. Defaults continue to use the CUDA backend, preserving existing runtime behavior.

Rename `_find_hidden_cuda_tensors` to `_find_hidden_accel_tensors` and update CUDA-specific docstrings and logs to reflect backend-agnostic accelerator tensors.

Update tests to inject a mock backend (`torch_device_type="cpu"`) so hidden tensor adoption and collection logic can run in CPU CI (instead of skipping without CUDA). Retain no-backend tests to ensure the default CUDA backend still ignores CPU tensors.

Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>

copy-pr-bot · 2026-06-25T22:15:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-25T22:22:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 985b7f76-ca0a-4992-bdd6-dd932ff2ccdf

📥 Commits

Reviewing files that changed from the base of the PR and between 6c57ae6 and 3264066.

📒 Files selected for processing (2)

docs/ARCHITECTURE.md
modelexpress_client/python/tests/test_tensor_utils.py

✅ Files skipped from review due to trivial changes (1)

docs/ARCHITECTURE.md

🚧 Files skipped from review as they are similar to previous changes (1)

modelexpress_client/python/tests/test_tensor_utils.py

Walkthrough

Tensor discovery now routes through accelerator-backend predicates instead of CUDA-only checks. The tensor utility helpers, SGLang and vLLM adapters, tests, and architecture notes were updated to pass backend instances through capture, adoption, and collection paths.

Changes

Accelerator backend tensor discovery

| Layer / File(s) | Summary |
| --- | --- |
| Backend capture and hidden adoption: `modelexpress_client/python/modelexpress/tensor_utils.py`, `docs/ARCHITECTURE.md`, `modelexpress_client/python/tests/test_tensor_utils.py` | `capture_tensor_attrs`, hidden-tensor scanning, and hidden-tensor adoption now use `AcceleratorBackend` predicates, and the architecture text and tests were updated to match the backend-aware tensor behavior. |
| Module tensor collection: `modelexpress_client/python/modelexpress/tensor_utils.py`, `modelexpress_client/python/tests/test_tensor_utils.py` | `iter_module_tensors` and `collect_module_tensors` now resolve an optional accelerator backend and filter tensors with backend-specific detection, and the tests cover collection, tied-weight deduplication, and storage views. |
| Sglang backend-aware discovery: `modelexpress_client/python/modelexpress/engines/sglang/adapter.py`, `modelexpress_client/python/tests/test_sglang_loader.py` | `SglangAdapter.discover_tensors` now forwards `self.accelerator_backend`, `collect_sglang_tensors` defaults to `CudaAcceleratorBackend()` while skipping non-accelerator parameters, and the tests pass explicit backends for discovery and collection. |
| vLLM backend propagation: `modelexpress_client/python/modelexpress/engines/vllm/adapter.py`, `modelexpress_client/python/tests/test_vllm_adapter.py` | `VllmAdapter` now uses `self.accelerator_backend` for hidden-tensor adoption and tensor-attribute capture, and the new test stubs backend selection during discovery. |
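The module-collection row mentions backend filtering together with tied-weight deduplication. A rough sketch of how those two steps can combine is shown below. It assumes tied weights are recognised by a shared storage address; `FakeTensor`, its `data_ptr` field (standing in for `tensor.data_ptr()`), and `collect_unique` are hypothetical, and the handling of storage views in the actual helpers is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor (assumption)."""
    device_type: str
    data_ptr: int  # stand-in for the address returned by tensor.data_ptr()


def collect_unique(
    named: Iterable[Tuple[str, FakeTensor]],
    is_accel_tensor: Callable[[FakeTensor], bool],
) -> Dict[str, FakeTensor]:
    """Keep accelerator tensors, publishing each underlying buffer once."""
    seen_ptrs = set()
    result: Dict[str, FakeTensor] = {}
    for name, tensor in named:
        if not is_accel_tensor(tensor):
            continue  # backend-specific filter replaces the CUDA-only check
        if tensor.data_ptr in seen_ptrs:
            continue  # tied weight: same memory already collected
        seen_ptrs.add(tensor.data_ptr)
        result[name] = tensor
    return result


named_tensors = [
    ("embed.weight", FakeTensor("cuda", 0x1000)),
    ("lm_head.weight", FakeTensor("cuda", 0x1000)),  # tied to embed.weight
    ("norm.weight", FakeTensor("cuda", 0x2000)),
    ("rope.cache", FakeTensor("cpu", 0x3000)),       # host tensor, skipped
]
collected = collect_unique(named_tensors, lambda t: t.device_type == "cuda")
```

In this sketch the first name seen for a shared buffer wins, so iteration order determines which alias is kept.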

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hopped through tensor trails,
With backend ears and twitchy tails.
No CUDA-only crumbs in sight—
Just accelerator paths, shiny and bright.
Hooray, the burrow’s code is tight! 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.95% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: tensor discovery now uses the AcceleratorBackend predicate instead of CUDA-only checks. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment `@coderabbitai help` to get the list of available commands.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/ARCHITECTURE.md (1)
623-633: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Finish the backend-aware doc rewrite in this section.

This paragraph is accelerator-neutral now, but the same file still describes FP8 recovery as finding CUDA tensors, and the table here still implies tensor attributes come from a dir(module) scan rather than buffer promotion/adoption. Please update those nearby descriptions together so the architecture doc matches the implementation end-to-end.

As per coding guidelines, docs/ARCHITECTURE.md: Update docs/ARCHITECTURE.md when making changes to architecture, components, NIXL, gRPC services, known issues, FP8 handling, or new binary targets and crates.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ARCHITECTURE.md` around lines 623 - 633, The backend-aware documentation
in this section is partially updated, but it still mentions CUDA-specific tensor
recovery and describes tensor attributes as coming from a dir(module) scan.
Update the surrounding text in this architecture section to consistently use
accelerator-neutral wording, and align the tensor-attribute explanation with the
actual adoption/promotion flow used by iter_module_tensors() and
adopt_hidden_tensors() rather than implying a raw dir(module) scan. Make sure
the descriptions of FP8 recovery, buffer promotion, and orphan tensor
registration all match the current implementation end-to-end.
Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9338628-5b57-44fa-a710-2809efd5081f

📥 Commits

Reviewing files that changed from the base of the PR and between 76ea4ce and 6c57ae6.

📒 Files selected for processing (7)

docs/ARCHITECTURE.md
modelexpress_client/python/modelexpress/engines/sglang/adapter.py
modelexpress_client/python/modelexpress/engines/vllm/adapter.py
modelexpress_client/python/modelexpress/tensor_utils.py
modelexpress_client/python/tests/test_sglang_loader.py
modelexpress_client/python/tests/test_tensor_utils.py
modelexpress_client/python/tests/test_vllm_adapter.py

Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>

yafshar · 2026-06-25T23:56:54Z

@coderabbitai

coderabbitai · 2026-06-25T23:57:07Z

@yafshar: I’ll review the changes now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


pull-request-size Bot added the size/L label Jun 25, 2026

github-actions Bot added the feat label Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

Comment thread modelexpress_client/python/tests/test_tensor_utils.py Outdated

docs: align tensor discovery docs and collision test

3264066

Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>

Merge branch 'ai-dynamo:main' into yafshar/backend-aware-tensor-discovery

daf920b

feat: route tensor discovery through AcceleratorBackend predicate #454
yafshar wants to merge 3 commits into ai-dynamo:main from yafshar:yafshar/backend-aware-tensor-discovery

1 participant
