feat: route tensor discovery through AcceleratorBackend predicate #454

Open
yafshar wants to merge 3 commits into ai-dynamo:main from yafshar:yafshar/backend-aware-tensor-discovery

Conversation

@yafshar

@yafshar yafshar commented Jun 25, 2026

Contributor

Summary

Makes tensor discovery backend-aware so future non-CUDA workers can correctly discover publishable tensors instead of emitting empty sets. Builds on the AcceleratorBackend boundary (#438). Behavior remains unchanged for CUDA.

Changes

  • Replace hardcoded tensor.is_cuda checks with AcceleratorBackend.is_accel_tensor().
  • Thread the active backend from engine adapters into tensor_utils helpers (capture_tensor_attrs, adopt_hidden_tensors,
    iter_module_tensors, collect_module_tensors) and the SGLang collect_sglang_tensors path.
  • Ensure both vLLM capture_tensor_attrs call sites (post-load and model-specific finalizers) pass self.accelerator_backend.
  • Rename _find_hidden_cuda_tensors to _find_hidden_accel_tensors; update docstrings and logs to use backend-agnostic terminology.
  • Default to the CUDA backend when none is provided, preserving existing runtime behavior.
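
As a rough illustration of the predicate-plus-default pattern described in the list above, here is a minimal sketch. It is not the code from this PR: `FakeTensor` stands in for `torch.Tensor` so the snippet runs without PyTorch, `XpuBackend` and `collect_publishable` are hypothetical names, and the class bodies are assumptions about what the `AcceleratorBackend` boundary might look like.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor; only the device type matters here."""
    name: str
    device_type: str  # e.g. "cuda", "cpu", "xpu"


class AcceleratorBackend:
    """Assumed minimal shape of the backend boundary (illustrative only)."""
    torch_device_type = "cuda"

    def is_accel_tensor(self, tensor: FakeTensor) -> bool:
        # Replaces a hardcoded `tensor.is_cuda` style check.
        return tensor.device_type == self.torch_device_type


class CudaAcceleratorBackend(AcceleratorBackend):
    torch_device_type = "cuda"


class XpuBackend(AcceleratorBackend):
    """Hypothetical non-CUDA backend used only for this example."""
    torch_device_type = "xpu"


def collect_publishable(
    tensors: Iterable[FakeTensor],
    backend: Optional[AcceleratorBackend] = None,
) -> List[str]:
    """Return names of tensors resident on the active backend's device."""
    if backend is None:
        # Default to CUDA so callers that pass nothing keep the old behavior.
        backend = CudaAcceleratorBackend()
    return [t.name for t in tensors if backend.is_accel_tensor(t)]
```

With a hardcoded CUDA check, a worker on another device type would get an empty list from the same input; passing the backend through lets the filter follow the active device.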

Testing

  • Discovery helpers now execute under a CPU-mock backend in CI (hidden-tensor adoption and collection no longer skip in GPU-less environments).
  • Existing no-backend tests confirm default CUDA behavior is unchanged (CPU tensors are still ignored).
  • Python test suite:
    • Local: python -m pytest modelexpress_client/python/tests -v — 405 passed
    • GPU: python -m pytest modelexpress_client/python/tests -v — 426 passed, 1 skipped
  • pre-commit run --all-files passes cleanly (cargo fmt/clippy/check).
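
The CPU-mock testing approach above might look roughly like the following sketch. Everything here is an assumption made for illustration: `FakeTensor` replaces `torch.Tensor`, `make_backend` is a hypothetical factory, and `find_hidden_accel_tensors` is a simplified walker, not the `_find_hidden_accel_tensors` helper from this PR.

```python
from types import SimpleNamespace


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""
    def __init__(self, device_type):
        self.device = SimpleNamespace(type=device_type)


def make_backend(torch_device_type="cuda"):
    """Hypothetical mock backend exposing an is_accel_tensor() predicate."""
    def is_accel_tensor(value):
        return isinstance(value, FakeTensor) and value.device.type == torch_device_type
    return SimpleNamespace(
        torch_device_type=torch_device_type,
        is_accel_tensor=is_accel_tensor,
    )


def find_hidden_accel_tensors(obj, backend=None):
    """Walk attributes, dicts, and sequences; return {path: tensor} matches."""
    if backend is None:
        backend = make_backend("cuda")  # default mirrors the CUDA-first behavior
    seen = set()
    found = {}

    def join(path, key):
        return f"{path}.{key}" if path else str(key)

    def visit(value, path):
        if id(value) in seen:  # guard against cycles and shared references
            return
        seen.add(id(value))
        if backend.is_accel_tensor(value):
            found[path] = value
        elif isinstance(value, FakeTensor):
            return  # tensor on another device: ignore, do not descend
        elif isinstance(value, dict):
            for key, item in value.items():
                visit(item, join(path, key))
        elif isinstance(value, (list, tuple)):
            for index, item in enumerate(value):
                visit(item, join(path, index))
        elif hasattr(value, "__dict__"):
            for key, item in vars(value).items():
                visit(item, join(path, key))

    visit(obj, "")
    return found
```

Injecting a backend whose device type is `"cpu"` lets the walker exercise its matching branch on a GPU-less runner, while calling it with no backend still checks that CPU tensors are ignored by default.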

Summary by CodeRabbit

  • New Features
    • Tensor discovery and collection are now accelerator-backend aware, including filtering by the active backend.
    • Hidden tensors can be detected and registered across broader module/object graphs for supported backends.
  • Bug Fixes
    • Improved accuracy by excluding non-accelerator-resident tensors from results.
    • Enhanced handling of non-contiguous storage views and tied-weight deduplication during registration.
  • Documentation
    • Updated architecture documentation to use accelerator-generic terminology.
  • Tests
    • Added and updated coverage for backend-aware tensor discovery and hidden-tensor adoption.

Replace hardcoded CUDA tensor checks with
`AcceleratorBackend.is_accel_tensor()` so non-CUDA workers can discover
their publishable tensors instead of emitting empty sets.

Propagate the backend from engine adapters into tensor_utils free
functions (`capture_tensor_attrs`, `adopt_hidden_tensors`,
`iter_module_tensors`, `collect_module_tensors`) and the SGLang
`collect_sglang_tensors` path. Defaults continue to use the CUDA backend,
preserving existing runtime behavior.

Rename `_find_hidden_cuda_tensors` to `_find_hidden_accel_tensors` and
update CUDA-specific docstrings and logs to reflect backend-agnostic
accelerator tensors.

Update tests to inject a mock backend (`torch_device_type="cpu"`) so
hidden tensor adoption and collection logic can run in CPU CI (instead of
skipping without CUDA). Retain no-backend tests to ensure the default
CUDA backend still ignores CPU tensors.

Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 25, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 985b7f76-ca0a-4992-bdd6-dd932ff2ccdf

📥 Commits

Reviewing files that changed from the base of the PR and between 6c57ae6 and 3264066.

📒 Files selected for processing (2)
  • docs/ARCHITECTURE.md
  • modelexpress_client/python/tests/test_tensor_utils.py
✅ Files skipped from review due to trivial changes (1)
  • docs/ARCHITECTURE.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelexpress_client/python/tests/test_tensor_utils.py

Walkthrough

Tensor discovery now routes through accelerator-backend predicates instead of CUDA-only checks. The tensor utility helpers, SGLang and vLLM adapters, tests, and architecture notes were updated to pass backend instances through capture, adoption, and collection paths.

Changes

Accelerator backend tensor discovery

| Layer / File(s) | Summary |
| --- | --- |
| Backend capture and hidden adoption (modelexpress_client/python/modelexpress/tensor_utils.py, docs/ARCHITECTURE.md, modelexpress_client/python/tests/test_tensor_utils.py) | capture_tensor_attrs, hidden-tensor scanning, and hidden-tensor adoption now use AcceleratorBackend predicates, and the architecture text and tests were updated to match the backend-aware tensor behavior. |
| Module tensor collection (modelexpress_client/python/modelexpress/tensor_utils.py, modelexpress_client/python/tests/test_tensor_utils.py) | iter_module_tensors and collect_module_tensors now resolve an optional accelerator backend and filter tensors with backend-specific detection, and the tests cover collection, tied-weight deduplication, and storage views. |
| SGLang backend-aware discovery (modelexpress_client/python/modelexpress/engines/sglang/adapter.py, modelexpress_client/python/tests/test_sglang_loader.py) | SglangAdapter.discover_tensors now forwards self.accelerator_backend, collect_sglang_tensors defaults to CudaAcceleratorBackend() while skipping non-accelerator parameters, and the tests pass explicit backends for discovery and collection. |
| vLLM backend propagation (modelexpress_client/python/modelexpress/engines/vllm/adapter.py, modelexpress_client/python/tests/test_vllm_adapter.py) | VllmAdapter now uses self.accelerator_backend for hidden-tensor adoption and tensor-attribute capture, and the new test stubs backend selection during discovery. |
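
To make the tied-weight deduplication mentioned for module tensor collection more concrete, here is a small sketch of one common way to do it: keying on the tensor's data pointer. This is an assumption about the general technique, not a description of how `collect_module_tensors` is implemented. `FakeTensor` and `collect_unique` are hypothetical, and only the `data_ptr()` method name mirrors the real `torch.Tensor` API.

```python
class FakeTensor:
    """Stand-in for torch.Tensor exposing data_ptr() and a device type."""
    def __init__(self, ptr, device_type="cuda"):
        self._ptr = ptr
        self.device_type = device_type

    def data_ptr(self):
        # torch.Tensor.data_ptr() returns the address of the first element.
        return self._ptr


def collect_unique(named_tensors, is_accel_tensor):
    """Keep the first name seen for each distinct accelerator buffer."""
    seen_ptrs = set()
    unique = {}
    for name, tensor in named_tensors:
        if not is_accel_tensor(tensor):
            continue  # skip tensors that are not on the active backend
        ptr = tensor.data_ptr()
        if ptr in seen_ptrs:
            continue  # tied weight: same memory already registered
        seen_ptrs.add(ptr)
        unique[name] = tensor
    return unique
```

In this sketch a tied `lm_head.weight` that shares memory with `embed.weight` is registered once, under whichever name is visited first.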

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hopped through tensor trails,
With backend ears and twitchy tails.
No CUDA-only crumbs in sight—
Just accelerator paths, shiny and bright.
Hooray, the burrow’s code is tight! 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.95% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: tensor discovery now uses the AcceleratorBackend predicate instead of CUDA-only checks. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/ARCHITECTURE.md (1)

623-633: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Finish the backend-aware doc rewrite in this section.

This paragraph is accelerator-neutral now, but the same file still describes FP8 recovery as finding CUDA tensors, and the table here still implies tensor attributes come from a dir(module) scan rather than buffer promotion/adoption. Please update those nearby descriptions together so the architecture doc matches the implementation end-to-end.

As per coding guidelines, docs/ARCHITECTURE.md: Update docs/ARCHITECTURE.md when making changes to architecture, components, NIXL, gRPC services, known issues, FP8 handling, or new binary targets and crates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ARCHITECTURE.md` around lines 623 - 633, The backend-aware documentation
in this section is partially updated, but it still mentions CUDA-specific tensor
recovery and describes tensor attributes as coming from a dir(module) scan.
Update the surrounding text in this architecture section to consistently use
accelerator-neutral wording, and align the tensor-attribute explanation with the
actual adoption/promotion flow used by iter_module_tensors() and
adopt_hidden_tensors() rather than implying a raw dir(module) scan. Make sure
the descriptions of FP8 recovery, buffer promotion, and orphan tensor
registration all match the current implementation end-to-end.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9338628-5b57-44fa-a710-2809efd5081f

📥 Commits

Reviewing files that changed from the base of the PR and between 76ea4ce and 6c57ae6.

📒 Files selected for processing (7)
  • docs/ARCHITECTURE.md
  • modelexpress_client/python/modelexpress/engines/sglang/adapter.py
  • modelexpress_client/python/modelexpress/engines/vllm/adapter.py
  • modelexpress_client/python/modelexpress/tensor_utils.py
  • modelexpress_client/python/tests/test_sglang_loader.py
  • modelexpress_client/python/tests/test_tensor_utils.py
  • modelexpress_client/python/tests/test_vllm_adapter.py

Comment thread modelexpress_client/python/tests/test_tensor_utils.py Outdated
Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>
@yafshar

yafshar commented Jun 25, 2026

Contributor Author

@coderabbitai

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Contributor

@yafshar: I’ll review the changes now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
