feat: route tensor discovery through AcceleratorBackend predicate#454
feat: route tensor discovery through AcceleratorBackend predicate#454yafshar wants to merge 3 commits into
Conversation
Replace hardcoded CUDA tensor checks with `AcceleratorBackend.is_accel_tensor()` so non-CUDA workers can discover their publishable tensors instead of emitting empty sets. Propagate the backend from engine adapters into tensor_utils free functions (`capture_tensor_attrs`, `adopt_hidden_tensors`, `iter_module_tensors`, `collect_module_tensors`) and the SGLang `collect_sglang_tensors` path. Defaults continue to use the CUDA backend, preserving existing runtime behavior. Rename `_find_hidden_cuda_tensors` to `_find_hidden_accel_tensors` and update CUDA-specific docstrings and logs to reflect backend-agnostic accelerator tensors. Update tests to inject a mock backend (`torch_device_type="cpu"`) so hidden tensor adoption and collection logic can run in CPU CI (instead of skipping without CUDA). Retain no-backend tests to ensure the default CUDA backend still ignores CPU tensors. Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughTensor discovery now routes through accelerator-backend predicates instead of CUDA-only checks. The tensor utility helpers, Sglang and vLLM adapters, tests, and architecture notes were updated to pass backend instances through capture, adoption, and collection paths. ChangesAccelerator backend tensor discovery
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/ARCHITECTURE.md (1)
623-633: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winFinish the backend-aware doc rewrite in this section.
This paragraph is accelerator-neutral now, but the same file still describes FP8 recovery as finding
CUDA tensors, and the table here still implies tensor attributes come from adir(module)scan rather than buffer promotion/adoption. Please update those nearby descriptions together so the architecture doc matches the implementation end-to-end.As per coding guidelines,
docs/ARCHITECTURE.md: Updatedocs/ARCHITECTURE.mdwhen making changes to architecture, components, NIXL, gRPC services, known issues, FP8 handling, or new binary targets and crates.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/ARCHITECTURE.md` around lines 623 - 633, The backend-aware documentation in this section is partially updated, but it still mentions CUDA-specific tensor recovery and describes tensor attributes as coming from a dir(module) scan. Update the surrounding text in this architecture section to consistently use accelerator-neutral wording, and align the tensor-attribute explanation with the actual adoption/promotion flow used by iter_module_tensors() and adopt_hidden_tensors() rather than implying a raw dir(module) scan. Make sure the descriptions of FP8 recovery, buffer promotion, and orphan tensor registration all match the current implementation end-to-end.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@docs/ARCHITECTURE.md`:
- Around line 623-633: The backend-aware documentation in this section is
partially updated, but it still mentions CUDA-specific tensor recovery and
describes tensor attributes as coming from a dir(module) scan. Update the
surrounding text in this architecture section to consistently use
accelerator-neutral wording, and align the tensor-attribute explanation with the
actual adoption/promotion flow used by iter_module_tensors() and
adopt_hidden_tensors() rather than implying a raw dir(module) scan. Make sure
the descriptions of FP8 recovery, buffer promotion, and orphan tensor
registration all match the current implementation end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e9338628-5b57-44fa-a710-2809efd5081f
📒 Files selected for processing (7)
docs/ARCHITECTURE.mdmodelexpress_client/python/modelexpress/engines/sglang/adapter.pymodelexpress_client/python/modelexpress/engines/vllm/adapter.pymodelexpress_client/python/modelexpress/tensor_utils.pymodelexpress_client/python/tests/test_sglang_loader.pymodelexpress_client/python/tests/test_tensor_utils.pymodelexpress_client/python/tests/test_vllm_adapter.py
Signed-off-by: Yaser Afshar <yaser.afshar@intel.com>
|
✅ Action performedReview finished.
|
Summary
Makes tensor discovery backend-aware so future non-CUDA workers can correctly discover publishable tensors instead of emitting empty sets. Builds on the AcceleratorBackend boundary (#438). Behavior remains unchanged for CUDA.
Changes
tensor.is_cudachecks withAcceleratorBackend.is_accel_tensor().tensor_utilshelpers (capture_tensor_attrs,adopt_hidden_tensors,iter_module_tensors,collect_module_tensors) and the SGLangcollect_sglang_tensorspath.capture_tensor_attrscall sites (post-load and model-specific finalizers) passself.accelerator_backend._find_hidden_cuda_tensors→_find_hidden_accel_tensors; update docstrings and logs to use backend-agnostic terminology.Testing
python -m pytest modelexpress_client/python/tests -v— 405 passedpython -m pytest modelexpress_client/python/tests -v— 426 passed, 1 skippedpre-commit run --all-filespasses cleanly (cargo fmt/clippy/check).Summary by CodeRabbit