
fix: resolve index out of range panic in fetchContainerInfo for Ascend devices #95

Open
cygnushan wants to merge 1 commit into Project-HAMi:main from cygnushan:fix-ascend-fetchcontainerinfo-panic

Conversation

@cygnushan


Description

This PR fixes the panic "runtime error: index out of range [1] with length 1" that occurs in fetchContainerInfo when scanning Pods that use Ascend310P / AscendGPU devices.

An earlier PR, #35, fixed the same symptom for the NVIDIA/Hygon/Metax paths, but it missed the Ascend branch in DecodePodDevices, which still lacks the container-index boundary check.

Root Cause

  1. DecodePodDevices (Ascend path): strings.Split(str, OnePodMultiContainerSplitSymbol) may yield more segments than actual containers in the pod spec, causing PodSingleDevice (i.e. []ContainerDevices) to be longer than len(pod.Spec.Containers).
  2. fetchContainerInfo: The old code blindly copied decoded devices into a slice and then indexed it with the container loop variable, triggering the panic when lengths mismatch.
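
The length mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: the `;` separator, the sample annotation value, and the helper names `decodeUnguarded` and `pairUnguarded` are assumptions made for illustration, and the real value of OnePodMultiContainerSplitSymbol and the annotation layout may differ. It shows that a trailing separator makes strings.Split return one more segment than the pod has containers, and that indexing by the segment index then goes out of range.

```go
package main

import (
	"fmt"
	"strings"
)

// containerSplit is a hypothetical stand-in for OnePodMultiContainerSplitSymbol.
const containerSplit = ";"

// decodeUnguarded mimics a decode step without a boundary check: it returns
// one entry per split segment, however many containers the pod declares.
func decodeUnguarded(annotation string) []string {
	return strings.Split(annotation, containerSplit)
}

// pairUnguarded indexes the container list with the segment index.
// The runtime panic is recovered and returned as an error so the
// sketch can run to completion.
func pairUnguarded(containers, segments []string) (pairs []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	for i, seg := range segments {
		// Panics as soon as i >= len(containers).
		pairs = append(pairs, containers[i]+"="+seg)
	}
	return pairs, nil
}

func main() {
	containers := []string{"main"} // the pod spec declares one container

	// The trailing separator yields an extra, empty segment.
	segments := decodeUnguarded("npu-0,Ascend310P,1024,0:;")
	fmt.Printf("containers=%d segments=%d\n", len(containers), len(segments))

	_, err := pairUnguarded(containers, segments)
	fmt.Println(err)
}
```

Recovering the panic is only a device to keep the example runnable; the point is that the number of decoded segments is not bounded by the container count unless the decoder enforces it.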

Changes

  • server/internal/data/pod.go: Pre-allocate bizContainerDevices based on len(pod.Spec.Containers) and merge per-container devices behind an index guard (i < numContainers). This also fixes device data loss for pods with multiple device types, because devices are now appended instead of overwritten.
  • server/internal/provider/util/util.go: Add an if i >= len(pod.Spec.Containers) { break } guard and empty-segment handling (s == "") to the AscendGPUDevice / Ascend310PGPUDevice branch, aligning it with the NVIDIA/Hygon/Metax behavior.
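
The two changes above can be sketched together as follows. This is an illustration under stated assumptions, not the code in the PR: the types `ContainerDevice` and `ContainerDevices` are simplified stand-ins, the separators `;` and `:` are assumed, each device is reduced to a bare UUID, and the helpers `decodeGuarded` and `mergePerContainer` are hypothetical names. In particular, keeping an empty segment as an empty placeholder is one possible reading of the s == "" handling; the PR may treat it differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified, hypothetical stand-ins for the project's device types.
type ContainerDevice struct {
	UUID string
	Type string
}

type ContainerDevices []ContainerDevice

const (
	containerSplit = ";" // assumed per-container separator
	deviceSplit    = ":" // assumed per-device separator
)

// decodeGuarded returns at most numContainers entries, one per container.
func decodeGuarded(annotation, devType string, numContainers int) []ContainerDevices {
	out := make([]ContainerDevices, 0, numContainers)
	for i, s := range strings.Split(annotation, containerSplit) {
		if i >= numContainers {
			break // ignore segments beyond the real container count
		}
		if s == "" {
			// Keep a placeholder so later indices still line up.
			out = append(out, ContainerDevices{})
			continue
		}
		var devs ContainerDevices
		for _, uuid := range strings.Split(s, deviceSplit) {
			if uuid != "" {
				devs = append(devs, ContainerDevice{UUID: uuid, Type: devType})
			}
		}
		out = append(out, devs)
	}
	return out
}

// mergePerContainer pre-allocates one slot per container and appends the
// devices of every device type, so one type never overwrites another.
func mergePerContainer(numContainers int, perType ...[]ContainerDevices) []ContainerDevices {
	merged := make([]ContainerDevices, numContainers)
	for _, perContainer := range perType {
		for i, devs := range perContainer {
			if i < numContainers {
				merged[i] = append(merged[i], devs...)
			}
		}
	}
	return merged
}

func main() {
	numContainers := 1
	ascend := decodeGuarded("npu-0:npu-1;;", "Ascend310P", numContainers)
	nvidia := decodeGuarded("gpu-0;", "NVIDIA", numContainers)

	merged := mergePerContainer(numContainers, ascend, nvidia)
	fmt.Println(len(merged), len(merged[0]))
}
```

Sizing the merged slice from the container count rather than from the decoded data means any later loop over the pod's containers can index it safely, whatever the annotation contains.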

Related

@hami-robot

hami-robot Bot commented May 19, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cygnushan

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot

hami-robot Bot commented May 19, 2026


Welcome @cygnushan! It looks like this is your first PR to Project-HAMi/HAMi-WebUI 🎉

@hami-robot hami-robot Bot added the size/S label May 19, 2026
@cygnushan cygnushan force-pushed the fix-ascend-fetchcontainerinfo-panic branch from 7a7afe4 to 5e78963 Compare May 19, 2026 08:51
@hami-robot hami-robot Bot added size/L and removed size/S labels May 19, 2026
…d devices

- Pre-allocate bizContainerDevices based on actual container count
  to prevent out-of-bounds access when annotation device count
  differs from pod container count
- Merge devices from all device types per container instead of
  overwriting, which previously caused device data loss for
  multi-device-type pods
- Add container index boundary check in DecodePodDevices Ascend
  branch to align with NVIDIA/Hygon/Metax device handling

Fixes Project-HAMi#94

Signed-off-by: handong <cygnushan@yunify.com>
@cygnushan cygnushan force-pushed the fix-ascend-fetchcontainerinfo-panic branch from 5e78963 to d1cd2c1 Compare May 19, 2026 08:54
@cygnushan

Author

Hi @archlitchi ,
This PR has been open for about two weeks now. When you have a chance, could someone please take a look and provide feedback?
Thanks in advance!



Development

Successfully merging this pull request may close these issues.

[Bug] Ascend310P Pod triggers an array out-of-bounds panic in fetchContainerInfo; the existing PR #35 does not cover the Ascend path

2 participants