APITest test fix: 🐛 Fix false-positive numeric check for non-contiguous getitem #654

Merged
cangtianhuang merged 2 commits into PFCCLab:main from cangtianhuang:fix/getitem on Jun 21, 2026

@cangtianhuang cangtianhuang commented Jun 18, 2026

Collaborator

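🧭 Background

  • When PaddleAPITest constructs a non-contiguous tensor, it uses as_strided to reproduce the real shape/stride layout, so that strided-tensor scenarios are covered
  • When FLAGS_check_nan_inf=true is enabled externally, a NaN is reported during the slice check stage of paddle.Tensor.__getitem__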

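🔎 Root cause

  • In the reproducing case, the input shape is [2, 1048576, 2048] and the strides are [2147485696, 2048, 1]
  • The contiguous size of a single batch is 1048576 * 2048 = 2147483648, while the stride of dimension 0 is 2147485696, so there is a storage gap of 2048 elements between the two batches
  • The old test-side logic allocated the underlying storage with paddle.empty and then wrote the logical tensor data through a strided view; that write only covers the positions addressed by the logical view and does not initialize the storage gap
  • When the slice output triggers the FLAGS_check_nan_inf check, the check may scan contiguously into the gap region; if NaN values are left over in the gap, a false positive with num_nan=2048 is reported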
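
The gap arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch, not the PR's code: `min_storage_elems` is a hypothetical helper written for illustration (the implementation of `_strided_storage_size()` is not shown in this PR description), and it assumes non-negative strides and a storage offset of 0.

```python
from math import prod


def min_storage_elems(shape, strides):
    """Hypothetical helper: smallest element count a flat storage needs
    so that every index of the strided view is in bounds (offset 0,
    non-negative strides assumed)."""
    if any(dim == 0 for dim in shape):
        return 0
    return 1 + sum((dim - 1) * stride for dim, stride in zip(shape, strides))


shape = [2, 1048576, 2048]
strides = [2147485696, 2048, 1]

batch_elems = prod(shape[1:])               # contiguous size of one batch
gap_per_batch = strides[0] - batch_elems    # elements skipped between batches
logical_elems = prod(shape)                 # elements the view actually addresses
storage_elems = min_storage_elems(shape, strides)
never_written = storage_elems - logical_elems  # storage slots the view never touches

print(batch_elems, gap_per_batch, storage_elems, never_written)
```

Under these assumptions the number of storage slots the view never writes equals the 2048-element gap, which matches the reported num_nan=2048.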

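🔧 Main changes

1. Fix leftover NaN in the gap of constructed Paddle strided tensors

  • Reuse _strided_storage_size() to compute the underlying storage size when constructing a non-contiguous Paddle tensor
  • Always initialize the entire underlying storage with paddle.zeros, instead of choosing the initialization method based on FLAGS_check_nan_inf
  • Prevent later getitem/slice numeric checks from scanning uninitialized NaN values in the storage gap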
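
The effect of zero-initializing the storage can be illustrated without Paddle. The following is a pure-Python stand-in, not the code in config_analyzer.py: a flat list plays the role of the storage, `build_strided` and `count_nan` are hypothetical names, and NaN is used as a worst-case stand-in for whatever stale memory an uninitialized allocation might contain.

```python
import math


def build_strided(shape, strides, values, fill):
    """Allocate a flat storage pre-filled with `fill`, then write `values`
    (row-major order) through a 2-D strided view. Only positions addressed
    by the view are written; gap slots keep `fill`."""
    size = 1 + sum((dim - 1) * stride for dim, stride in zip(shape, strides))
    storage = [fill] * size
    it = iter(values)
    for i in range(shape[0]):
        for j in range(shape[1]):
            storage[i * strides[0] + j * strides[1]] = next(it)
    return storage


def count_nan(storage):
    """Scan the whole flat storage, gaps included, as a contiguous check would."""
    return sum(1 for v in storage if math.isnan(v))


# 2 rows of 3 elements; row stride 5 leaves 2 gap slots between the rows.
shape, strides = [2, 3], [5, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

stale = build_strided(shape, strides, values, fill=math.nan)  # "empty"-like storage
zeroed = build_strided(shape, strides, values, fill=0.0)      # "zeros"-like storage

print(count_nan(stale), count_nan(zeroed))
```

With the stale fill, a scan over the full storage finds NaN in the two gap slots even though every logical element is valid; with the zero fill it finds none.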

2. Keep Torch behavior unchanged

  • The Torch strided tensor construction logic is unchanged

📁 Changed files

tester/
└── api_config/
    └── config_analyzer.py  # Adjust the underlying storage initialization strategy for non-contiguous Paddle tensors

✅ Verification

Test verification: with shape=[2, 1048576, 2048] and strides=[2147485696, 2048, 1], x[:, 1:, :] slices successfully with FLAGS_check_nan_inf=true enabled, and the output shape is [2, 1048575, 2048]
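
The expected output shape can be derived from standard Python slice semantics. This is a small sketch with a hypothetical helper `sliced_shape`; it only handles tuples of plain slices, one per dimension, and does not touch Paddle.

```python
def sliced_shape(shape, index):
    """Hypothetical helper: shape produced by basic slicing, assuming
    `index` holds exactly one slice object per dimension."""
    return [len(range(*s.indices(dim))) for dim, s in zip(shape, index)]


# Equivalent of x[:, 1:, :]
index = (slice(None), slice(1, None), slice(None))
result = sliced_shape([2, 1048576, 2048], index)
print(result)
```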

@cangtianhuang cangtianhuang enabled auto-merge (squash) June 21, 2026 13:47
@cangtianhuang cangtianhuang merged commit 281f5ee into PFCCLab:main Jun 21, 2026
1 check passed
@cangtianhuang cangtianhuang deleted the fix/getitem branch June 21, 2026 13:57