APITest test fix: 🐛 Fix false positive in the non-contiguous getitem numerical check #654
Merged
🧭 Background
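- `as_strided` simulates the real shape/stride layout in order to cover strided tensor scenarios
- With `FLAGS_check_nan_inf=true` enabled, a NaN is reported during the slice check stage of `paddle.Tensor.__getitem__`

🔎 Problem analysis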
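- The shape is `[2, 1048576, 2048]` and the strides are `[2147485696, 2048, 1]`
- `1048576 * 2048 = 2147483648`, whereas the dim-0 stride is `2147485696`, so there is a storage gap of `2048` elements between the two batches
- `paddle.empty` allocates the underlying storage, and the logical tensor data is then written through a strided view; that write only covers the positions addressed by the logical view and does not initialize the storage gap
- During the `FLAGS_check_nan_inf` check, the checking logic may scan contiguously into the gap region; if NaN values are left over in the gap, a false positive of `num_nan=2048` is reported

🔧 Main changes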
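1. Fix residual NaN in the gap when constructing Paddle strided tensors
- `_strided_storage_size()` computes the size of the underlying storage
- `paddle.zeros` initializes the entire underlying storage; the initialization method no longer depends on `FLAGS_check_nan_inf`

2. Keep Torch behavior unchanged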
📁 Changed files
✅ Verification
Test verification:
- With `shape=[2, 1048576, 2048]`, `strides=[2147485696, 2048, 1]`, and `x[:, 1:, :]`, the slice succeeds with `FLAGS_check_nan_inf=true` enabled, and the output shape is `[2, 1048575, 2048]`
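
The gap arithmetic and the false-positive mechanism described above can be reproduced without Paddle. The following is a minimal pure-Python sketch, not the code from this PR: `strided_storage_size`, `storage_gap`, and `count_nan_after_fill` are hypothetical names chosen for illustration, and the sketch assumes non-negative, non-overlapping strides. A Python list stands in for the underlying storage, with NaN playing the role of uninitialized memory.

```python
import math
from itertools import product


def strided_storage_size(shape, strides):
    """Smallest element count a storage needs to back this shape/stride layout."""
    if any(dim == 0 for dim in shape):
        return 0
    # The largest reachable offset is sum((dim - 1) * stride); size is that plus one.
    return 1 + sum((dim - 1) * stride for dim, stride in zip(shape, strides))


def storage_gap(shape, strides):
    """Storage elements never addressed by the logical view (assumes no overlap)."""
    return strided_storage_size(shape, strides) - math.prod(shape)


def count_nan_after_fill(shape, strides, initial):
    """Fill storage with `initial`, write the logical view, then scan all storage."""
    storage = [initial] * strided_storage_size(shape, strides)
    for index in product(*(range(dim) for dim in shape)):
        offset = sum(i * s for i, s in zip(index, strides))
        storage[offset] = 1.0  # only positions addressed by the view are written
    # A contiguous scan also visits the gap, which the view never touched.
    return sum(math.isnan(value) for value in storage)


if __name__ == "__main__":
    shape, strides = [2, 1048576, 2048], [2147485696, 2048, 1]
    print(strided_storage_size(shape, strides))  # 4294969344
    print(storage_gap(shape, strides))           # 2048

    # Small-scale analogue: per-batch size 12, dim-0 stride 14, so a gap of 2.
    small_shape, small_strides = [2, 3, 4], [14, 4, 1]
    print(count_nan_after_fill(small_shape, small_strides, float("nan")))  # 2
    print(count_nan_after_fill(small_shape, small_strides, 0.0))           # 0
```

In the small-scale case, a NaN-filled storage leaves exactly as many NaN values as there are gap elements, while a zero-filled storage leaves none. This mirrors why initializing the whole storage with zeros removes the `num_nan=2048` report.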