LogWriter readability improvements: rework the log classification and printing scheme #651
Open
cangtianhuang wants to merge 1 commit into
rename log type, remake classification method, update tools
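🎯 Main improvements
1. Log classification scheme redesigned from scratch
The previously disorganized classification scheme is reorganized into four well-defined tiers: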
| Category | Log file | Former name |
|---|---|---|
| `paddle_error` | `api_config_paddle_error.txt` | |
| `paddle_accuracy` | `api_config_paddle_accuracy.txt` | `accuracy_error` |
| `paddle_bitwise` | `api_config_paddle_bitwise.txt` | `accuracy_diff` |
| `paddle_cuda` | `api_config_paddle_cuda.txt` | `cuda_error` |
| `paddle_crash` | `api_config_paddle_crash.txt` | `crash` |
| `oom` | `api_config_oom.txt` | |
| `timeout` | `api_config_timeout.txt` | |
| `torch_error` | `api_config_torch_error.txt` | |
| `config_input` | `api_config_config_input.txt` | `numpy_error` |
| `config_parse` | `api_config_config_parse.txt` | `match_error` |
| `config_convert` | `api_config_config_convert.txt` | `paddle_to_torch_failed` |
| `pass` | `api_config_pass.txt` | |
| `skip` | `api_config_skip.txt` | |

2. Standardized print prefixes
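The prefix convention described in this section can be pictured with a minimal sketch. The helper name `format_log_line`, its signature, and the ordering of fields before the free-text message are assumptions made for illustration; they are not taken from the PR's code.

```python
def format_log_line(category: str, message: str = "", **fields: object) -> str:
    """Build a line such as '[paddle_accuracy] mode=forward idx=0 some text'.

    Hypothetical helper: only the '[category]' prefix and the 'key=value'
    field style come from the PR description; everything else is assumed.
    """
    parts = [f"[{category}]"]
    # Keyword arguments keep their call order, so fields print as written.
    parts.extend(f"{key}={value}" for key, value in fields.items())
    if message:
        parts.append(message)
    return " ".join(parts)


if __name__ == "__main__":
    print(format_log_line("pass", mode="forward", idx=0))
    print(format_log_line("paddle_accuracy", "mismatch found", phase="backward", comp="T1P1"))
```

Keeping the category in a fixed bracketed position and the context in `key=value` pairs makes the lines easy to filter with plain text tools, which is presumably the motivation for the standardization.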
- `[category] ...` format, e.g. `[pass]`, `[paddle_accuracy]`, `[paddle_cuda]`
- `key=value` format, e.g. `mode=forward`, `phase=backward`, `idx=0`, `comp=T1P1`

3. Large-tensor memory management
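The numeric codes listed in this section (presumably the exit codes mentioned under `engineV2.py`) could be kept in one place as an enum. This is a sketch only: the names `FatalExitCode` and `describe_exit_code` are hypothetical, and only the three values and their meanings come from the PR description.

```python
from enum import IntEnum
from typing import Optional


class FatalExitCode(IntEnum):
    """Hypothetical enum; the values 99/98/97 are taken from the PR text."""
    PADDLE_CUDA_FATAL = 99
    OOM = 98
    TORCH_FATAL = 97


# Human-readable labels, worded as in the PR description.
_DESCRIPTIONS = {
    FatalExitCode.PADDLE_CUDA_FATAL: "Paddle CUDA fatal",
    FatalExitCode.OOM: "OOM",
    FatalExitCode.TORCH_FATAL: "Torch reference-side fatal",
}


def describe_exit_code(code: int) -> Optional[str]:
    """Return the label for a known fatal exit code, or None otherwise."""
    # IntEnum members hash and compare equal to plain ints, so a raw
    # process return code can be used as the dictionary key directly.
    return _DESCRIPTIONS.get(code)


if __name__ == "__main__":
    for rc in (99, 98, 97, 0):
        print(rc, describe_exit_code(rc))
```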
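- `99` → Paddle CUDA fatal
- `98` → OOM
- `97` → Torch reference-side fatal
- `show_runtime_status` option: optionally prints GPU memory usage before and after a test

4. Checkpoint and terminal-state management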
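- `write_terminal_log()` / `has_terminal_log()` / `write_checkpoint()` functions
- After a `pass` terminal state, subsequent `pass` logs no longer overwrite it (prevents misclassification)

5. Finer-grained statistics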
Log summaries now report statistics along three additional dimensions:
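📝 Detailed changes
Core file changes:
- `engineV2.py`: memory waiting, exit codes, checkpoint management
- `tester/accuracy.py`, `tester/accuracy_stable.py`, `tester/paddle_only.py`: unified log classification and error-handling logic
- `tester/base.py`: new `report_runtime_error()` method and a centralized error-classification function
- `tester/api_config/log_writer.py`: category mapping table, alias support, terminal-state management
- Statistics tools under `tools/`: log prefixes updated to match

Backward compatibility: the `LOG_ALIASES` dictionary automatically maps old category names, reducing the risk of switching over.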
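As a rough illustration of how such an alias table might work, the sketch below resolves an old category name to its new one and derives the log file name from the `api_config_<category>.txt` pattern seen in the category list. The dictionary contents are inferred from the old names that appear next to the new categories in this description, and the names `LOG_ALIASES_SKETCH`, `resolve_category`, and `log_file_for` are hypothetical; the real `LOG_ALIASES` in `tester/api_config/log_writer.py` may differ.

```python
# Assumed old-name -> new-name pairs, inferred from the PR description.
LOG_ALIASES_SKETCH = {
    "accuracy_error": "paddle_accuracy",
    "accuracy_diff": "paddle_bitwise",
    "cuda_error": "paddle_cuda",
    "crash": "paddle_crash",
    "numpy_error": "config_input",
    "match_error": "config_parse",
    "paddle_to_torch_failed": "config_convert",
}


def resolve_category(name: str) -> str:
    """Map an old category name to its new name; new names pass through."""
    return LOG_ALIASES_SKETCH.get(name, name)


def log_file_for(name: str) -> str:
    """Return the log file name for a category, accepting old names too."""
    return f"api_config_{resolve_category(name)}.txt"


if __name__ == "__main__":
    print(log_file_for("accuracy_error"))
    print(log_file_for("oom"))
```

Resolving aliases at a single entry point means callers that still pass an old name keep writing to the renamed file without being changed themselves.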