Implement ROCm delay kernel #882
Review Summary
Solid implementation that mirrors the CUDA delay kernel for ROCm. The architecture gating (MI300+) and HIP serialization env-var checks are well thought out. Key findings (see inline comments):
46f30de to a6995a8
| #include "rocm/include/hip/hip_runtime.h" | ||
| #include "rocm/include/hip/hip_version.h" | ||
| #include "rocm/rocm_config.h" | ||
| #include <unistd.h> |
nit: #include <unistd.h> was moved here from the C system headers section and is now sandwiched between rocm/rocm_config.h and the xla/ headers. Per Google style, C system headers belong in the system headers block (after the related header and before the C++ standard library includes). It was previously in the right place (before <algorithm>); the fix for <cstdlib> seems to have displaced it.
| #include <unistd.h> | |
| #include "xla/stream_executor/activate_context.h" |
(Move <unistd.h> back up to the C system headers section near line 18, alongside the other <c...> includes.)
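For reference, the ordering being asked for can be sketched as below. This is a minimal standalone illustration, not the file from this PR: the function `PageAlignedSize` is hypothetical and exists only so that each included header is actually used.

```cpp
// Related header would come first (omitted: this sketch has no header of its own).

// C system headers.
#include <unistd.h>

// C++ standard library headers.
#include <algorithm>
#include <cassert>
#include <vector>

// Other libraries' headers and project headers (such as the rocm/ and xla/
// includes quoted above) would follow here, each in its own block.

// Hypothetical helper: rounds the largest requested size up to a multiple of
// the system page size. Returns 0 for an empty list.
long PageAlignedSize(const std::vector<long>& sizes) {
  const long page = sysconf(_SC_PAGESIZE);  // Declared in <unistd.h>.
  assert(page > 0);
  const long largest =
      sizes.empty() ? 0 : *std::max_element(sizes.begin(), sizes.end());
  return ((largest + page - 1) / page) * page;
}
```

Keeping `<unistd.h>` in its own block above `<algorithm>` also means automated include sorters will leave it alone.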
| if (target_not_reached) { | ||
| *semaphore = GpuSemaphoreState::kTimedOut; | ||
| } |
nit: The CUDA version has a helpful comment here explaining why kTimedOut is written back on timeout:
// We are exiting due to the timeout. Signal this back to the host so that
// we can emit a warning, as it probably indicates suboptimal usage.
Adding a similar comment would improve readability and parity.
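To make the intent of the write-back concrete, here is a host-side sketch of the same control flow. It is an assumption-laden simulation, not the kernel in this PR: `SemaphoreState` and `SpinUntilReleased` are hypothetical names, a `std::atomic` stands in for the host-visible semaphore, and `std::chrono::steady_clock` stands in for the GPU clock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the semaphore states used by the delay kernel.
enum class SemaphoreState { kHold, kRelease, kTimedOut };

// Spins until the semaphore is released or the timeout elapses.
// Returns true if the wait ended because of the timeout.
bool SpinUntilReleased(std::atomic<SemaphoreState>& semaphore,
                       std::chrono::nanoseconds timeout) {
  assert(timeout.count() >= 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  bool target_not_reached;
  do {
    target_not_reached =
        semaphore.load(std::memory_order_acquire) != SemaphoreState::kRelease;
  } while (target_not_reached &&
           std::chrono::steady_clock::now() < deadline);
  if (target_not_reached) {
    // We are exiting due to the timeout. Signal this back to the host so that
    // it can emit a warning, as it probably indicates suboptimal usage.
    semaphore.store(SemaphoreState::kTimedOut, std::memory_order_release);
  }
  return target_not_reached;
}
```

The host can then read the semaphore after the kernel finishes and tell a normal release apart from a timeout without any extra channel.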
| LOG_FIRST_N(WARNING, 5) | ||
| << "Delay kernel timed out, measured time has sub-optimal accuracy."; |
nit: The CUDA version logs at ERROR level with actionable guidance ("There may be a missing warmup execution, please investigate in Nsight Systems."). Consider upgrading the severity and adding ROCm-equivalent guidance, e.g.:
| LOG_FIRST_N(WARNING, 5) | |
| << "Delay kernel timed out, measured time has sub-optimal accuracy."; | |
| LOG_FIRST_N(ERROR, 1) << "Delay kernel timed out, measured time has " | |
| "sub-optimal accuracy. There may be a missing " | |
| "warmup execution, please investigate in rocprof."; |
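The practical difference between a limit of 5 and a limit of 1 is only how many repeated timeouts reach the log. The sketch below illustrates that rate-limiting behaviour with a plain counter. It is not how `LOG_FIRST_N` is implemented; `FirstN` and `ReportTimeouts` are hypothetical names used for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>

// Hypothetical gate that lets only the first `n` calls through.
class FirstN {
 public:
  explicit FirstN(int n) : remaining_(n) { assert(n >= 0); }

  // Returns true for the first n calls and false afterwards. Thread-safe.
  bool ShouldLog() {
    int current = remaining_.load();
    while (current > 0) {
      // On failure, `current` is refreshed with the latest value and we retry.
      if (remaining_.compare_exchange_weak(current, current - 1)) return true;
    }
    return false;
  }

 private:
  std::atomic<int> remaining_;
};

// Simulates `timeouts` timed-out measurements and returns how many messages
// were actually emitted under the given limit.
int ReportTimeouts(int timeouts, int limit) {
  FirstN gate(limit);
  int emitted = 0;
  for (int i = 0; i < timeouts; ++i) {
    if (gate.ShouldLog()) {
      std::cerr << "Delay kernel timed out, measured time has sub-optimal "
                   "accuracy.\n";
      ++emitted;
    }
  }
  return emitted;
}
```

With a limit of 1 the message appears once per process, which is usually enough when it already carries actionable guidance.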
Re-review Summary
Good progress: 4 of 6 previous findings have been addressed (copyright year, capability caching, timeout comment, test skip guard). New findings (3 inline comments):
All new findings are minor nits/parity suggestions; no correctness issues found.