Add frequency measurement to the stopping criterion #372
/ok to test 023e330
@coderabbitai full review
Actions performed: full review triggered.
Summary by CodeRabbit: Stopping criteria now receive GPU clock frequency measurements.
void measure_cold_base::record_measurements()
{
  const auto current_clock_rate = m_gpu_frequency.get_clock_frequency();
For the return value of m_gpu_frequency.get_clock_frequency() to be meaningful, m_gpu_frequency.start() and m_gpu_frequency.stop() must be called first to populate the internal time-point data members. However, these calls are only made when m_check_throttling is true (see nvbench/detail/measure_cold_launch_timer_core.cuh:119 and nvbench/detail/measure_cold_launch_timer_core.cuh:146).
For instance, when --profile is used, m_check_throttling is false, so the frequency timestamp recording is skipped and the return value of get_clock_frequency() is computed from uninitialized memory.
This unspecified value is then passed to the stopping criterion object.
We could modify measure_cold_launch_timer_core.cuh so that the start and stop methods are called unconditionally.
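An alternative or complementary guard would be for the frequency helper itself to report "no measurement" when its timestamps were never recorded. The following is a minimal host-only sketch of that idea. The class name `frequency_probe`, its members, and its cycle/time parameters are hypothetical and are not NVBench's actual implementation.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the frequency helper; not NVBench's actual class.
class frequency_probe
{
  // One timestamp: a cycle counter value and the wall-clock time in seconds.
  struct point
  {
    long long cycles;
    double seconds;
  };

  std::optional<point> m_start;
  std::optional<point> m_stop;

public:
  void start(long long cycles, double seconds) { m_start = point{cycles, seconds}; }
  void stop(long long cycles, double seconds) { m_stop = point{cycles, seconds}; }

  // Returns the average frequency in Hz, or std::nullopt when start() and
  // stop() were not both called, instead of reading unset members.
  std::optional<double> get_clock_frequency() const
  {
    if (!m_start || !m_stop)
    {
      return std::nullopt;
    }
    const double elapsed = m_stop->seconds - m_start->seconds;
    if (elapsed <= 0.0)
    {
      return std::nullopt;
    }
    return static_cast<double>(m_stop->cycles - m_start->cycles) / elapsed;
  }
};

// Usage: a probe that was never started yields no value, so the caller can
// skip forwarding a frequency to the stopping criterion.
inline void frequency_probe_example()
{
  frequency_probe probe;
  assert(!probe.get_clock_frequency().has_value());

  probe.start(0, 0.0);
  probe.stop(3000, 2.0);
  assert(probe.get_clock_frequency().has_value());
}
```

With this shape, the caller decides what to do when no frequency is available, rather than silently passing an unspecified value along.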
  m_stopping_criterion.add_frequency(current_clock_rate);
  m_stopping_criterion.add_measurement(cur_cuda_time);
stopping_criterion_base::add_frequency() is added as a separate callback, but the interface does not document its ordering or pairing contract with add_measurement().
The cold measurement calls add_frequency() before add_measurement() only for accepted samples, and the CPU-only measurement does not call it at all.
Ideally, all stopping criterion classes should work with all measurement classes.
This MR makes every iteration's measured average frequency available to the stopping criterion, in case it wants to use it to make decisions. The virtual function is implemented as a no-op so that classes that do not have it implemented do not break. If you think it would be a better idea to implement it some other way, please let me know!
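The no-op default described above can be sketched as follows. This is a simplified, hypothetical interface (`criterion_base`, `count_criterion`, `frequency_aware_criterion`), not NVBench's stopping_criterion_base, and the ordering comment is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified interface; NVBench's real base class differs.
class criterion_base
{
public:
  virtual ~criterion_base() = default;

  // Optional hook. The default does nothing, so criteria that do not care
  // about frequency keep working unchanged.
  // Assumed contract for this sketch: when called at all, it is called
  // before the add_measurement() of the same sample.
  virtual void add_frequency(double /*hz*/) {}

  virtual void add_measurement(double seconds) = 0;
  virtual bool is_finished() const             = 0;
};

// Ignores frequency entirely; stops after a fixed number of samples.
class count_criterion : public criterion_base
{
  std::size_t m_target;
  std::size_t m_count{0};

public:
  explicit count_criterion(std::size_t target)
      : m_target(target)
  {}

  void add_measurement(double) override { ++m_count; }
  bool is_finished() const override { return m_count >= m_target; }
};

// Opts in by overriding the hook and keeping the frequencies it receives.
class frequency_aware_criterion : public criterion_base
{
  std::size_t m_target;
  std::vector<double> m_frequencies;
  std::vector<double> m_times;

public:
  explicit frequency_aware_criterion(std::size_t target)
      : m_target(target)
  {}

  void add_frequency(double hz) override { m_frequencies.push_back(hz); }
  void add_measurement(double seconds) override { m_times.push_back(seconds); }
  bool is_finished() const override { return m_times.size() >= m_target; }
  std::size_t frequency_samples() const { return m_frequencies.size(); }
};

// Usage: the same calling sequence works for both criteria.
inline void criterion_example()
{
  count_criterion by_count(1);
  by_count.add_frequency(1.5e9); // silently ignored
  by_count.add_measurement(0.1);
  assert(by_count.is_finished());
}
```

A criterion that overrides the hook still has to tolerate measurement classes that never call it, for example by checking how many frequency samples it has actually received.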
On another note: I have noticed that you are using FP32 for the frequency, which will always be an integer since it is in Hz. For values in the GHz range, FP32 has a granularity of around 128 Hz. I don't think this is especially problematic, since a discrepancy of 128 Hz is insignificant when the frequency is on the order of GHz, but I wanted to bring it to your attention just in case.
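The granularity figure can be checked directly with std::nextafter. The helper name `ulp_above` is introduced here for illustration only. The 128 Hz spacing holds for values between 2^30 and 2^31 (roughly 1.07 GHz to 2.15 GHz); between 2^29 and 2^30 the spacing is 64 Hz.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
inline float ulp_above(float x)
{
  return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Usage: spacing of FP32 values at typical GPU clock rates, in Hz.
inline void ulp_example()
{
  assert(ulp_above(1.5e9f) == 128.0f); // 1.5 GHz lies in [2^30, 2^31)
  assert(ulp_above(1.0e9f) == 64.0f);  // 1.0 GHz lies in [2^29, 2^30)

  // An integer frequency that is not a multiple of 128 is rounded on storage.
  assert(static_cast<float>(1500000001LL) == 1.5e9f);
}
```

Storing the frequency as a double or a 64-bit integer would avoid the rounding, at the cost of changing the callback's parameter type.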