Add frequency measurement to the stopping criterion #372

Open
mfranzrebsal wants to merge 1 commit into NVIDIA:main from mfranzrebsal:add-frequency-to-criterion

Conversation

@mfranzrebsal
Contributor

This PR passes each iteration's measured average frequency to the stopping criterion, so that the criterion can use it when deciding whether to stop. The new virtual function defaults to a no-op, so existing criteria that do not override it keep working. If you think it would be better to implement this some other way, please let me know!

On another note: I noticed that you are using FP32 for the frequency, which is always going to be an integer since it is in Hz. FP32 has a 24-bit significand, so for values between 2^30 and 2^31 Hz (roughly 1.07 GHz to 2.15 GHz) adjacent representable values are 128 Hz apart. I don't think this is especially problematic, since a discrepancy of 128 Hz is insignificant when the frequency is on the order of GHz, but I wanted to bring it to your attention just in case.

@copy-pr-bot

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jrhemstad jrhemstad requested a review from gevtushenko May 26, 2026 15:28
@oleksandr-pavlyk oleksandr-pavlyk self-requested a review May 26, 2026 16:04
@gevtushenko gevtushenko removed their request for review May 26, 2026 16:04
@oleksandr-pavlyk
Collaborator

/ok to test 023e330

@oleksandr-pavlyk oleksandr-pavlyk added the type: enhancement New feature or request. label May 26, 2026
@oleksandr-pavlyk
Collaborator

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 26, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0e67c7b8-e771-4128-958b-ebfdfaab7a8d

📥 Commits

Reviewing files that changed from the base of the PR and between 4a33a61 and 023e330.

📒 Files selected for processing (2)
  • nvbench/detail/measure_cold.cu
  • nvbench/stopping_criterion.cuh

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • GPU SM clock frequency monitoring enhanced: now unconditionally captures frequency data for each measurement sample throughout benchmark execution.
    • Measurement framework extended to support per-sample GPU clock frequency tracking, improving throttling detection and measurement analysis capabilities.

Walkthrough

Stopping criteria now receive GPU clock frequency measurements. The stopping_criterion_base interface declares a new public add_frequency method forwarding to a protected virtual hook with a default no-op body. The measurement loop unconditionally captures SM clock rate and reports it per sample after timing and count updates.

Changes

Frequency Tracking in Stopping Criteria

| Layer / File(s) | Summary |
|---|---|
| Stopping criterion frequency interface (nvbench/stopping_criterion.cuh) | stopping_criterion_base receives a public add_frequency(float32_t) method that delegates to a protected virtual do_add_frequency(float32_t) hook with a default no-op implementation. |
| Frequency collection in measurement loop (nvbench/detail/measure_cold.cu) | record_measurements() now unconditionally captures the SM clock rate before throttling checks and passes each sample's frequency to the stopping criterion via add_frequency(). |

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.


void measure_cold_base::record_measurements()
{
const auto current_clock_rate = m_gpu_frequency.get_clock_frequency();
Collaborator

For the m_gpu_frequency.get_clock_frequency() return value to be meaningful, calls to m_gpu_frequency.start() and m_gpu_frequency.stop() must be made to populate internal time-point data members. However, this is only done when m_check_throttling is true. (See nvbench/detail/measure_cold_launch_timer_core.cuh:119 and nvbench/detail/measure_cold_launch_timer_core.cuh:146).

For instance, when --profile is used, m_check_throttling would be false, so the frequency timestamp recording would be skipped and the return value of get_clock_frequency() would be computed from uninitialized memory.

This unspecified value would be passed to the stopping criterion object.

Collaborator

We could modify measure_cold_launch_timer_core.cuh so that the start and stop methods are called unconditionally.

Comment on lines +168 to 169
m_stopping_criterion.add_frequency(current_clock_rate);
m_stopping_criterion.add_measurement(cur_cuda_time);
Collaborator

@oleksandr-pavlyk oleksandr-pavlyk May 26, 2026

stopping_criterion_base::add_frequency() is added as a separate callback.

The interface does not document the ordering/pairing contract.

The cold measurement calls add_frequency() before add_measurement() only for accepted samples, and CPU-only measurement does not call it at all.

Ideally, all stopping criterion classes should work with all measurement classes.

Labels

type: enhancement New feature or request.
