Clarification: Are P95/P99 latency thresholds the gate-keeping criteria for data validity in MLPerf submissions? #456

The README designates multiple metrics as "PRIMARY" for MLPerf v3.0 submissions (§ Understanding Excel Performance Metrics):

  - Storage Tier Read/Write Device P95 latency
  - Tier Storage Read/Write Bandwidth (GB/s)
  - Avg Throughput (tok/s) when gpu_mem=0, cpu_mem=0

  The benchmark output also reports a PASS/FAIL verdict driven by P95 thresholds (e.g., NVMe Read P95 < 200 ms, NVMe Write P95 < 500 ms).

  However, it's unclear whether these P95/P99 latency thresholds are:

  1. Validity gates — a run that fails the P95 criterion produces invalid or non-comparable throughput/bandwidth data and must be discarded, or
  2. Diagnostic signals only — a FAIL on P95 is informational and the throughput/bandwidth numbers are still reportable alongside the latency result.

  The README states: "This is not a pass/fail test. It is a diagnostic tool" (§ What This Benchmark Does), yet the output renders an explicit PASS/FAIL verdict. This creates ambiguity for submitters who need to know whether a FAIL on P95 invalidates the entire run.

  Specific questions:

  1. Does a FAIL on P95 latency invalidate the throughput/bandwidth data for official submission purposes?
  2. Should submitters report runs that fail the P95 threshold, or only runs that pass?
  3. Given the high observed variance (CV 50–125%, § Discovery Test Key Findings), is it expected that some of the 3–5 required trials may fail P95 while others pass? If so, is the median taken across all trials or only passing trials?
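
To illustrate why question 3 matters, here is a minimal sketch of how the two interpretations can yield different reported medians. The trial values are invented for illustration and do not come from any real run; `passes_p95` and `median_throughput` are hypothetical helpers, not part of the benchmark. Only the two thresholds (read P95 < 200 ms, write P95 < 500 ms) are taken from the benchmark output quoted above.

```python
from statistics import median

# Thresholds as quoted in the benchmark output (milliseconds).
NVME_READ_P95_MAX_MS = 200.0
NVME_WRITE_P95_MAX_MS = 500.0

# Hypothetical trials: (avg throughput tok/s, read P95 ms, write P95 ms).
# These numbers are made up purely to show the effect of filtering.
trials = [
    (1200.0, 150.0, 420.0),
    (950.0, 240.0, 610.0),
    (1100.0, 180.0, 450.0),
    (700.0, 310.0, 480.0),
    (1050.0, 190.0, 495.0),
]


def passes_p95(read_p95_ms, write_p95_ms):
    """Return True when both P95 latencies are under their thresholds."""
    return (read_p95_ms < NVME_READ_P95_MAX_MS
            and write_p95_ms < NVME_WRITE_P95_MAX_MS)


def median_throughput(trials, passing_only):
    """Median throughput over all trials, or only those that pass P95."""
    selected = [
        tput for tput, read_p95, write_p95 in trials
        if not passing_only or passes_p95(read_p95, write_p95)
    ]
    if not selected:
        raise ValueError("no trials left after applying the P95 filter")
    return median(selected)


# Interpretation 2 (diagnostic only): every trial counts.
median_all = median_throughput(trials, passing_only=False)

# Interpretation 1 (validity gate): failing trials are discarded first.
median_passing = median_throughput(trials, passing_only=True)

print(f"median over all trials:     {median_all} tok/s")
print(f"median over passing trials: {median_passing} tok/s")
```

With this made-up data, two of the five trials fail the P95 check, and the reported median shifts depending on whether they are kept. Two vendors applying different interpretations to the same hardware could therefore submit different numbers.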

  A definitive answer here would prevent inconsistent submissions across vendors.
