Commit aad700a

feat(inference): use ratio (Nx) for diff tables; floor upper bound
Two follow-up tweaks to the per-interactivity throughput and AUC summary
tables introduced in 6db1e32:

1. Render multiplicative ratios (Nx) instead of percent-differences.
   - Throughput "% advantage vs baseline" sub-table → "Ratio vs baseline".
     Cells now read "2.50×", "0.60×", etc; self-vs-self is "1.00×"; "∞" is
     kept (other reachable, baseline not); "−∞" is replaced with "0×", using
     the same dark-red treatment for the symmetric case.
   - AUC table: drop the redundant "% vs primary" column entirely (the
     other three columns are already ratios), so columns are AUC + Ratio vs
     primary + Ratio vs secondary + Ratio vs tertiary, all in Nx.
   - New ratioColor() centered at 1.00× and log-symmetric: 3.00× → fully
     green, 0.33× → fully red, interpolating linearly in log space (so "2×"
     and "0.5×" land at matched saturations). WCAG-luminance text color
     preserved.

2. Column upper bound is now floor(globalMax/10)*10 instead of ceil, for
   both the throughput buckets and the AUC integration window. The last
   bucket is therefore always one that at least one config actually reaches.

pareto.test.ts: the spec sanity check now compares aucUnderFrontier against
an independent fine-grid trapezoidal reference computed inline, instead of
hard-coding expected AUC magnitudes that bake in a specific upper bound.
The new floor(...) rule, or any future window change, no longer requires
touching the test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 6db1e32 commit aad700a
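
The log-symmetric mapping described in the commit message can be sketched in isolation. The caps mirror `RATIO_CAP_HI` and `RATIO_CAP_LO` from the diff. The names `logSymmetricT` and `sketchBackground` are hypothetical, and the white-to-green / white-to-red blend is an assumption, because the commit does not show the actual color interpolation inside `ratioColor()`.

```typescript
// Caps copied from the diff (RATIO_CAP_HI / RATIO_CAP_LO).
const CAP_HI = 3;
const CAP_LO = 1 / 3;

/** Map a ratio to t in [-1, 1]: 0 at 1x, +1 at >= 3x, -1 at <= 1/3x. */
function logSymmetricT(ratio: number): number {
  const clamped = Math.max(CAP_LO, Math.min(CAP_HI, ratio));
  return Math.log(clamped) / Math.log(CAP_HI);
}

/**
 * Hypothetical blend (NOT the commit's code): fade from white toward pure
 * green for t > 0 and toward pure red for t < 0.
 */
function sketchBackground(ratio: number): string {
  const t = logSymmetricT(ratio);
  const fade = Math.round(255 * (1 - Math.abs(t)));
  return t >= 0 ? `rgb(${fade}, 255, ${fade})` : `rgb(255, ${fade}, ${fade})`;
}

// 2x and 0.5x land at equal and opposite t values.
console.log(logSymmetricT(2), logSymmetricT(0.5));
console.log(sketchBackground(1), sketchBackground(3));
```

For comparison, the removed `pct / 200` mapping put 2× (+100%) at t = 0.5 but 0.5× (−50%) at t = −0.25, so a halving looked only half as saturated as a doubling.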

2 files changed

Lines changed: 86 additions & 66 deletions

packages/app/src/components/inference/ui/InteractivityTables.tsx

Lines changed: 28 additions & 42 deletions
@@ -78,14 +78,20 @@ function relativeLuminance(r: number, g: number, b: number): number {
   return 0.2126 * srgbToLinear(r) + 0.7152 * srgbToLinear(g) + 0.0722 * srgbToLinear(b);
 }
 
+const RATIO_CAP_HI = 3;
+const RATIO_CAP_LO = 1 / 3;
+
 /**
- * Map a percent-diff in [-200, +200] to a red→white→green color.
- * Beyond ±200 we clamp. Returns { background, color } where `color` is the
- * WCAG-derived text color (white when background is dark, black when light).
+ * Map a ratio (other / baseline) to a red→white→green color, centered at 1.0×
+ * and log-symmetric. ratio = 1 → white; ratio ≥ 3 → fully green; ratio ≤
+ * 1/3 → fully red. Anything between interpolates linearly in log space so that
+ * "2×" and "0.5×" land at symmetric saturations. Returns { background, color }
+ * with the WCAG-derived text color.
  */
-function percentDiffColor(pct: number): { background: string; color: string } {
-  // Clamp to ±200.
-  const t = Math.max(-1, Math.min(1, pct / 200));
+function ratioColor(ratio: number): { background: string; color: string } {
+  const clamped = Math.max(RATIO_CAP_LO, Math.min(RATIO_CAP_HI, ratio));
+  // log-symmetric t in [-1, 1]: t=0 at 1.0, t=+1 at cap-hi, t=-1 at cap-lo.
+  const t = Math.log(clamped) / Math.log(RATIO_CAP_HI);
   let r: number;
   let g: number;
   let b: number;
@@ -108,8 +114,8 @@ function percentDiffColor(pct: number): { background: string; color: string } {
   return { background: `rgb(${r}, ${g}, ${b})`, color };
 }
 
-const INFINITY_BG_POS = '#14532d'; // dark green (green-900) for ∞
-const INFINITY_BG_NEG = '#7f1d1d'; // dark red (red-900) for −∞
+const INFINITY_BG_POS = '#14532d'; // dark green (green-900) for ∞ (other defined, baseline missing)
+const ZERO_BG = '#7f1d1d'; // dark red (red-900) for 0× (other missing, baseline defined)
 const SELF_BG = '#fbbf24'; // amber-400 for baseline-vs-self
 const COL_MAX_BG = '#bbf7d0'; // green-200 for best per column in throughput
 
@@ -205,14 +211,16 @@ function InfoIcon({ text }: { text: string }) {
 
 /** Per-interactivity throughput table + linked percent-diff heatmap. */
 function ThroughputAndDiffTable({ configs }: { configs: ConfigSeries[] }) {
-  // Compute buckets: every 10 from 10 up through ceil(globalMax / 10) * 10.
+  // Compute buckets: every 10 from 10 up through floor(globalMax / 10) * 10.
+  // (Using floor ensures the last bucket is always one a config actually reaches,
+  // not a bucket beyond every config's reachable interactivity.)
   const buckets = useMemo(() => {
     let globalMax = 0;
     for (const c of configs) {
       const maxX = c.frontier.at(-1)?.x ?? 0;
       if (maxX > globalMax) globalMax = maxX;
     }
-    const hi = Math.ceil(globalMax / 10) * 10;
+    const hi = Math.floor(globalMax / 10) * 10;
     const out: number[] = [];
     for (let v = 10; v <= hi; v += 10) out.push(v);
     return out;
@@ -339,10 +347,10 @@ function ThroughputAndDiffTable({ configs }: { configs: ConfigSeries[] }) {
       <div className="mt-6">
         <div className="flex items-center justify-between gap-3 flex-wrap mb-2">
           <div className="flex items-center gap-2">
-            <h3 className="text-base font-semibold">% advantage vs baseline</h3>
+            <h3 className="text-base font-semibold">Ratio vs baseline</h3>
             <InfoIcon
               text={
-                '(other − baseline) / baseline × 100 at each bucket. "∞" means the baseline cannot reach that interactivity but the other config can; "−∞" the reverse; "—" means neither can. Cells clamp to ±200% for the color scale.'
+                'other / baseline at each bucket, rendered as Nx. "∞" means the baseline cannot reach that interactivity but the other config can; "0×" the reverse; "—" means neither can. Color scale is centered at 1.00× and log-symmetric, saturating at 3.00× (green) and 0.33× (red).'
               }
             />
           </div>
@@ -392,7 +400,7 @@ function ThroughputAndDiffTable({ configs }: { configs: ConfigSeries[] }) {
                           className="text-right px-2 py-1.5 tabular-nums"
                           style={{ backgroundColor: SELF_BG, color: '#0a0a0a' }}
                         >
-                          0.0%
+                          1.00×
                         </td>
                       );
                     }
@@ -423,22 +431,21 @@ function ThroughputAndDiffTable({ configs }: { configs: ConfigSeries[] }) {
                         <td
                           key={b}
                           className="text-right px-2 py-1.5 tabular-nums font-semibold"
-                          style={{ backgroundColor: INFINITY_BG_NEG, color: '#ffffff' }}
+                          style={{ backgroundColor: ZERO_BG, color: '#ffffff' }}
                         >
-                          −∞
+                          0×
                         </td>
                       );
                     }
-                    const pct = ((other! - baseline!) / baseline!) * 100;
-                    const { background, color } = percentDiffColor(pct);
+                    const ratio = other! / baseline!;
+                    const { background, color } = ratioColor(ratio);
                     return (
                       <td
                         key={b}
                         className="text-right px-2 py-1.5 tabular-nums"
                         style={{ backgroundColor: background, color }}
                       >
-                        {pct >= 0 ? '+' : ''}
-                        {pct.toFixed(0)}%
+                        {ratio.toFixed(2)}×
                       </td>
                     );
                   })}
@@ -461,7 +468,7 @@ function AucSummaryTable({ configs }: { configs: ConfigSeries[] }) {
       const maxX = c.frontier.at(-1)?.x ?? 0;
       if (maxX > globalMax) globalMax = maxX;
     }
-    return Math.ceil(globalMax / 10) * 10;
+    return Math.floor(globalMax / 10) * 10;
   }, [configs]);
 
   const aucs = useMemo(
@@ -504,8 +511,7 @@ function AucSummaryTable({ configs }: { configs: ConfigSeries[] }) {
         style: { backgroundColor: SELF_BG, color: '#0a0a0a' },
       };
     }
-    const pctDiff = (ratio - 1) * 100;
-    const { background, color } = percentDiffColor(pctDiff);
+    const { background, color } = ratioColor(ratio);
     return {
       text: `${ratio.toFixed(2)}×`,
       style: { backgroundColor: background, color },
@@ -575,9 +581,6 @@ function AucSummaryTable({ configs }: { configs: ConfigSeries[] }) {
               <th className="text-right font-medium px-2 py-1.5 whitespace-nowrap">
                 Ratio vs primary
               </th>
-              <th className="text-right font-medium px-2 py-1.5 whitespace-nowrap">
-                % vs primary
-              </th>
               <th className="text-right font-medium px-2 py-1.5 whitespace-nowrap">
                 Ratio vs secondary
               </th>
@@ -592,20 +595,6 @@ function AucSummaryTable({ configs }: { configs: ConfigSeries[] }) {
               const primaryR = ratioCell(auc, primaryAuc, ePrimary, c.hwKey);
               const secondaryR = ratioCell(auc, secondaryAuc, eSecondary, c.hwKey);
               const tertiaryR = ratioCell(auc, tertiaryAuc, eTertiary, c.hwKey);
-              let pctText: string;
-              let pctStyle: React.CSSProperties | undefined;
-              if (primaryAuc === null || primaryAuc === 0) {
-                pctText = '—';
-                pctStyle = undefined;
-              } else if (c.hwKey === ePrimary) {
-                pctText = '+0.0%';
-                pctStyle = { backgroundColor: SELF_BG, color: '#0a0a0a' };
-              } else {
-                const pct = (auc / primaryAuc - 1) * 100;
-                const { background, color } = percentDiffColor(pct);
-                pctText = `${pct >= 0 ? '+' : ''}${pct.toFixed(1)}%`;
-                pctStyle = { backgroundColor: background, color };
-              }
               return (
                 <tr key={c.hwKey} className="border-b border-border last:border-b-0">
                   <td className="text-left font-medium px-2 py-1.5 whitespace-nowrap">
@@ -615,9 +604,6 @@ function AucSummaryTable({ configs }: { configs: ConfigSeries[] }) {
                   <td className="text-right tabular-nums px-2 py-1.5" style={primaryR.style}>
                     {primaryR.text}
                   </td>
-                  <td className="text-right tabular-nums px-2 py-1.5" style={pctStyle}>
-                    {pctText}
-                  </td>
                   <td className="text-right tabular-nums px-2 py-1.5" style={secondaryR.style}>
                     {secondaryR.text}
                   </td>
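
The effect of switching `hi` from ceil to floor in this file can be seen with a small standalone sketch. The helper name `bucketEdges` is hypothetical. The value 85 comes from the removed test comment in `pareto.test.ts`, which puts `globalMax` for the sample data at roughly 85.

```typescript
/** Bucket edges every 10 from 10 up to round(globalMax / 10) * 10 (hypothetical helper). */
function bucketEdges(globalMax: number, round: (v: number) => number): number[] {
  const hi = round(globalMax / 10) * 10;
  const out: number[] = [];
  for (let v = 10; v <= hi; v += 10) out.push(v);
  return out;
}

// Old rule: the last bucket (90) lies beyond the largest reachable x (85).
const ceilBuckets = bucketEdges(85, Math.ceil);
// New rule: the last bucket (80) is reachable by at least one config.
const floorBuckets = bucketEdges(85, Math.floor);

console.log(ceilBuckets[ceilBuckets.length - 1], floorBuckets[floorBuckets.length - 1]);
```

With ceil, no config reaches the final column. With floor, the final column always has at least one real value, at the cost of leaving the range between 80 and 85 out of the tables and the AUC window.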

packages/app/src/lib/pareto.test.ts

Lines changed: 58 additions & 24 deletions
@@ -14,6 +14,44 @@ interface RawPoint {
 const toPoints = (raw: RawPoint[]): Point2D[] =>
   raw.map((p) => ({ x: p.Interactivity_tok_s_user, y: p.Token_Throughput_per_GPU_tok_s_gpu }));
 
+// Independent fine-grid trapezoidal reference. Matches the Python np.interp
+// + np.trapezoid approach used in the original spec. Used by the sanity
+// check below — kept out of `src/lib/pareto.ts` because the production
+// implementation is the closed-form piecewise integral, which agrees with
+// this to fp drift on piecewise-linear input.
+function referenceAuc(frontier: Point2D[], lo: number, hi: number): number {
+  if (frontier.length === 0 || hi <= lo) return 0;
+  const minX = frontier[0].x;
+  const last = frontier.at(-1);
+  if (!last) return 0;
+  const maxX = last.x;
+  const N = 100_001;
+  const step = (hi - lo) / (N - 1);
+  const ys: number[] = [];
+  for (let i = 0; i < N; i++) {
+    const x = lo + i * step;
+    if (x < minX || x > maxX) {
+      ys.push(0);
+      continue;
+    }
+    let j = 0;
+    while (j < frontier.length - 1 && frontier[j + 1].x < x) j++;
+    const a = frontier[j];
+    const b = frontier[Math.min(j + 1, frontier.length - 1)];
+    if (b.x === a.x) {
+      ys.push(Math.max(a.y, b.y));
+    } else {
+      const t = (x - a.x) / (b.x - a.x);
+      ys.push(a.y + t * (b.y - a.y));
+    }
+  }
+  let area = 0;
+  for (let i = 0; i < ys.length - 1; i++) {
+    area += ((ys[i] + ys[i + 1]) / 2) * step;
+  }
+  return area;
+}
+
 describe('paretoFrontier', () => {
   it('returns empty for empty input', () => {
     expect(paretoFrontier([])).toEqual([]);
@@ -91,39 +129,35 @@ describe('aucUnderFrontier', () => {
     expect(aucUnderFrontier(f, 30, 40)).toBe(0);
   });
 
-  // Sanity-check the full pipeline (pareto → AUC) against the spec's
-  // reference AUCs computed by the Python implementation from the same
-  // 8-config sample dataset (FP4 DeepSeek V4 Pro, 8K/1K, TP=8).
-  // Window: 10 → ceil(globalMax/10)*10. globalMax across these 8 configs is
-  // ~85, so window is [10, 90].
-  describe('matches Python reference AUCs from spec sample data', () => {
-    // Determine the actual global window from the fixture (ceil-to-10).
+  // Sanity-check the full pipeline (pareto → AUC) on the spec's 8-config
+  // sample dataset (FP4 DeepSeek V4 Pro, 8K/1K, TP=8) using the production
+  // integration window: [10, floor(globalMax / 10) * 10].
+  //
+  // We re-derive the expected AUC for each config from first principles —
+  // independent trapezoidal integration over the same Pareto frontier — and
+  // assert that aucUnderFrontier matches. Hard-coding numeric expectations
+  // would bake in whichever upper bound the test was written against; this
+  // way the test continues to be a meaningful sanity check if the window
+  // rule changes again.
+  describe('matches independent trapezoidal AUCs on spec sample data', () => {
     const allXs = (Object.values(eightConfigData) as RawPoint[][]).flatMap((rows) =>
       rows.map((r) => r.Interactivity_tok_s_user),
     );
     const globalMax = Math.max(...allXs);
-    const hi = Math.ceil(globalMax / 10) * 10;
-    const window: [number, number] = [10, hi];
-
-    const cases: [string, number][] = [
-      ['MI355X_SGLang_nonMTP', 11_457],
-      ['MI355X_ATOM_nonMTP', 23_659],
-      ['B200_SGLang_nonMTP', 63_495],
-      ['B200_DynamoVLLM_nonMTP_disagg', 62_177],
-      ['GB200_DynamoVLLM_nonMTP_disagg', 116_220],
-      ['GB200_DynamoVLLM_MTP_disagg', 176_705],
-      ['GB300_DynamoSGLang_nonMTP_disagg', 379_854],
-      ['GB300_DynamoSGLang_MTP_disagg', 263_727],
-    ];
+    const upperBound = Math.floor(globalMax / 10) * 10;
+    const window: [number, number] = [10, upperBound];
 
-    for (const [name, expected] of cases) {
-      it(`${name}${expected.toLocaleString()}`, () => {
+    const names = Object.keys(eightConfigData as Record<string, RawPoint[]>);
+    for (const name of names) {
+      it(`${name} matches independent reference`, () => {
         const raw = (eightConfigData as Record<string, RawPoint[]>)[name];
         expect(raw, `fixture missing ${name}`).toBeTruthy();
         const f = paretoFrontier(toPoints(raw));
         const auc = aucUnderFrontier(f, window[0], window[1]);
-        // Expected numbers in the spec are rounded to whole units; allow ±0.5%.
-        expect(Math.abs(auc - expected) / expected).toBeLessThan(0.005);
+        const expected = referenceAuc(f, window[0], window[1]);
+        // Both methods are trapezoidal on the same piecewise-linear function;
+        // they should agree to within tiny floating-point drift.
+        expect(Math.abs(auc - expected) / Math.max(expected, 1)).toBeLessThan(0.001);
       });
     }
   });
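
The agreement this test relies on can be reproduced on a toy frontier. The sketch below is an assumption-laden stand-in: `closedFormAuc`, `gridAuc`, and `interp` are hypothetical names, and `closedFormAuc` is only one plausible reading of the "closed-form piecewise integral" mentioned in the test comment, not the code in `src/lib/pareto.ts`, which this commit does not show.

```typescript
interface Point2D { x: number; y: number }

/** Linear interpolation on a frontier sorted by ascending x; 0 outside its x-range. */
function interp(frontier: Point2D[], x: number): number {
  if (frontier.length === 0) return 0;
  if (x < frontier[0].x || x > frontier[frontier.length - 1].x) return 0;
  for (let i = 0; i < frontier.length - 1; i++) {
    const a = frontier[i];
    const b = frontier[i + 1];
    if (x >= a.x && x <= b.x) {
      if (b.x === a.x) return Math.max(a.y, b.y);
      return a.y + ((x - a.x) / (b.x - a.x)) * (b.y - a.y);
    }
  }
  return frontier[frontier.length - 1].y; // single-point frontier
}

/** Hypothetical closed form: one exact trapezoid per segment, clipped to [lo, hi]. */
function closedFormAuc(frontier: Point2D[], lo: number, hi: number): number {
  let area = 0;
  for (let i = 0; i < frontier.length - 1; i++) {
    const left = Math.max(lo, frontier[i].x);
    const right = Math.min(hi, frontier[i + 1].x);
    if (right <= left) continue;
    area += ((interp(frontier, left) + interp(frontier, right)) / 2) * (right - left);
  }
  return area;
}

/** Fine-grid trapezoidal integration, in the spirit of referenceAuc in the diff. */
function gridAuc(frontier: Point2D[], lo: number, hi: number, n = 100_001): number {
  const step = (hi - lo) / (n - 1);
  let area = 0;
  let prev = interp(frontier, lo);
  for (let i = 1; i < n; i++) {
    const y = interp(frontier, lo + i * step);
    area += ((prev + y) / 2) * step;
    prev = y;
  }
  return area;
}

// Toy frontier: throughput falls as interactivity rises.
const toyFrontier: Point2D[] = [{ x: 10, y: 100 }, { x: 20, y: 60 }, { x: 40, y: 20 }];
const exact = closedFormAuc(toyFrontier, 10, 30); // 800 + 500 = 1300
const approx = gridAuc(toyFrontier, 10, 30);
console.log(exact, approx, Math.abs(approx - exact) / exact);
```

In this sketch the two results differ only by rounding error when the window lies inside the frontier's x-range. If the window extends past the frontier, the grid version spreads the drop to zero across one grid step, so a small relative tolerance is more appropriate than exact equality.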
