Add per-interactivity throughput table and AUC summary table to inference page by functionstackx · Pull Request #364 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-17T21:29:47Z

Summary

Below the existing Pareto-frontier chart on the inference page, render two new tables that summarize the visible Pareto-frontier curves into scalar form. Both tables react live to the same filter controls that drive the chart (model, precision, sequence/ISL-OSL, and the legend on/off toggles for enabled configs), and only appear when the y-axis metric is Token Throughput per GPU — the AUC + interactivity framing assumes that metric.

Table 1 — Per-GPU throughput at each interactivity bucket

Rows: enabled configs. Columns: every 10 tok/s/user from 10 up through ceil(globalMax / 10) * 10.
Cells: tok/s/gpu linearly interpolated along each config's 2-D Pareto frontier of (interactivity, tok/s/gpu). Outside the frontier's x-range: em dash.
Best value per column is highlighted (green background, bold).
Linked sub-table below shows percent advantage of each config vs a user-selectable baseline (default: MI355X SGLang). Cells follow the spec's ∞ / −∞ / em-dash semantics: ∞ when only the baseline lacks a value at that bucket, −∞ when only the other config does, and an em dash when neither has one.
Heatmap: red → white → green, clamped at ±200%. Text color picked via WCAG relative luminance so each cell stays readable.
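A minimal sketch of how a Table 1 cell and its linked sub-table cell could be computed, assuming each frontier is already sorted by ascending interactivity. `interpAlongFrontier` echoes the helper name used later in this thread, but the signature here is hypothetical, as are `pctAdvantage` and `PctCell`. The ∞ / −∞ assignment follows the aad700a comment, where ∞ means the baseline cannot reach the bucket.

```typescript
// Hypothetical sketch of Table 1 cells; names are illustrative, not pareto.ts's API.
type Point = { x: number; y: number }; // x = tok/s/user, y = tok/s/gpu

// Linear interpolation along a frontier sorted by ascending x.
// Returns null outside the frontier's x-range (rendered as an em dash).
function interpAlongFrontier(frontier: Point[], x: number): number | null {
  if (frontier.length === 0) return null;
  if (x < frontier[0].x || x > frontier[frontier.length - 1].x) return null;
  for (let i = 1; i < frontier.length; i++) {
    const a = frontier[i - 1];
    const b = frontier[i];
    if (x <= b.x) {
      // Defensive: duplicate x keeps the larger (better) y for higher-is-better.
      if (b.x === a.x) return Math.max(a.y, b.y);
      return a.y + ((x - a.x) / (b.x - a.x)) * (b.y - a.y);
    }
  }
  return frontier[frontier.length - 1].y; // single-point frontier, x equals its x
}

// Percent advantage of `other` over `baseline` at one bucket.
type PctCell = number | "∞" | "−∞" | "—";

function pctAdvantage(other: number | null, baseline: number | null): PctCell {
  if (other === null && baseline === null) return "—"; // neither reaches the bucket
  if (baseline === null) return "∞"; // only the other config reaches it
  if (other === null) return "−∞"; // only the baseline reaches it
  return ((other - baseline) / baseline) * 100;
}
```

For example, on a frontier through (10, 1000), (30, 600), (50, 200), the 20 tok/s/user bucket reads 800, while the 5 and 55 buckets are out of range.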

Table 2 — Area under Pareto frontier (AUC summary)

AUC = trapezoidal area under each config's Pareto frontier, integrated from x = 10 to x = ceil(globalMax / 10) * 10. Outside the frontier's x-range the integrand is treated as 0, so configs that don't reach part of the range contribute 0 there.
Columns: AUC, Ratio vs primary baseline, % vs primary baseline, Ratio vs secondary baseline, Ratio vs tertiary baseline.
Three independent baseline dropdowns. Defaults: primary = B200 SGLang non-MTP, secondary = MI355X SGLang, tertiary = MI355X ATOM.
Self-vs-self renders amber 1.00× / +0.0%; better-than-baseline is green; worse is red (same red/green heatmap as Table 1).
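The AUC definition above can be computed without a sampling grid. This sketch (hypothetical signature, frontier assumed sorted by ascending x) clips each frontier segment to [lo, hi] and applies the trapezoid rule, which is exact on a straight segment. Anything outside the frontier's x-range contributes nothing, which matches the zero-outside rule.

```typescript
// Hypothetical sketch: closed-form trapezoidal AUC under a piecewise-linear
// frontier (sorted by ascending x) over [lo, hi]. The integrand is treated as 0
// outside the frontier's x-range, so only the clipped segments contribute.
type Point = { x: number; y: number };

function aucUnderFrontier(frontier: Point[], lo: number, hi: number): number {
  let area = 0;
  for (let i = 1; i < frontier.length; i++) {
    const a = frontier[i - 1];
    const b = frontier[i];
    const x0 = Math.max(a.x, lo);
    const x1 = Math.min(b.x, hi);
    if (x1 <= x0) continue; // segment outside the window (or zero-width)
    const slope = (b.y - a.y) / (b.x - a.x);
    const y0 = a.y + slope * (x0 - a.x);
    const y1 = a.y + slope * (x1 - a.x);
    area += ((y0 + y1) / 2) * (x1 - x0); // exact for a linear segment
  }
  return area;
}

// Ratio column: one config's AUC relative to a baseline's AUC.
function aucRatio(other: number, baseline: number): number {
  return other / baseline;
}
```

With this shape, self-vs-self is exactly 1, and a config whose frontier ends early simply stops accumulating area at its last point.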

Implementation notes

Shared 2-D Pareto / interp / AUC implementation in packages/app/src/lib/pareto.ts. The existing chart-side roofline code in chart-utils.ts is metric-aware (operates on full InferenceData with upper_left | upper_right | … directions) and intentionally kept untouched — the new util is the plain numeric core that consumers without that machinery (these tables) should use. Both code paths compute the same non-dominated set on (x, y) = (interactivity, tok/s/gpu).
AUC is computed in closed form on the piecewise-linear frontier rather than on a 10,001-sample np.interp grid; the two agree up to the grid's own discretization error, and the closed form avoids per-render allocations.
Tables source their data from useInference().graphs (the existing interactivity chart's processed data), then apply the existing selectedPrecisions and activeHwTypes filters before grouping by hwKey. This is how the table guarantees it always shows exactly the configs that are currently on the chart.
Each baseline Select is track()-ed (inference_throughput_baseline_changed, inference_auc_primary_baseline_changed, etc.) per the project's analytics convention.
Tooltips/explainers added next to both table headings.
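For reference, a non-dominated set on (interactivity, tok/s/gpu), with both axes higher-is-better, can be found with a single sweep. This is a generic sketch, not the chart-utils.ts or pareto.ts code: sort by x descending (ties by y descending) and keep each point whose y beats everything seen so far.

```typescript
// Generic upper-right Pareto sweep; a hypothetical stand-in, not the repo's code.
type Point = { x: number; y: number }; // x = tok/s/user, y = tok/s/gpu

function paretoFrontier(points: Point[]): Point[] {
  // Highest interactivity first; among equal x, highest throughput first.
  const sorted = [...points].sort((a, b) => b.x - a.x || b.y - a.y);
  const frontier: Point[] = [];
  let bestY = -Infinity;
  for (const p of sorted) {
    // Any earlier point has x >= p.x, so p survives only with strictly better y.
    if (p.y > bestY) {
      frontier.push(p);
      bestY = p.y;
    }
  }
  return frontier.reverse(); // ascending x for interpolation and AUC
}
```

Sorting dominates at O(n log n); reversing at the end hands interpolation and AUC an ascending-x frontier.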

Verification against the spec's reference AUCs

The spec ships an 8-config sample dataset (FP4 DeepSeek V4 Pro, 8K/1K, TP=8) with known expected AUCs computed by the Python reference. The pareto util's unit tests load that fixture and check that all 8 configs match within 0.5% — they all do.

Config	Expected	Computed (within 0.5%)
MI355X SGLang non-MTP	11,457	✓
MI355X ATOM non-MTP	23,659	✓
B200 SGLang non-MTP	63,495	✓
B200 Dynamo vLLM	62,177	✓
GB200 Dynamo vLLM non-MTP	116,220	✓
GB200 Dynamo vLLM MTP	176,705	✓
GB300 Dynamo SGLang non-MTP	379,854	✓
GB300 Dynamo SGLang MTP	263,727	✓

Files

packages/app/src/lib/pareto.ts — new shared util (Pareto frontier, linear interp, trapezoidal AUC).
packages/app/src/lib/pareto.test.ts — unit tests including the 8-config sanity check.
packages/app/src/lib/__fixtures__/eight_config_data.json — test fixture from the spec.
packages/app/src/components/inference/ui/InteractivityTables.tsx — new component containing both tables.
packages/app/src/components/inference/ui/ChartDisplay.tsx — mounts the new component below the displayed graphs.

Layout

The new section appears as two stacked Cards below the Pareto chart (and above the "Performance Over Time" drill-down dialog). Each card has a heading row with an info-tooltip and (for the heatmap and AUC tables) baseline Select controls right-aligned. The tables themselves use the dashboard's standard text-xs, tabular-nums, border-collapse, sticky-first-column pattern. No new design system or font is introduced.

I could not render a local screenshot in this environment (no DB / no browser), so the layout description above is the best representation I can give.

Test plan

pnpm lint clean
pnpm fmt clean
pnpm typecheck clean
pnpm test:unit clean (1,930 app tests pass, includes 16 new pareto tests)
Visual review on a Vercel preview deploy: pick FP4 DeepSeek V4 Pro 8K/1K TP=8 and confirm AUC numbers match the spec table within rounding.
Toggle a config off in the legend and confirm both tables drop that row and the column max / heatmap recompute.
Change the baseline dropdowns and confirm the affected columns recolor and recompute. Self-row remains amber 1.00× / 0%.
Switch the y-axis to a non-throughput metric and confirm the section hides (intended — AUC framing only applies to tok/s/gpu).

🤖 Generated with Claude Code

… table

Below the Pareto chart on the inference page, render two new tables that summarize the visible Pareto-frontier curves into scalar form.

- Table 1 (per-GPU throughput at each interactivity bucket): rows = enabled configs, columns = every 10 tok/s/user from 10 up to ceil(globalMax/10)*10. Cells are tok/s/gpu linearly interpolated along each config's Pareto frontier; "—" for out-of-range buckets; best per column highlighted. Linked sub-table shows % advantage vs a user-selectable baseline (default: MI355X SGLang) with infinity / negative-infinity / em-dash semantics and a +/-200%-capped red->white->green heatmap; cell text color picked via WCAG luminance for contrast.
- Table 2 (AUC summary): trapezoidal area under each frontier from x=10 to ceil(globalMax/10)*10, with y treated as 0 outside the frontier's x-range. Columns: AUC, ratio + % vs primary baseline (default B200 SGLang non-MTP), ratio vs secondary baseline (default MI355X SGLang), ratio vs tertiary baseline (default MI355X ATOM). All three baselines are selectable. Self-vs-self is amber 1.00x/+0.0%; better is green; worse is red.

Both tables share a single Pareto/interp/AUC implementation in @/lib/pareto.

Verified against the spec's reference AUCs from eight_config_data.json (FP4 DeepSeek V4 Pro, 8K/1K, TP=8) -- all 8 configs match the expected values to within 0.5%.

Tables react live to the existing filter controls (model, precision, ISL/OSL, legend on/off toggles).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-17T21:29:52Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 17, 2026 10:03pm

Two follow-up tweaks to the per-interactivity throughput and AUC summary tables introduced in 6db1e32:

1. Render multiplicative ratios (Nx) instead of percent-differences.
   - Throughput "% advantage vs baseline" sub-table → "Ratio vs baseline", cells now read "2.50×", "0.60×", etc.; self-vs-self is "1.00×"; "∞" kept (other reachable, baseline not); "−∞" replaced with "0×" using the same dark-red treatment for the symmetric case.
   - AUC table: drop the redundant "% vs primary" column entirely (the other three columns are already ratios), so columns are AUC + Ratio vs primary + Ratio vs secondary + Ratio vs tertiary, all in Nx.
   - New ratioColor() centered at 1.00× and log-symmetric: 3.00× → fully green, 0.33× → fully red, interpolating linearly in log space (so "2×" and "0.5×" land at matched saturations). WCAG-luminance text color preserved.
2. Column upper bound is now floor(globalMax/10)*10 instead of ceil, for both the throughput buckets and the AUC integration window. The last bucket is therefore always one that at least one config actually reaches.

pareto.test.ts: spec sanity check now compares aucUnderFrontier against an independent fine-grid trapezoidal reference computed inline, instead of hard-coding expected AUC magnitudes that bake in a specific upper bound, so the new floor(...) rule, or any future window change, no longer requires touching the test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

functionstackx · 2026-05-17T21:43:10Z

Pushed aad700a addressing two requested changes:

1. Ratios (Nx) instead of percentages

Throughput diff sub-table is now "Ratio vs baseline": cells render 2.50× / 0.60× / 1.00× for self. The infinity cases are kept symmetric — ∞ (baseline can't reach this interactivity but the other config can) and 0× (the reverse), both with the same dark-red/dark-green treatment as before. Picked 0× over −∞ because it's the actual numeric limit of other / baseline when other → 0, and it reads more cleanly alongside the other ratio cells.
AUC summary table: dropped the now-redundant "% vs primary" column entirely. The table is now just AUC + Ratio vs primary + Ratio vs secondary + Ratio vs tertiary, all in Nx.
New ratioColor() is centered at 1.00× and log-symmetric: 3.00× → fully green, 0.33× → fully red, interpolating linearly in log space so 2× and 0.5× sit at matched saturations. WCAG-luminance text-color selection preserved.
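A sketch of the ratio cell text under the semantics described above. `ratioCell` is a hypothetical name, and the em dash for the both-missing case is an assumption carried over from the original percent table, since this comment does not restate it. Cell coloring (the dark-red/dark-green treatment) is left out.

```typescript
// Hypothetical sketch of a "Ratio vs baseline" cell at one interactivity bucket.
// null means that config's frontier does not reach the bucket.
function ratioCell(other: number | null, baseline: number | null): string {
  if (other === null && baseline === null) return "—"; // assumed unchanged from the % table
  if (baseline === null) return "∞"; // only the other config reaches the bucket
  if (other === null) return "0×"; // limit of other / baseline as other -> 0
  return `${(other / baseline).toFixed(2)}×`;
}
```

Returning "0×" instead of "−∞" keeps every finite cell on the same multiplicative scale, which is the rationale given above.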

2. floor instead of ceil for the upper bound

Throughput table buckets and AUC integration window both now end at floor(globalMax / 10) * 10. So if the highest tok/s/user any selected config reaches is e.g. 173.4, columns go 10, 20, …, 170 (not 180), and AUC integrates over [10, 170].
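A sketch of the bucket rule with hypothetical names (`bucketColumns`, `step`, `lo`). With the 173.4 example it yields 10 through 170; the previous ceil rule would have added a 180 column. The same `hi` would serve as the upper end of the AUC window.

```typescript
// Hypothetical helper: interactivity bucket columns under the floor rule.
function bucketColumns(globalMax: number, step = 10, lo = 10): number[] {
  const hi = Math.floor(globalMax / step) * step; // e.g. 173.4 -> 170
  const cols: number[] = [];
  for (let x = lo; x <= hi; x += step) cols.push(x);
  return cols; // empty when globalMax < lo
}
```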

Test updates

pareto.test.ts no longer hard-codes the spec's expected AUC magnitudes — those values bake in a specific upper bound and would have shifted with this change (e.g. B200_DynamoVLLM_nonMTP_disagg goes from 62,177 → 62,194 when hi shifts from 180 → 170, because that config keeps contributing positive area in the (170, 180) window we used to integrate over). The sanity check now compares aucUnderFrontier against an independent fine-grid trapezoidal reference computed inline for each config, so the assertion stays meaningful regardless of which upper-bound rule is in play.
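To illustrate why a fine-grid reference is a meaningful independent check, this sketch computes the same AUC two ways: the closed-form clip-and-integrate approach and a 10,001-point uniform trapezoid over the zero-padded frontier. Function names, the sample frontier, and the 1e-3 relative tolerance are assumptions for illustration, not the values in pareto.test.ts.

```typescript
// Hypothetical sketch: closed-form AUC vs an independent fine-grid reference.
type Point = { x: number; y: number }; // frontier sorted by ascending x

function aucClosedForm(f: Point[], lo: number, hi: number): number {
  let area = 0;
  for (let i = 1; i < f.length; i++) {
    const a = f[i - 1];
    const b = f[i];
    const x0 = Math.max(a.x, lo);
    const x1 = Math.min(b.x, hi);
    if (x1 <= x0) continue; // segment outside [lo, hi]
    const slope = (b.y - a.y) / (b.x - a.x);
    const y0 = a.y + slope * (x0 - a.x);
    const y1 = a.y + slope * (x1 - a.x);
    area += ((y0 + y1) / 2) * (x1 - x0);
  }
  return area;
}

// Linear interpolation that returns 0 outside the frontier (zero padding).
function interpZeroOutside(f: Point[], x: number): number {
  if (f.length === 0 || x < f[0].x || x > f[f.length - 1].x) return 0;
  for (let i = 1; i < f.length; i++) {
    const a = f[i - 1];
    const b = f[i];
    if (x <= b.x) {
      return b.x === a.x ? Math.max(a.y, b.y) : a.y + ((x - a.x) / (b.x - a.x)) * (b.y - a.y);
    }
  }
  return f[f.length - 1].y;
}

// Uniform-grid trapezoid rule over n samples spanning [lo, hi].
function aucFineGrid(f: Point[], lo: number, hi: number, n = 10001): number {
  const h = (hi - lo) / (n - 1);
  let area = 0;
  let prev = interpZeroOutside(f, lo);
  for (let i = 1; i < n; i++) {
    const cur = interpZeroOutside(f, lo + i * h);
    area += ((prev + cur) / 2) * h;
    prev = cur;
  }
  return area;
}
```

Because the zero-padded integrand jumps to 0 at the end of each frontier, the grid version carries an error on the order of the step width times that jump, so a relative tolerance rather than exact equality is the natural assertion.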

Verification

pnpm lint ✅
pnpm fmt ✅
pnpm typecheck ✅
pnpm test:unit ✅ (1930 tests, all 8 AUC sanity-check cases pass against the independent reference)

Parameterize pareto.ts with a 'higher' | 'lower' direction so the interactivity tables work for cost / J / power metrics in addition to tok/s/gpu. Direction is taken from the existing chart-config roofline direction (upper_* = higher-better, lower_* = lower-better) via a new lib/metric-direction.ts helper.

- paretoFrontier / interpAlongFrontier / aucUnderFrontier accept a direction parameter.
- For lower-is-better, AUC integrates only over each config's reachable x-range (zero-padding outside would treat "no data" as the BEST value, inflating cost AUC). Higher-better keeps the existing zero-outside behavior.
- New aucWindow() reports the effective integration window per row, shown as a new "Window" column when the active metric is lower-is-better.
- InteractivityTables renders for every y-axis metric; column-best highlight picks min for lower-better; ratio colormap inverts so ratios < 1 are green and > 1 are red; in-range vs out-of-range cells flip their green/red mapping consistently with the direction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

functionstackx · 2026-05-17T21:54:25Z

Extended to all y-axis metrics

InteractivityTables now renders for every y-axis metric (cost, J/token, power, etc.) — not just tok/s/gpu.

Source-of-truth for direction

The existing chart config (packages/app/src/components/inference/inference-chart-config.json) already declares each metric's roofline direction per chart type via y_<metric>_roofline. On the interactivity chart, upper_* is higher-is-better and lower_* is lower-is-better. I added a small shared helper at packages/app/src/lib/metric-direction.ts that maps that direction to a 'higher' | 'lower' ParetoDirection — same data, no duplication. The tables read it directly off the active interactivity chart definition.
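A minimal sketch of that mapping. The `y_<metric>_roofline` key pattern and the upper_* / lower_* prefixes come from this comment; the function names, the config shape, and specific values such as `lower_right` are hypothetical.

```typescript
type ParetoDirection = "higher" | "lower";

// Hypothetical: maps a roofline direction string (e.g. "upper_right",
// "lower_right") to the direction the Pareto helpers need.
function directionFromRoofline(roofline: string): ParetoDirection {
  if (roofline.startsWith("upper_")) return "higher";
  if (roofline.startsWith("lower_")) return "lower";
  throw new Error(`Unrecognized roofline direction: ${roofline}`);
}

// Hypothetical chart-definition shape with keys like y_<metric>_roofline.
type ChartDef = Record<string, unknown>;

function metricDirection(chart: ChartDef, metric: string): ParetoDirection {
  const roofline = chart[`y_${metric}_roofline`];
  if (typeof roofline !== "string") {
    throw new Error(`No roofline declared for metric ${metric}`);
  }
  return directionFromRoofline(roofline);
}
```

Reading the direction from the same key the chart already uses avoids a second table of metric directions that could drift out of sync.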

AUC out-of-range decision

For lower-is-better metrics, treating out-of-reachable-range as y=0 would inflate AUC because 0 is the BEST cost. I chose to integrate only over each config's reachable x-range (clip the requested [10, hi] to [max(10, configMinX), min(hi, configMaxX)]). For higher-is-better, I kept the existing zero-outside behavior — there, y=0 is the WORST throughput, so zero-padding correctly penalizes configs that can't reach the high-interactivity buckets.

Trade-off: under this asymmetric rule, lower-better AUCs from configs with narrow reachable spans aren't directly comparable to configs with wide spans. To make that explicit, the AUC table gains a new "Window" column (only shown for lower-better metrics) that displays each row's effective lo→hi window.

pareto.aucWindow() is the new helper that returns the effective window so consumers can display it. For higher-better it always returns the requested [lo, hi]; for lower-better it returns the clipped reachable range.
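A sketch of the asymmetric window rule, assuming an ascending-x frontier. Returning `null` when the clipped range is empty is an assumption; the comment doesn't say how an unreachable window is displayed.

```typescript
type ParetoDirection = "higher" | "lower";
type Point = { x: number; y: number }; // frontier sorted by ascending x

// Effective integration window for one row (hypothetical signature).
// higher: always the requested [lo, hi]; zero padding penalizes short frontiers.
// lower: clipped to the frontier's reachable x-range, since padding with 0
// would count "no data" as the best possible cost.
function aucWindow(
  frontier: Point[],
  lo: number,
  hi: number,
  direction: ParetoDirection = "higher",
): [number, number] | null {
  if (direction === "higher") return [lo, hi];
  if (frontier.length === 0) return null;
  const wLo = Math.max(lo, frontier[0].x);
  const wHi = Math.min(hi, frontier[frontier.length - 1].x);
  return wHi > wLo ? [wLo, wHi] : null; // null: nothing reachable in [lo, hi] (assumed)
}
```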

Other changes

paretoFrontier, interpAlongFrontier, aucUnderFrontier all accept a direction: 'higher' | 'lower' parameter (defaulting to 'higher' — fully backward-compatible).
Column-best highlight: max for higher-better, min for lower-better.
Ratio colormap inverts for lower-better (ratios < 1 are green = good, > 1 are red).
∞ / 0× cell coloring flips: for lower-better, ∞ is red (other = infinite cost vs baseline = bad) and 0× is green (other achieves zero cost relative to baseline = great).
Section headers stay generic ("Per-GPU value at each interactivity bucket", "Area under Pareto frontier"); tooltips and the row caption now include a "Higher is better" / "Lower is better" hint.
Numeric formatting auto-scales for small (cost / J/token) values.
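A sketch of how a single `direction` flag can drive both frontier pruning and the column-best highlight. Names are hypothetical. x (tok/s/user) is always higher-is-better; only the y comparison flips, which also makes duplicate-x points resolve to the max y for higher-better and the min y for lower-better.

```typescript
type ParetoDirection = "higher" | "lower";
type Point = { x: number; y: number };

// True when y-value a is strictly better than b under the given direction.
function better(a: number, b: number, dir: ParetoDirection): boolean {
  return dir === "higher" ? a > b : a < b;
}

function paretoFrontier(points: Point[], dir: ParetoDirection = "higher"): Point[] {
  const sign = dir === "higher" ? 1 : -1;
  // Highest x first; among equal x, the best y first (max or min).
  const sorted = [...points].sort((a, b) => b.x - a.x || sign * (b.y - a.y));
  const out: Point[] = [];
  let best: number | null = null;
  for (const p of sorted) {
    if (best === null || better(p.y, best, dir)) {
      out.push(p);
      best = p.y;
    }
  }
  return out.reverse(); // ascending x
}

// Index of the best in-range cell in a column (null = out of range), or -1.
function columnBestIndex(values: (number | null)[], dir: ParetoDirection): number {
  let bestIdx = -1;
  let bestVal: number | null = null;
  for (let i = 0; i < values.length; i++) {
    const v = values[i];
    if (v === null) continue;
    if (bestVal === null || better(v, bestVal, dir)) {
      bestIdx = i;
      bestVal = v;
    }
  }
  return bestIdx;
}
```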

Tests

pareto.test.ts adds:

A lower-is-better fixture asserting frontier pruning under inverse dominance.
An aucWindow block covering clip-to-reachable behavior.
A synthetic 3-config cost fixture (cheap / expensive / niche) end-to-end: pareto → window → AUC, with hand-computed expected values.
A duplicate-x interp test verifying lower-better picks min and higher-better picks max.

All 1940 unit tests pass. pnpm lint, pnpm fmt, pnpm typecheck clean. The existing 8-config (real benchmark) integration test is unchanged and still asserts agreement with an independent fine-grid reference for higher-better.

Commit: d5e6abe

Files

packages/app/src/lib/pareto.ts — direction parameter, aucWindow export.
packages/app/src/lib/metric-direction.ts — new shared helper.
packages/app/src/components/inference/ui/InteractivityTables.tsx — direction-aware rendering, removed auto-hide gate.
packages/app/src/components/inference/ui/ChartDisplay.tsx — updated comment on the gate.
packages/app/src/lib/pareto.test.ts — direction tests + synthetic lower-better fixture.

Notes / deviations

The fixture for the lower-better integration test is synthetic. eight_config_data.json only has Token_Throughput_per_GPU_tok_s_gpu per row, so it isn't directly usable for a lower-better metric without duplicating the ETL math; the synthetic fixture cleanly exercises the same code path with hand-checkable expected values.
I did NOT change the AUC behavior for higher-better metrics (zero-padding outside reachable range stays). The existing real-data sanity check continues to pass against the independent reference, so prior numbers in the throughput AUC table are unaffected.

The ratio heatmap saturated at 3x, so anything from 5x to 33x collapsed to the same maximum green; common ratios like 7x and 20x looked identical. Bump the log-symmetric saturation caps to 30x / 1/30x and drive the color ramp through HSL (hue=142/0, lightness 0.97→0.28, saturation 0.6→0.78) so 2x / 5x / 10x / 20x land at perceptually distinct greens. Export ratioColor and add unit tests covering distinctness, monotonicity, clamping, log-symmetric reciprocals, lower-better inversion, and text contrast.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

functionstackx · 2026-05-17T22:03:15Z

Bumped the heatmap saturation caps from 3× / ⅓× to 30× / 1/30× and switched the white→green / white→red ramp from RGB to HSL interpolation (hue=142°/0°, lightness 0.97→0.28, saturation 0.60→0.78). With caps at 30× the log-symmetric position of each common ratio is no longer clamped together, and HSL gives more perceptual contrast across the upper half of the ramp than the prior RGB lerp between green-300 and green-700.

New ratio → color mapping

Ratio	`t = clamp(log(r)/log(30), −1, 1)`	HSL L	RGB	Hex (approx)	Text
0.05×	−0.881	0.36	`rgb(162, 22, 22)`	`#a21616`	white
0.1×	−0.677	0.50	`rgb(220, 37, 37)`	`#dc2525`	white
0.5×	−0.204	0.83	`rgb(239, 184, 184)`	`#efb8b8`	black
1.0×	0.000	0.97	`rgb(243, 252, 246)`	`#f3fcf6`	black
1.5×	0.119	0.89	`rgb(209, 244, 222)`	`#d1f4de`	black
2×	0.204	0.83	`rgb(184, 239, 204)`	`#b8efcc`	black
5×	0.473	0.64	`rgb(102, 226, 147)`	`#66e293`	black
7×	0.572	0.58	`rgb(71, 223, 126)`	`#47df7e`	black
10×	0.677	0.50	`rgb(37, 220, 104)`	`#25dc68`	black
20×	0.881	0.36	`rgb(22, 162, 74)`	`#16a24a`	white
33×	1.000	0.28	`rgb(16, 127, 57)`	`#107f39`	white

Each consecutive step on the upper half (1.5× → 2× → 5× → 7× → 10× → 20× → 33×) lands at a visibly distinct green; reciprocal ratios are exact mirror images (0.5× ↔ 2×, 0.1× ↔ 10×). Text color flips to white once background luminance drops below 0.45 (unchanged).
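The mapping in the table can be reproduced with a short function. This sketch assumes the parameters stated above (30× cap, hue 142°/0°, lightness 0.97→0.28, saturation 0.60→0.78, white text below luminance 0.45). `RATIO_CAP_HI` / `RATIO_CAP_LO` echo the test names; the HSL conversion and the WCAG 2.x linearization threshold (0.03928) are standard formulas chosen here, not taken from the PR.

```typescript
type ParetoDirection = "higher" | "lower";
type RGB = [number, number, number];

// Caps assumed from the comment above: 30x saturates green, 1/30x saturates red.
const RATIO_CAP_HI = 30;
const RATIO_CAP_LO = 1 / 30;

// Standard HSL -> RGB conversion (h in degrees, s and l in [0, 1]), rounded to 0..255.
function hslToRgb(h: number, s: number, l: number): RGB {
  const c = (1 - Math.abs(2 * l - 1)) * s;
  const hp = h / 60;
  const x = c * (1 - Math.abs((hp % 2) - 1));
  const m = l - c / 2;
  let base: RGB;
  if (hp < 1) base = [c, x, 0];
  else if (hp < 2) base = [x, c, 0];
  else if (hp < 3) base = [0, c, x];
  else if (hp < 4) base = [0, x, c];
  else if (hp < 5) base = [x, 0, c];
  else base = [c, 0, x];
  return [
    Math.round((base[0] + m) * 255),
    Math.round((base[1] + m) * 255),
    Math.round((base[2] + m) * 255),
  ];
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance([r, g, b]: RGB): number {
  const lin = (v: number): number => {
    const s = v / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function ratioColor(ratio: number, direction: ParetoDirection = "higher") {
  // Clamp, then map to t in [-1, 1] on a log scale so r and 1/r mirror each other.
  const clamped = Math.min(RATIO_CAP_HI, Math.max(RATIO_CAP_LO, ratio));
  let t = Math.log(clamped) / Math.log(RATIO_CAP_HI);
  if (direction === "lower") t = -t; // lower-is-better: ratios < 1 are good (green)
  const mag = Math.abs(t);
  const hue = t >= 0 ? 142 : 0; // green for good, red for bad
  const lightness = 0.97 - (0.97 - 0.28) * mag;
  const saturation = 0.6 + (0.78 - 0.6) * mag;
  const rgb = hslToRgb(hue, saturation, lightness);
  const text = relativeLuminance(rgb) < 0.45 ? "white" : "black";
  return { rgb, css: `rgb(${rgb.join(", ")})`, text };
}
```

With these formulas, 1×, 10×, 20×, 33× and 0.05× reproduce the RGB values listed in the table.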

Tests

Added InteractivityTables.test.ts with:

distinct backgrounds for {2×, 5×, 7×, 10×, 20×}
monotonically darker green for higher ratios up to the cap
clamp behavior beyond RATIO_CAP_HI / RATIO_CAP_LO
log-symmetric reciprocal mirror property
direction='lower' hue inversion
text-color switch at deep ratios

All 1947 app unit tests pass; lint, fmt, typecheck clean.

vercel deployed to Preview May 17, 2026 21:29

vercel deployed to Preview May 17, 2026 21:43

vercel deployed to Preview May 17, 2026 21:54

vercel deployed to Preview May 17, 2026 22:03


functionstackx wants to merge 4 commits into master from feat/interactivity-throughput-and-auc-tables
