Add per-interactivity throughput table and AUC summary table to inference page #364
functionstackx wants to merge 4 commits into
… table

Below the Pareto chart on the inference page, render two new tables that summarize the visible Pareto-frontier curves into scalar form.

- Table 1 (per-GPU throughput at each interactivity bucket): rows = enabled configs, columns = every 10 tok/s/user from 10 up to ceil(globalMax/10)*10. Cells are tok/s/gpu linearly interpolated along each config's Pareto frontier; "—" for out-of-range buckets; best per column highlighted. Linked sub-table shows % advantage vs a user-selectable baseline (default: MI355X SGLang) with infinity / negative-infinity / em-dash semantics and a +/-200%-capped red->white->green heatmap; cell text color picked via WCAG luminance for contrast.
- Table 2 (AUC summary): trapezoidal area under each frontier from x=10 to ceil(globalMax/10)*10, with y treated as 0 outside the frontier's x-range. Columns: AUC, ratio + % vs primary baseline (default B200 SGLang non-MTP), ratio vs secondary baseline (default MI355X SGLang), ratio vs tertiary baseline (default MI355X ATOM). All three baselines are selectable. Self-vs-self is amber 1.00x/+0.0%; better is green; worse is red.

Both tables share a single Pareto/interp/AUC implementation in @/lib/pareto. Verified against the spec's reference AUCs from eight_config_data.json (FP4 DeepSeek V4 Pro, 8K/1K, TP=8): all 8 configs match the expected values to within 0.5%.

Tables react live to the existing filter controls (model, precision, ISL/OSL, legend on/off toggles).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
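The Table 1 computation described in this commit can be sketched roughly as follows. This is a minimal illustration for the higher-is-better case only, not the code in `@/lib/pareto`: the `Point` type, the function bodies, and the `bucketEdges` helper are assumptions made for the example, and the upper bound uses this commit's `ceil` rule.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the PR's actual code.
type Point = { x: number; y: number }; // x = tok/s/user, y = tok/s/gpu

// Keep the non-dominated points (larger x and larger y are both better),
// returned in ascending x order.
function paretoFrontier(points: Point[]): Point[] {
  // Sweep from the largest x down; a point survives only if its y beats
  // every point that has an x at least as large.
  const sorted = [...points].sort((a, b) => b.x - a.x || b.y - a.y);
  const frontier: Point[] = [];
  let bestY = -Infinity;
  for (const p of sorted) {
    if (p.y > bestY) {
      frontier.push(p);
      bestY = p.y;
    }
  }
  return frontier.reverse();
}

// Linear interpolation along the frontier; null means "out of range",
// which the table would render as the out-of-range glyph.
function interpAlongFrontier(frontier: Point[], x: number): number | null {
  if (frontier.length === 0) return null;
  const first = frontier[0];
  const last = frontier[frontier.length - 1];
  if (x < first.x || x > last.x) return null;
  for (let i = 1; i < frontier.length; i++) {
    const a = frontier[i - 1];
    const b = frontier[i];
    if (x <= b.x) {
      const t = (x - a.x) / (b.x - a.x);
      return a.y + t * (b.y - a.y);
    }
  }
  return last.y; // single-point frontier queried exactly at its x
}

// Column headers: every 10 tok/s/user from 10 up to ceil(globalMax / 10) * 10.
function bucketEdges(globalMax: number): number[] {
  const upper = Math.ceil(globalMax / 10) * 10;
  const edges: number[] = [];
  for (let x = 10; x <= upper; x += 10) edges.push(x);
  return edges;
}

const frontier = paretoFrontier([
  { x: 10, y: 100 },
  { x: 15, y: 70 }, // dominated by (20, 80)
  { x: 20, y: 80 },
  { x: 20, y: 60 }, // dominated by (20, 80)
  { x: 30, y: 50 },
]);
const row = bucketEdges(34).map((x) => interpAlongFrontier(frontier, x));
console.log(row); // [100, 80, 50, null]
```

Sorting by descending x lets a single pass decide dominance, which keeps the frontier step at O(n log n) per config.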
Two follow-up tweaks to the per-interactivity throughput and AUC summary tables introduced in 6db1e32:

1. Render multiplicative ratios (Nx) instead of percent-differences.
   - Throughput "% advantage vs baseline" sub-table → "Ratio vs baseline", cells now read "2.50×", "0.60×", etc; self-vs-self is "1.00×"; "∞" kept (other reachable, baseline not); "−∞" replaced with "0×" using the same dark-red treatment for the symmetric case.
   - AUC table: drop the redundant "% vs primary" column entirely (the other three columns are already ratios), so columns are AUC + Ratio vs primary + Ratio vs secondary + Ratio vs tertiary, all in Nx.
   - New ratioColor() centered at 1.00× and log-symmetric: 3.00× → fully green, 0.33× → fully red, interpolating linearly in log space (so "2×" and "0.5×" land at matched saturations). WCAG-luminance text color preserved.
2. Column upper bound is now floor(globalMax/10)*10 instead of ceil, for both the throughput buckets and the AUC integration window. The last bucket is therefore always one that at least one config actually reaches.

pareto.test.ts: the spec sanity check now compares aucUnderFrontier against an independent fine-grid trapezoidal reference computed inline, instead of hard-coding expected AUC magnitudes that bake in a specific upper bound. The new floor(...) rule, or any future window change, no longer requires touching the test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
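A rough sketch of the ratio-cell rules and the log-symmetric placement described in this commit. The function names `formatRatio` and `logPosition` are hypothetical, the both-missing case is assumed to keep the earlier out-of-range glyph, and the baseline is assumed to be strictly positive whenever it is present.

```typescript
// Hypothetical helpers; not the PR's actual ratioColor() or cell renderer.

// value / baseline are interpolated metric values for one bucket,
// or null when that config does not reach the bucket.
function formatRatio(value: number | null, baseline: number | null): string {
  if (value === null && baseline === null) return "\u2014"; // assumed: out-of-range glyph
  if (baseline === null) return "\u221E"; // row reachable, baseline not
  if (value === null) return "0\u00D7"; // baseline reachable, row not
  // Assumes baseline > 0; a zero baseline is not handled in this sketch.
  return `${(value / baseline).toFixed(2)}\u00D7`;
}

// Position of a ratio on a ramp centered at 1.00x: 0 at 1x, +1 at cap,
// -1 at 1/cap, linear in log space and clamped outside the caps.
function logPosition(ratio: number, cap = 3): number {
  const t = Math.log(ratio) / Math.log(cap);
  return Math.max(-1, Math.min(1, t));
}

console.log(formatRatio(150, 60)); // "2.50×"
console.log(formatRatio(null, 60)); // "0×"
console.log(logPosition(2), logPosition(0.5)); // equal magnitude, opposite sign
```

Working in log space is what makes 2× and 0.5× sit at matched distances from the neutral midpoint; a linear scale would place 0.5× much closer to 1× than 2× is.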
Pushed 1. Ratios (Nx) instead of percentages
2.
Test updates
Verification
Parameterize pareto.ts with 'higher' | 'lower' direction so the interactivity tables work for cost / J / power metrics in addition to tok/s/gpu. Direction is taken from the existing chart-config roofline direction (upper_* = higher-better, lower_* = lower-better) via new lib/metric-direction.ts helper.

- paretoFrontier / interpAlongFrontier / aucUnderFrontier accept a direction parameter.
- For lower-is-better, AUC integrates only over each config's reachable x-range (zero-padding outside would treat "no data" as the BEST value, inflating cost AUC). Higher-better keeps the existing zero-outside behavior.
- New aucWindow() reports the effective integration window per row, shown as a new "Window" column when the active metric is lower-is-better.
- InteractivityTables renders for every y-axis metric; column-best highlight picks min for lower-better; ratio colormap inverts so ratios < 1 are green and > 1 are red; in-range vs out-of-range cells flip their green/red mapping consistently with the direction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
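The direction parameter and the per-row window might look something like the sketch below. The signatures are assumptions for illustration (the real `paretoFrontier` and `aucWindow` in pareto.ts may differ), and the sketch treats a larger x (interactivity) as better in both directions, with only the y preference flipping.

```typescript
// Hypothetical sketch of direction-aware helpers; signatures are assumed.
type Direction = "higher" | "lower";
type Point = { x: number; y: number };

// Non-dominated set where larger x is always better and y is better
// when larger ("higher") or smaller ("lower"). Returned in ascending x.
function paretoFrontier(points: Point[], direction: Direction): Point[] {
  const sign = direction === "higher" ? 1 : -1;
  // Largest x first; among equal x, the better y first.
  const sorted = [...points].sort((a, b) => b.x - a.x || sign * (b.y - a.y));
  const frontier: Point[] = [];
  let best = direction === "higher" ? -Infinity : Infinity;
  for (const p of sorted) {
    const improves = direction === "higher" ? p.y > best : p.y < best;
    if (improves) {
      frontier.push(p);
      best = p.y;
    }
  }
  return frontier.reverse();
}

// Effective integration window for one row. Higher-better integrates the
// whole [lo, hi] range (zero outside the frontier); lower-better integrates
// only where the config has data, so "no data" is never counted as cost 0.
function aucWindow(
  frontier: Point[],
  lo: number,
  hi: number,
  direction: Direction,
): [number, number] | null {
  if (direction === "higher") return [lo, hi];
  if (frontier.length === 0) return null;
  const start = Math.max(lo, frontier[0].x);
  const end = Math.min(hi, frontier[frontier.length - 1].x);
  return start < end ? [start, end] : null;
}

const cost = paretoFrontier(
  [
    { x: 10, y: 1.0 },
    { x: 15, y: 1.8 }, // dominated by (20, 1.5)
    { x: 20, y: 1.5 },
    { x: 20, y: 2.0 }, // dominated by (20, 1.5)
    { x: 30, y: 3.0 },
  ],
  "lower",
);
console.log(aucWindow(cost, 10, 40, "lower")); // [10, 30]
console.log(aucWindow(cost, 10, 40, "higher")); // [10, 40]
```

Reporting the window alongside the AUC is what lets a reader see that two lower-better rows were integrated over different spans.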
Extended to all y-axis metrics
Source-of-truth for direction
The existing chart config (
AUC out-of-range decision
For lower-is-better metrics, treating out-of-reachable-range as
Trade-off: under this asymmetric rule, lower-better AUCs from configs with narrow reachable spans aren't directly comparable to configs with wide spans. To make that explicit, the AUC table gains a new "Window" column (only shown for lower-better metrics) that displays each row's effective
Other changes
Tests
All 1940 unit tests pass. Commit:
Files
Notes / deviations
The ratio heatmap saturated at 3x, so anything from 5x to 33x collapsed to the same maximum green; common ratios like 7x and 20x looked identical. Bump the log-symmetric saturation caps to 30x / 1/30x and drive the color ramp through HSL (hue=142/0, lightness 0.97→0.28, saturation 0.6→0.78) so 2x / 5x / 10x / 20x land at perceptually distinct greens.

Export ratioColor and add unit tests covering distinctness, monotonicity, clamping, log-symmetric reciprocals, lower-better inversion, and text contrast.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bumped the heatmap saturation caps from 3× / ⅓× to 30× / 1/30× and switched the white→green / white→red ramp from RGB to HSL interpolation (hue=142°/0°, lightness 0.97→0.28, saturation 0.60→0.78). With caps at 30×, common ratios no longer clamp to the same log-symmetric position, and HSL gives more perceptual contrast across the upper half of the ramp than the prior RGB lerp between green-300 and green-700.
New ratio → color mapping
Each consecutive step on the upper half (1.5× → 2× → 5× → 7× → 10× → 20× → 33×) lands at a visibly distinct green; reciprocal ratios are exact mirror images (0.5× ↔ 2×, 0.1× ↔ 10×). Text color flips to white once background luminance drops below 0.45 (unchanged).
Tests
Added
All 1947 app unit tests pass; lint, fmt, typecheck clean.
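For readers who want to see how the pieces of the ramp fit together, here is a minimal sketch using the numbers quoted above (caps at 30× and 1/30×, hue 142°/0°, lightness 0.97→0.28, saturation 0.60→0.78, white text below luminance 0.45). The function shapes, the returned object, the use of WCAG relative luminance for the threshold, and the choice of black for the non-white text color are assumptions for the example; the exported `ratioColor` in the PR may be structured differently.

```typescript
// Hypothetical sketch of the HSL ramp; not the PR's exported ratioColor.
type Direction = "higher" | "lower";
type Hsl = { h: number; s: number; l: number }; // h in degrees, s and l in 0..1

function ratioColor(ratio: number, direction: Direction = "higher", cap = 30): Hsl {
  // Log-symmetric position: 0 at 1x, +1 at cap, -1 at 1/cap, clamped.
  let t = Math.max(-1, Math.min(1, Math.log(ratio) / Math.log(cap)));
  if (direction === "lower") t = -t; // ratios below 1 are the good side
  const m = Math.abs(t);
  return {
    h: t >= 0 ? 142 : 0, // green for better, red for worse
    s: 0.6 + m * (0.78 - 0.6),
    l: 0.97 + m * (0.28 - 0.97),
  };
}

// Standard HSL -> RGB conversion, channels in 0..1.
function hslToRgb({ h, s, l }: Hsl): [number, number, number] {
  const c = (1 - Math.abs(2 * l - 1)) * s; // chroma
  const hp = (((h % 360) + 360) % 360) / 60; // hue sextant position, 0..6
  const x = c * (1 - Math.abs((hp % 2) - 1));
  const m = l - c / 2;
  const sextants: [number, number, number][] = [
    [c, x, 0], [x, c, 0], [0, c, x], [0, x, c], [x, 0, c], [c, 0, x],
  ];
  const [r, g, b] = sextants[Math.floor(hp)];
  return [r + m, g + m, b + m];
}

// WCAG 2.x relative luminance of an sRGB color with channels in 0..1.
function relativeLuminance([r, g, b]: [number, number, number]): number {
  const lin = (v: number) =>
    v <= 0.03928 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// White text on dark cells; "black" for light cells is an assumption.
function textColorFor(bg: Hsl): "white" | "black" {
  return relativeLuminance(hslToRgb(bg)) < 0.45 ? "white" : "black";
}

for (const r of [1, 2, 5, 10, 20, 30]) {
  const bg = ratioColor(r);
  console.log(r, bg.l.toFixed(3), textColorFor(bg));
}
```

Interpolating lightness and saturation together, rather than mixing two fixed RGB endpoints, is what spreads the mid-range ratios across visibly different shades.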
Summary
Below the existing Pareto-frontier chart on the inference page, render two new tables that summarize the visible Pareto-frontier curves into scalar form. Both tables react live to the same filter controls that drive the chart (model, precision, sequence/ISL-OSL, and the legend on/off toggles for enabled configs), and only appear when the y-axis metric is Token Throughput per GPU; the AUC + interactivity framing assumes that metric.
Table 1: Per-GPU throughput at each interactivity bucket
- ceil(globalMax / 10) * 10.
- ∞ / −∞ / — semantics for missing-other / missing-baseline / both-missing.
Table 2: Area under Pareto frontier (AUC summary)
- x = 10 to x = ceil(globalMax / 10) * 10. Outside the frontier's x-range the integrand is treated as 0, so configs that don't reach part of the range contribute 0 there.
- 1.00×/+0.0%; better-than-baseline is green; worse is red (same red/green heatmap as Table 1).
Implementation notes
- packages/app/src/lib/pareto.ts. The existing chart-side roofline code in chart-utils.ts is metric-aware (operates on full InferenceData with upper_left | upper_right | … directions) and intentionally kept untouched; the new util is the plain numeric core that consumers without that machinery (these tables) should use. Both code paths compute the same non-dominated set on (x, y) = (interactivity, tok/s/gpu).
- np.interp grid; same answer to machine precision and avoids per-render allocations.
- useInference().graphs (the existing interactivity chart's processed data), then apply the existing selectedPrecisions and activeHwTypes filters before grouping by hwKey. This is how the table guarantees it always shows exactly the configs that are currently on the chart.
- Select is track()-ed (inference_throughput_baseline_changed, inference_auc_primary_baseline_changed, etc.) per the project's analytics convention.
Verification against the spec's reference AUCs
The spec ships an 8-config sample dataset (FP4 DeepSeek V4 Pro, 8K/1K, TP=8) with known expected AUCs computed by the Python reference. The pareto util's unit tests load that fixture and check that all 8 configs match within 0.5% — they all do.
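To make the AUC definition and this kind of check concrete, here is a small sketch of a segment-wise trapezoidal AUC (integrand treated as 0 outside the frontier's x-range) compared against an independent fine-grid reference. It uses a made-up three-point frontier rather than the spec fixture, and the function names and signatures are assumptions, not the contents of pareto.ts or pareto.test.ts.

```typescript
// Hypothetical sketch; the frontier below is invented, not the spec fixture.
type Point = { x: number; y: number };

// Exact area under a piecewise-linear frontier (ascending x), clipped to
// [lo, hi]. Anything outside the frontier's own x-range contributes 0.
function aucUnderFrontier(frontier: Point[], lo: number, hi: number): number {
  let area = 0;
  for (let i = 1; i < frontier.length; i++) {
    const a = frontier[i - 1];
    const b = frontier[i];
    const x0 = Math.max(a.x, lo);
    const x1 = Math.min(b.x, hi);
    if (x1 <= x0) continue; // segment lies outside the window
    const slope = (b.y - a.y) / (b.x - a.x);
    const y0 = a.y + slope * (x0 - a.x);
    const y1 = a.y + slope * (x1 - a.x);
    area += 0.5 * (y0 + y1) * (x1 - x0); // one trapezoid per clipped segment
  }
  return area;
}

// Frontier value at x, or 0 where the frontier has no data.
function valueAt(frontier: Point[], x: number): number {
  const n = frontier.length;
  if (n === 0 || x < frontier[0].x || x > frontier[n - 1].x) return 0;
  for (let i = 1; i < n; i++) {
    const a = frontier[i - 1];
    const b = frontier[i];
    if (x <= b.x) return a.y + ((x - a.x) / (b.x - a.x)) * (b.y - a.y);
  }
  return frontier[n - 1].y;
}

// Independent reference: sample on a fine uniform grid and sum trapezoids.
function fineGridAuc(frontier: Point[], lo: number, hi: number, steps = 3000): number {
  const h = (hi - lo) / steps;
  let area = 0;
  for (let i = 0; i < steps; i++) {
    const left = valueAt(frontier, lo + i * h);
    const right = valueAt(frontier, lo + (i + 1) * h);
    area += 0.5 * (left + right) * h;
  }
  return area;
}

const frontier: Point[] = [
  { x: 10, y: 100 },
  { x: 20, y: 80 },
  { x: 30, y: 50 },
];
const exact = aucUnderFrontier(frontier, 10, 40); // 900 + 650 + 0 past x = 30
const reference = fineGridAuc(frontier, 10, 40);
console.log(exact); // 1550
console.log(Math.abs(reference - exact) / exact < 0.005); // true
```

The grid version picks up a small error at the drop to zero past the last frontier point, which is one reason a relative tolerance is a more robust assertion than exact equality.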
Files
- packages/app/src/lib/pareto.ts: new shared util (Pareto frontier, linear interp, trapezoidal AUC).
- packages/app/src/lib/pareto.test.ts: unit tests including the 8-config sanity check.
- packages/app/src/lib/__fixtures__/eight_config_data.json: test fixture from the spec.
- packages/app/src/components/inference/ui/InteractivityTables.tsx: new component containing both tables.
- packages/app/src/components/inference/ui/ChartDisplay.tsx: mounts the new component below the displayed graphs.
Layout
The new section appears as two stacked Cards below the Pareto chart (and above the "Performance Over Time" drill-down dialog). Each card has a heading row with an info-tooltip and (for the heatmap and AUC tables) baseline Select controls right-aligned. The tables themselves use the dashboard's standard text-xs, tabular-nums, border-collapse, sticky-first-column pattern. No new design system or font is introduced.

I could not render a local screenshot in this environment (no DB / no browser), so the layout description above is the best representation I can give.
Test plan
- pnpm lint clean
- pnpm fmt clean
- pnpm typecheck clean
- pnpm test:unit clean (1,930 app tests pass, includes 16 new pareto tests)

🤖 Generated with Claude Code