feat(spotlight): pivot to error-rate-per-value + drop tautological attrs #55
Merged
After the scoped-baseline fix landed, manual feedback was: "I see one thing, rpc.grpc.status_code, and I still don't understand how to read this or how it helps me find things."

The chart-only design hid the actual information (value names, counts, percentages) behind a click. Visual asymmetry told users "something differs" without telling them what, or what to do about it.

Each attribute card now leads with words and numbers; the chart is a scanning aid, not the primary readout.

- TL;DR headline sentence above the chart. Picks the strongest single value (largest |diff|) and writes it out: "Selection over-represents `13` by +100.0 pp (144 sel vs 0 base)." Reads cold without translating bars.
- Inline value rows below the chart (top 3 by default) with per-row counts, percentages, and a dedicated "Search →" button that drills into matching spans. No click required to see the substance.
- Per-value Search button is explicit and single-purpose. Action is unambiguous.
- Plain-English legend at the panel bottom: "selection (the spans you're investigating)" / "baseline (what they're being compared against)". The earlier "sel" / "base" jargon assumed background the novice user doesn't have.
- "Show N more values" toggle for the long tail (>3 values).

Same engine, same scoped-baseline data; purely a readability change. SpotlightSection (Errors, Service Detail) inherits.

Validated on staging with paymentFailure 50%:

- rpc.grpc.status_code differential now reads as "Selection under-represents `0` by -100.0 pp (0 sel vs 905 base)" with the two value rows visible inline (`0` -100pp, `13` +100pp), each with its Search button.

Pre-merge: tsc clean, lint 0 errors, 104/104 unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After three iterations of share-differential visualizations, the feedback stayed the same: "I don't understand what this is showing me or how to use it to find things."

The Operations table on Service Detail shows "PaymentService/Charge: 14% error rate": one number per row, instantly readable. Spotlight was making the user mentally combine sel share / base share / diff to recover the same insight. Wrong primitive.

Pivot Spotlight to use the SAME primitive as the Operations table, generalized to every attribute: per-value selection rate.

For each attribute value:

- selN: spans with this value that ARE in the selection
- baseN: spans with this value that are NOT in the selection
- total: selN + baseN
- selectionRate: selN / total (the headline metric)

The card renders one row per value: value label, horizontal bar whose width IS the rate, the percentage, "N total · X errors" caption, and a per-value Search → button. Above-average rows get an accent tint; below-average rows get muted. No histograms, no overlaid distributions, no sel/base/diff jargon.

Selection wording is context-aware via the new `selectionNoun` prop: "98% errors" on Service Detail / Errors pages instead of generic "98% selection rate."

Score = max over rows of |row.rate - overall.rate| * log1p(total). The L∞ norm with volume weighting captures the case where a single tiny-but-extreme value is the real signal: the "one pod is broken" pattern that volume-weighted variance under-counted.

Added `name` (and `kind`) as queryable attributes via a new attrValueExpr() helper. The OTel span operation name lives as a top-level column, not under attributes['name']. Without this fix Spotlight was blind to "which operation is failing," the most natural Service Detail differentiator.

Result on staging with paymentFailure 50%:

- Errors page expansion of frontend / POST /api/checkout shows http.status_code with rows "500: 50% (1,610 total · 805 errors)" and "200: 0% (1,636 total · 0 errors)". Bars + percentages make the insight immediate.
- Service Detail on payment shows `name` with the failing op (PaymentService/Charge) called out with its error rate.

Engine rewritten; SpotlightHistogram component removed (chart was the wrong primitive; bars are direct). 11 new engine test cases covering the rate metric + variance score; existing 95 still pass. Total: 106/106.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
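The rate and score definitions in the commit message above can be sketched in TypeScript as follows. This is a minimal illustration, not the engine's actual code: the type and function names are hypothetical, and the way the overall rate is derived (total selected spans over total spans across the attribute's values) is an assumption the commit message does not spell out. The sample counts are taken from the staging http.status_code rows.

```typescript
// Hypothetical types; only selN, baseN, total and selectionRate are named in the commit message.
interface ValueCounts {
  value: string;
  selN: number;  // spans with this value that are in the selection
  baseN: number; // spans with this value that are not in the selection
}

interface RateRow extends ValueCounts {
  total: number;         // selN + baseN
  selectionRate: number; // selN / total
}

// Derive total and selectionRate for each attribute value.
function toRateRows(counts: ValueCounts[]): RateRow[] {
  return counts.map((c) => {
    const total = c.selN + c.baseN;
    return { ...c, total, selectionRate: total > 0 ? c.selN / total : 0 };
  });
}

// Assumption: the overall rate is all selected spans over all spans for this attribute.
function overallRate(rows: RateRow[]): number {
  const sel = rows.reduce((sum, r) => sum + r.selN, 0);
  const total = rows.reduce((sum, r) => sum + r.total, 0);
  return total > 0 ? sel / total : 0;
}

// Score = max over rows of |row.rate - overall.rate| * log1p(total).
function attributeScore(rows: RateRow[]): number {
  const overall = overallRate(rows);
  return rows.reduce(
    (best, r) => Math.max(best, Math.abs(r.selectionRate - overall) * Math.log1p(r.total)),
    0,
  );
}

// Counts from the staging example: "500" has 1,610 total with 805 errors, "200" has 1,636 total with 0 errors.
const stagingRows = toRateRows([
  { value: "500", selN: 805, baseN: 805 },
  { value: "200", selN: 0, baseN: 1636 },
]);
const stagingScore = attributeScore(stagingRows);
```

Taking the maximum rather than an average means one extreme row can carry the whole attribute, while `log1p(total)` keeps very low-volume values from dominating on rate alone.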
Drop response-side attrs that 1:1 correlate with the error selection and produce uninformative 100% bars:

- rpc.grpc.status_code
- http.response.status_code, http.status_code
- response_flags, error.type, exception.type

Drop the SVC_SPOTLIGHT_ATTRS curated list on Service Detail. The original rationale was avoiding the cluster's 20-concurrent-job ceiling, but the concurrency-cap-at-4 semaphore in api/search.ts already handles that. The curated list was leaving real signal on the floor: input-side attributes like app.product.id that surface the actual cause of scenario-driven failures.

Service Detail (page-level + per-op expansion) now scans the same SPOTLIGHT_ATTRIBUTES set as Errors and Traces.

Tests updated: the SPOTLIGHT_ATTRIBUTES content asserts now check for request-shape attrs and explicitly assert the tautological attrs are excluded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
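As a rough illustration of the exclusion described in this commit, the sketch below filters a candidate attribute list against the dropped response-side names. The attribute names come from the commit message; the constant name, the function name, and the candidate list are hypothetical and do not reflect how `SPOTLIGHT_ATTRIBUTES` is actually defined.

```typescript
// Response-side attributes listed in the commit message as 1:1 correlated with the error selection.
// The constant name is hypothetical.
const TAUTOLOGICAL_ATTRS: ReadonlySet<string> = new Set([
  "rpc.grpc.status_code",
  "http.response.status_code",
  "http.status_code",
  "response_flags",
  "error.type",
  "exception.type",
]);

// Keep only attributes that can differ between failing and healthy spans.
function withoutTautologicalAttrs(candidates: readonly string[]): string[] {
  return candidates.filter((attr) => !TAUTOLOGICAL_ATTRS.has(attr));
}

// Illustrative candidate list: input-side attributes survive, response-side ones are dropped.
const scanned = withoutTautologicalAttrs([
  "app.product.id",
  "http.status_code",
  "name",
  "error.type",
]);
// scanned is ["app.product.id", "name"]
```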
d721984 to
be74fdb
Compare
coccyx added a commit that referenced this pull request on May 29, 2026
- Item #2 (Faceted trace search) is now DONE. Folded into the v0.9.0 Faceted-search + Spotlight thread (PRs #46–#55) with a short summary block under Completed.
- Item #13 (Settings page cleanup) is DONE: PR #56 (this branch).
- Renumbered items 3–12 down by one to close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coccyx added a commit that referenced this pull request on May 29, 2026
… (ROADMAP #13) (#56)

* feat(settings): reorganize page into grouped sections with sticky nav

ROADMAP item #13. Pure UX rearrangement: no new features, no schema or query changes.

The page had grown to 8 sections in chronological-by-landing order rather than mental-model order. Setup actions were buried at the bottom; first-time installers had to scroll past 5 tuning sections to find provisioning. No section nav, no setup-status summary. Trace originators (a read-only audit table) was mixed with config-action sections. Noise filters and Error filtering were separated by Dataset even though they're the same kind of "shape what counts as an error" tuning.

What's now in place:

- Setup status card at the top: two checkmark rows (Scheduled searches / Dataset acceleration) showing live provisioning state. Green-left-border when both OK; otherwise shows what's missing with a "Jump to X" anchor button. Reuses the same planOnly + getDatasetStatus checks the global ProvisioningBanners use.
- Two-column layout: sticky left-rail section nav (with IntersectionObserver-driven active-link highlight) and the content cards on the right. Collapses to single column below 960px.
- Sections grouped by purpose with group headings:
  - Setup: Provisioning, Dataset acceleration (TOP).
  - Workspace: Dataset, Detection cadence, Notification targets.
  - Filtering & heuristics: Noise filters, Error filtering.
  - Diagnostics: Trace originators (collapsed by default).
- Setup panels moved from the very bottom to the top. The duplicate at the bottom is removed.
- Trace originators collapsed by default. Operators rarely look at the classification audit; the section title is now a toggle.

New files: SettingsSetupStatus.tsx (+css), SettingsNav.tsx (+css).

Pre-merge: tsc clean, lint 0 errors, 107/107 unit tests, build succeeds, deployed + visually validated on staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(roadmap): mark Faceted search + Settings cleanup as shipped

- Item #2 (Faceted trace search) is now DONE. Folded into the v0.9.0 Faceted-search + Spotlight thread (PRs #46–#55) with a short summary block under Completed.
- Item #13 (Settings page cleanup) is DONE: PR #56 (this branch).
- Renumbered items 3–12 down by one to close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coccyx added a commit that referenced this pull request on May 29, 2026
Faceted trace search + Spotlight + Settings page reorganization.

- Faceted-nav data layer + UI primitives (#46, #47)
- Search-page integration with Spotlight rail (#48, #49)
- Spotlight on Errors page + Service Detail (#50, #51, #53)
- Small-multiples / readable-card / rate-bar Spotlight redesigns driven by manual validation feedback (#52, #54, #55)
- Settings page reorganization with Setup status card, sticky nav, and grouped sections (#56)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Three iterations of share-differential visualizations all got the same feedback: "I don't understand what this is showing me." The Operations table reads "Charge: 14% error rate" — one number per row, instantly. Spotlight was making the user combine three numbers (sel share / base share / diff) to reach the same insight. Wrong primitive.
Pivot Spotlight to use the same primitive as the Operations table, generalized to every attribute.
For each attribute value: render a horizontal bar whose width is the error rate (or selection rate in unscoped contexts). Above the bar: value name + percentage. Below: "N total · X errors." Click Search → drill into matching spans.
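The row layout described above (value name + percentage, a bar sized by the rate, and an "N total · X errors" caption) could be derived from the per-value counts roughly like this. It is a sketch under assumptions: `RowView` and `rowView` are hypothetical names, rounding to whole percentages is inferred from the staging example "500: 50% (1,610 total · 805 errors)", and the default noun "matching" mirrors the wording this PR uses for Traces.

```typescript
// Hypothetical view model for one Spotlight row.
interface RowView {
  label: string;       // value name + percentage, shown above the bar
  caption: string;     // "N total · X <noun>", shown below the bar
  barWidthPct: number; // bar width as a percentage; equal to the rate
}

// Build the text and bar width for one attribute value.
// selectionNoun stands in for the selectionNoun prop ("errors" on Errors / Service Detail).
function rowView(value: string, selN: number, total: number, selectionNoun = "matching"): RowView {
  const rate = total > 0 ? selN / total : 0;
  const fmt = (n: number): string => n.toLocaleString("en-US");
  return {
    label: `${value}: ${Math.round(rate * 100)}%`,
    caption: `${fmt(total)} total · ${fmt(selN)} ${selectionNoun}`,
    barWidthPct: rate * 100,
  };
}

// Staging example: http.status_code = 500 with 805 errors out of 1,610 spans.
const example = rowView("500", 805, 1610, "errors");
// example.label is "500: 50%", example.caption is "1,610 total · 805 errors", example.barWidthPct is 50
```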
Also drops response-side attributes (`rpc.grpc.status_code`, `http.response.status_code`, `response_flags`, `error.type`, `exception.type`) that 1:1 correlate with the error selection and just produce uninformative 100% bars.

Also drops the curated `SVC_SPOTLIGHT_ATTRS` short-list on Service Detail. The concurrency-cap-at-4 semaphore in `api/search.ts` already handles the cluster job limit; the curated list was just leaving real signal on the floor.

What you'll see
Errors page expansion of a product-catalog / GetProduct row with `productCatalogFailure` on:

The smoking gun (the product ID targeted by the flag) is immediately visible.
Architecture
- `selectionRate = selN / total` per value; ranking by `max(|rate - overall|) * log1p(total)`.
- `selectionNoun` prop on `SpotlightPanel` / `SpotlightSection`: "errors" on Service Detail / Errors, "matching" on Traces.
- `name` and `kind` as queryable top-level span columns via `attrValueExpr()`. The OTel span operation name lives in the bare `name` column, not under `attributes['name']`.
- `SpotlightHistogram` component deleted; the chart was the wrong primitive.
- … `SPOTLIGHT_ATTRIBUTES`.
- `SVC_SPOTLIGHT_ATTRS` removed; Service Detail uses the full broad list.

Tests

- … `name`, `kind`).

Test plan

- `npx tsc --noEmit`: clean
- `npm run lint`: 0 errors
- `npm test`: 107/107 passing
- `npm run deploy`: packed + uploaded + provisioned on staging

🤖 Generated with Claude Code