
feat(spotlight): pivot to error-rate-per-value + drop tautological attrs #55

Merged
coccyx merged 3 commits into master from feat/spotlight-error-rate on May 29, 2026
@coccyx coccyx commented May 29, 2026

Summary

Three iterations of share-differential visualizations all got the same feedback: "I don't understand what this is showing me." The Operations table reads "Charge: 14% error rate": one number per row, instantly readable. Spotlight was making the user combine three numbers (sel share / base share / diff) to reach the same insight. That was the wrong primitive.

Pivot Spotlight to use the same primitive as the Operations table, generalized to every attribute.

For each attribute value, Spotlight renders a horizontal bar whose width is the error rate (or the selection rate in unscoped contexts). Above the bar: value name and percentage. Below: "N total · X errors." Clicking "Search →" drills into the matching spans.

Also drops response-side attributes (rpc.grpc.status_code, http.response.status_code, http.status_code, response_flags, error.type, exception.type) that correlate 1:1 with the error selection and produce only uninformative 100% bars.

Also drops the curated SVC_SPOTLIGHT_ATTRS short-list on Service Detail. The semaphore in api/search.ts that caps concurrency at 4 already keeps Spotlight within the cluster job limit, so the curated list was only leaving real signal on the floor.

What you'll see

Errors page expansion of a product-catalog / GetProduct row with productCatalogFailure on:

Spotlight — error rate per attribute for product-catalog / GetProduct
app.product.id (overall 23% errors)

OLJCESPC7Z [████████████████] 100% — 642 total · 642 errors [Search →]
12345 [████████████████] 100% — 16 total · 16 errors [Search →]
66VCHSJNUP [ ] 0% — 559 total · 0 errors [Search →]
0PUK6V6EV0 [ ] 0% — 374 total · 0 errors [Search →]

The smoking gun (the product ID targeted by the flag) is immediately visible.

Architecture

  • Engine rewritten: selectionRate = selN / total per value; attributes are ranked by the maximum over values of |rate - overall| * log1p(total).
  • New selectionNoun prop on SpotlightPanel / SpotlightSection — "errors" on Service Detail / Errors, "matching" on Traces.
  • Added name and kind as queryable top-level span columns via attrValueExpr(). The OTel span operation name lives in the bare name column, not under attributes['name'].
  • SpotlightHistogram component deleted — the chart was the wrong primitive.
  • Response-side tautological attrs removed from SPOTLIGHT_ATTRIBUTES.
  • SVC_SPOTLIGHT_ATTRS removed; Service Detail uses the full broad list.

Tests

  • Engine: 11 new test cases for the rate metric + variance score + uniform-attribute filtering + tautology handling.
  • Queries: new test for top-level column resolution (name, kind).
  • Attribute list: explicit asserts that the tautological attrs are excluded.
  • 107/107 total passing.

Test plan

  • npx tsc --noEmit — clean
  • npm run lint — 0 errors
  • npm test — 107/107 passing
  • npm run deploy — packed + uploaded + provisioned on staging
  • Manual validation with paymentFailure + productCatalogFailure scenarios

🤖 Generated with Claude Code

@coccyx coccyx changed the base branch from feat/spotlight-readable-results to master May 29, 2026 20:55
coccyx and others added 3 commits May 29, 2026 13:56
After the scoped-baseline fix landed, manual feedback was: "I see
one thing — rpc.grpc.status_code — and I still don't understand
how to read this or how it helps me find things."

The chart-only design hid the actual information (value names,
counts, percentages) behind a click. Visual asymmetry told users
"something differs" without telling them what, or what to do
about it.

Each attribute card now leads with words and numbers; the chart
is a scanning aid, not the primary readout.

- TL;DR headline sentence above the chart. Picks the strongest
  single value (largest |diff|) and writes it out:
  "Selection over-represents `13` by +100.0 pp (144 sel vs 0
  base)." Reads cold without translating bars.
- Inline value rows below the chart — top 3 by default — with
  per-row counts, percentages, and a dedicated "Search →" button
  that drills into matching spans. No click required to see the
  substance.
- Per-value Search button is explicit and single-purpose. Action
  is unambiguous.
- Plain-English legend at the panel bottom: "selection (the
  spans you're investigating)" / "baseline (what they're being
  compared against)". The earlier "sel" / "base" jargon assumed
  background the novice user doesn't have.
- "Show N more values" toggle for the long tail (>3 values).
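
The headline rule in the first bullet can be sketched as follows. This is a minimal illustration, not the code from this commit: `ShareRow` and `shareHeadline` are hypothetical names, shares are assumed to be each value's count divided by its population's total, and ties in |diff| are assumed to go to the first row in input order.

```typescript
// Hypothetical row shape: raw counts of one attribute value in each population.
interface ShareRow {
  value: string;
  selCount: number;  // occurrences among selection spans
  baseCount: number; // occurrences among baseline spans
}

// Build the TL;DR sentence for the value with the largest |diff|.
// Assumption: ties go to the first row in input order.
function shareHeadline(rows: ShareRow[]): string {
  const selTotal = rows.reduce((sum, r) => sum + r.selCount, 0);
  const baseTotal = rows.reduce((sum, r) => sum + r.baseCount, 0);

  let strongest: ShareRow | undefined;
  let strongestDiff = 0;
  for (const row of rows) {
    const selShare = selTotal === 0 ? 0 : row.selCount / selTotal;
    const baseShare = baseTotal === 0 ? 0 : row.baseCount / baseTotal;
    const diff = (selShare - baseShare) * 100; // percentage points
    if (strongest === undefined || Math.abs(diff) > Math.abs(strongestDiff)) {
      strongest = row;
      strongestDiff = diff;
    }
  }
  if (strongest === undefined) return "";

  const verb = strongestDiff >= 0 ? "over-represents" : "under-represents";
  const sign = strongestDiff >= 0 ? "+" : "";
  return (
    "Selection " + verb + " `" + strongest.value + "` by " +
    sign + strongestDiff.toFixed(1) + " pp (" +
    strongest.selCount + " sel vs " + strongest.baseCount + " base)."
  );
}
```

With the staging counts quoted in this message (value `0`: 0 sel, 905 base; value `13`: 144 sel, 0 base), both values differ by 100 pp, so the row order decides which one the sentence names.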

Same engine, same scoped-baseline data — purely a readability
change. SpotlightSection (Errors, Service Detail) inherits.

Validated on staging with paymentFailure 50%:
- rpc.grpc.status_code differential now reads as
  "Selection under-represents `0` by -100.0 pp (0 sel vs 905 base)"
  with the two value rows visible inline (`0` -100pp, `13` +100pp),
  each with its Search button.

Pre-merge: tsc clean, lint 0 errors, 104/104 unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After three iterations of share-differential visualizations, the
feedback stayed the same: "I don't understand what this is showing
me or how to use it to find things." The Operations table on
Service Detail shows "PaymentService/Charge: 14% error rate" — one
number per row, instantly readable. Spotlight was making the user
mentally combine sel share / base share / diff to recover the same
insight. Wrong primitive.

Pivot Spotlight to use the SAME primitive as the Operations table,
generalized to every attribute: per-value selection rate.

For each attribute value:
- selN — spans with this value that ARE in the selection
- baseN — spans with this value that are NOT in the selection
- total — selN + baseN
- selectionRate — selN / total (the headline metric)
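
The four quantities above can be sketched as a small pure function. The type and function names here (`ValueCounts`, `RateRow`, `toRateRow`, `overallRate`) are illustrative assumptions; the engine's real types are not shown in this PR. The zero-total guard is also an assumption.

```typescript
// Hypothetical shapes; the PR does not show the engine's real types.
interface ValueCounts {
  value: string;
  selN: number;  // spans with this value that ARE in the selection
  baseN: number; // spans with this value that are NOT in the selection
}

interface RateRow extends ValueCounts {
  total: number;         // selN + baseN
  selectionRate: number; // selN / total, in [0, 1]
}

function toRateRow(counts: ValueCounts): RateRow {
  const total = counts.selN + counts.baseN;
  // Assumption: an empty bucket gets rate 0 rather than NaN.
  const selectionRate = total === 0 ? 0 : counts.selN / total;
  return { ...counts, total, selectionRate };
}

// Overall rate across all values of one attribute (the "overall" in the score).
function overallRate(rows: RateRow[]): number {
  const sel = rows.reduce((sum, r) => sum + r.selN, 0);
  const all = rows.reduce((sum, r) => sum + r.total, 0);
  return all === 0 ? 0 : sel / all;
}

// Counts taken from the http.status_code result quoted later in this message.
const status500 = toRateRow({ value: "500", selN: 805, baseN: 805 });
const status200 = toRateRow({ value: "200", selN: 0, baseN: 1636 });
```

Here `status500.selectionRate` is 0.5 over 1,610 spans and `status200.selectionRate` is 0 over 1,636 spans, matching the "500: 50%" and "200: 0%" rows.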

The card renders one row per value: value label, horizontal bar
whose width IS the rate, the percentage, "N total · X errors"
caption, and a per-value Search → button. Above-average rows get
an accent tint; below get muted. No histograms, no overlaid
distributions, no sel/base/diff jargon.

Selection wording is context-aware via the new `selectionNoun`
prop: "98% errors" on Service Detail / Errors pages instead of
generic "98% selection rate."
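
A rough sketch of how one row's display fields might be derived, combining the row layout and the `selectionNoun` wording described above. `RateRowView` and `toRateRowView` are hypothetical names, not the component's actual props; treating a row exactly at the overall rate as muted is an assumption, and thousands separators are omitted for brevity.

```typescript
type SelectionNoun = "errors" | "matching";

// Hypothetical view model for one value row in an attribute card.
interface RateRowView {
  label: string;
  barWidthPct: number; // bar width IS the rate, as a percentage
  percentText: string; // e.g. "98% errors"
  caption: string;     // e.g. "642 total · 642 errors"
  tint: "accent" | "muted";
}

function toRateRowView(
  value: string,
  selN: number,
  total: number,
  overall: number, // overall selection rate for the attribute, in [0, 1]
  noun: SelectionNoun,
): RateRowView {
  const rate = total === 0 ? 0 : selN / total;
  const pct = Math.round(rate * 100);
  return {
    label: value,
    barWidthPct: pct,
    percentText: pct + "% " + noun,
    caption: total + " total · " + selN + " " + noun,
    // Assumption: rows exactly at the overall rate are treated as muted.
    tint: rate > overall ? "accent" : "muted",
  };
}
```

For example, `toRateRowView("OLJCESPC7Z", 642, 642, 0.23, "errors")` yields a full-width accent row captioned "642 total · 642 errors".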

Score = max over rows of |row.rate - overall.rate| * log1p(total).
The L∞ norm with volume weighting captures the case where a
single tiny-but-extreme value is the real signal — the "one pod
is broken" pattern that volume-weighted variance under-counted.
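
The score can be sketched as below. The formula in this message leaves one detail open, so this sketch assumes `total` is each row's own span count and that the log weight is applied inside the max; `ScoredRow`, `spotlightScore`, and `rankAttributes` are hypothetical names.

```typescript
interface ScoredRow {
  rate: number;  // selN / total for this value
  total: number; // span count for this value
}

// Assumption: the volume weight log1p(total) uses the row's own total
// and is applied per row before taking the max.
function spotlightScore(rows: ScoredRow[], overall: number): number {
  let best = 0;
  for (const row of rows) {
    const weighted = Math.abs(row.rate - overall) * Math.log1p(row.total);
    if (weighted > best) best = weighted;
  }
  return best;
}

// Rank attributes by descending score; a uniform attribute scores 0.
function rankAttributes(
  attrs: Array<{ attr: string; rows: ScoredRow[]; overall: number }>,
): string[] {
  return attrs
    .map((a) => ({ attr: a.attr, score: spotlightScore(a.rows, a.overall) }))
    .sort((a, b) => b.score - a.score)
    .map((a) => a.attr);
}
```

Because log1p grows slowly, a value with 16 spans at a 100% rate can still outscore a value with hundreds of spans at a rate close to the overall one, which is the "one pod is broken" case described above.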

Added `name` (and `kind`) as queryable attributes via a new
attrValueExpr() helper. The OTel span operation name lives as a
top-level column, not under attributes['name'] — without this fix
Spotlight was blind to "which operation is failing," the most
natural Service Detail differentiator.
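
One way such a helper could look, as a sketch only: the real attrValueExpr() is not shown in this PR, so the function below uses a different name, hard-codes only the two columns mentioned here, and skips any quoting or escaping the actual query language may need.

```typescript
// Assumption: only `name` and `kind` are treated as top-level span columns,
// since those are the two this PR mentions.
const TOP_LEVEL_SPAN_COLUMNS: ReadonlySet<string> = new Set(["name", "kind"]);

// Hypothetical stand-in for attrValueExpr(): returns the expression used to
// read an attribute's value in a query.
function attrValueExprSketch(attr: string): string {
  if (TOP_LEVEL_SPAN_COLUMNS.has(attr)) {
    return attr; // bare column, e.g. name
  }
  // Everything else is looked up in the attributes map.
  // Escaping of quotes inside `attr` is intentionally left out of this sketch.
  return "attributes['" + attr + "']";
}
```

Under these assumptions `attrValueExprSketch("name")` returns `name`, while `attrValueExprSketch("app.product.id")` returns `attributes['app.product.id']`.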

Result on staging with paymentFailure 50%:
- Errors page expansion of frontend / POST /api/checkout shows
  http.status_code with rows "500: 50% (1,610 total · 805
  errors)" and "200: 0% (1,636 total · 0 errors)" — bars +
  percentages make the insight immediate.
- Service Detail on payment shows `name` with the failing op
  (PaymentService/Charge) called out with its error rate.

Engine rewritten; SpotlightHistogram component removed (chart was
the wrong primitive; bars are direct). 11 new engine test cases
covering the rate metric + variance score; existing 95 still pass.
Total: 106/106.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop response-side attrs that 1:1 correlate with the error
selection and produce uninformative 100% bars:
  - rpc.grpc.status_code
  - http.response.status_code, http.status_code
  - response_flags, error.type, exception.type

Drop the SVC_SPOTLIGHT_ATTRS curated list on Service Detail. The
original rationale was avoiding the cluster's 20-concurrent-job
ceiling, but the concurrency-cap-at-4 semaphore in api/search.ts
already handles that. The curated list was leaving real signal
on the floor — input-side attributes like app.product.id that
surface the actual cause of scenario-driven failures.
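
For readers unfamiliar with the pattern, a concurrency cap like the one referenced above can be sketched as a small promise-based semaphore. This is not the code in api/search.ts; `JobSemaphore` and its methods are hypothetical, and only the cap of 4 and the cluster ceiling of 20 come from this message.

```typescript
// Hypothetical sketch of a concurrency cap; not the actual api/search.ts code.
class JobSemaphore {
  private readonly limit: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  inFlight(): number {
    return this.active;
  }

  waiting(): number {
    return this.waiters.length;
  }

  private acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return Promise.resolve();
    }
    // At the cap: park the caller until a running job releases its slot.
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter; active stays the same
    } else {
      this.active--;
    }
  }

  async run<T>(job: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await job();
    } finally {
      this.release();
    }
  }
}

// Cap of 4 keeps Spotlight well under the cluster's 20-concurrent-job ceiling,
// however many attributes are scanned.
const searchJobs = new JobSemaphore(4);
```

With a cap like this, widening the attribute list only lengthens the queue; the number of jobs in flight stays at 4.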

Service Detail (page-level + per-op expansion) now scans the same
SPOTLIGHT_ATTRIBUTES set as Errors and Traces.

Tests updated: the SPOTLIGHT_ATTRIBUTES content asserts now check
for request-shape attrs and explicitly assert the tautological
attrs are excluded.
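
The exclusion and the kind of assertion described above might look roughly like this. The excluded names are the six listed in this message; the candidate list and the function name are illustrative assumptions, and the real SPOTLIGHT_ATTRIBUTES contents are not reproduced here.

```typescript
// The response-side attributes this commit removes.
const TAUTOLOGICAL_ATTRS: ReadonlySet<string> = new Set([
  "rpc.grpc.status_code",
  "http.response.status_code",
  "http.status_code",
  "response_flags",
  "error.type",
  "exception.type",
]);

// Hypothetical helper: keep only attributes that are not tautological.
function dropTautological(candidates: readonly string[]): string[] {
  return candidates.filter((attr) => !TAUTOLOGICAL_ATTRS.has(attr));
}

// Illustrative candidate list, not the real SPOTLIGHT_ATTRIBUTES.
const candidateAttrs = [
  "app.product.id",
  "name",
  "kind",
  "rpc.grpc.status_code",
  "error.type",
];
const spotlightAttrs = dropTautological(candidateAttrs);
```

A test in the spirit of the updated asserts would then check that `spotlightAttrs` still contains request-shape attributes such as `app.product.id` and contains none of the six excluded names.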

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coccyx coccyx force-pushed the feat/spotlight-error-rate branch from d721984 to be74fdb Compare May 29, 2026 20:56
@coccyx coccyx changed the title from "feat(spotlight): pivot to error-rate-per-value (same primitive as Operations table)" to "feat(spotlight): pivot to error-rate-per-value + drop tautological attrs" May 29, 2026
@coccyx coccyx merged commit 11fc7f2 into master May 29, 2026
3 checks passed
@coccyx coccyx deleted the feat/spotlight-error-rate branch May 29, 2026 20:57
coccyx added a commit that referenced this pull request May 29, 2026
- Item #2 (Faceted trace search) is now DONE — folded into the
  v0.9.0 Faceted-search + Spotlight thread (PRs #46–#55) with a
  short summary block under Completed.
- Item #13 (Settings page cleanup) is DONE — PR #56 (this branch).
- Renumbered items 3–12 down by one to close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coccyx added a commit that referenced this pull request May 29, 2026
… (ROADMAP #13) (#56)

* feat(settings): reorganize page into grouped sections with sticky nav

ROADMAP item #13. Pure UX rearrangement — no new features, no
schema or query changes.

The page had grown to 8 sections in chronological-by-landing
order rather than mental-model order. Setup actions were buried
at the bottom; first-time installers had to scroll past 5 tuning
sections to find provisioning. No section nav, no setup-status
summary. Trace originators (a read-only audit table) was mixed
with config-action sections. Noise filters and Error filtering
were separated by Dataset even though they're the same kind of
"shape what counts as an error" tuning.

What's now in place:

- Setup status card at the top — two checkmark rows
  (Scheduled searches / Dataset acceleration) showing live
  provisioning state. Green-left-border when both OK; otherwise
  shows what's missing with a "Jump to X" anchor button. Reuses
  the same planOnly + getDatasetStatus checks the global
  ProvisioningBanners use.

- Two-column layout: sticky left-rail section nav (with
  IntersectionObserver-driven active-link highlight) and the
  content cards on the right. Collapses to single column below
  960px.

- Sections grouped by purpose with group headings:
  - Setup: Provisioning, Dataset acceleration (TOP).
  - Workspace: Dataset, Detection cadence, Notification targets.
  - Filtering & heuristics: Noise filters, Error filtering.
  - Diagnostics: Trace originators (collapsed by default).

- Setup panels moved from the very bottom to the top. The
  duplicate at the bottom is removed.

- Trace originators collapsed by default — operators rarely look
  at the classification audit; the section title is now a toggle.

New files: SettingsSetupStatus.tsx (+css), SettingsNav.tsx (+css).

Pre-merge: tsc clean, lint 0 errors, 107/107 unit tests, build
succeeds, deployed + visually validated on staging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(roadmap): mark Faceted search + Settings cleanup as shipped

- Item #2 (Faceted trace search) is now DONE — folded into the
  v0.9.0 Faceted-search + Spotlight thread (PRs #46–#55) with a
  short summary block under Completed.
- Item #13 (Settings page cleanup) is DONE — PR #56 (this branch).
- Renumbered items 3–12 down by one to close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coccyx added a commit that referenced this pull request May 29, 2026
Faceted trace search + Spotlight + Settings page reorganization.

- Faceted-nav data layer + UI primitives (#46, #47)
- Search-page integration with Spotlight rail (#48, #49)
- Spotlight on Errors page + Service Detail (#50, #51, #53)
- Small-multiples / readable-card / rate-bar Spotlight redesigns
  driven by manual validation feedback (#52, #54, #55)
- Settings page reorganization with Setup status card, sticky
  nav, and grouped sections (#56)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>