
feat(service-detail): Spotlight section + per-op expansion (PR I)#51

Merged
coccyx merged 1 commit into master from feat/spotlight-service-detail
May 29, 2026

Conversation

@coccyx coccyx commented May 29, 2026

Summary

PR I — last of the v0.9.0 faceted-nav thread (siblings: #46, #47, #48, #49, #50). Brings the Honeycomb-BubbleUp-style differential view to Service Detail at two levels.

What changes for the user

1. Service-level Spotlight section

A new "Spotlight" section sits between the health charts and the Operations table. Selection = error spans of this service. Baseline = the rest of the time window.

Answers "why is this service unhealthy?" in one glance — surfaces overrepresented operations, pods, HTTP routes, RPC methods, etc.
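The selection-versus-baseline comparison can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes the score is the difference between a value's share of selection spans and its share of baseline spans, in percentage points. The PR does not state how figures such as +99.1% are computed, and `ValueCounts`, `Differential`, and `rankDifferentials` are hypothetical names.

```typescript
// Hypothetical shape: span counts per attribute value, e.g. { "500": 99, "200": 1 }.
type ValueCounts = Record<string, number>;

interface Differential {
  value: string;
  selectionShare: number; // fraction of selection (error) spans carrying this value
  baselineShare: number;  // fraction of baseline spans carrying this value
  delta: number;          // (selectionShare - baselineShare) in percentage points
}

function total(counts: ValueCounts): number {
  return Object.values(counts).reduce((sum, n) => sum + n, 0);
}

// Assumed scoring: rank values by how overrepresented they are in the selection.
function rankDifferentials(selection: ValueCounts, baseline: ValueCounts): Differential[] {
  const selTotal = total(selection);
  const baseTotal = total(baseline);
  const values = Array.from(new Set(Object.keys(selection).concat(Object.keys(baseline))));

  return values
    .map((value) => {
      const selectionShare = selTotal === 0 ? 0 : (selection[value] ?? 0) / selTotal;
      const baselineShare = baseTotal === 0 ? 0 : (baseline[value] ?? 0) / baseTotal;
      return { value, selectionShare, baselineShare, delta: (selectionShare - baselineShare) * 100 };
    })
    .sort((a, b) => b.delta - a.delta); // most overrepresented first
}

// Made-up counts for one attribute (not data from this PR).
const ranked = rankDifferentials({ "500": 99, "200": 1 }, { "500": 1, "200": 999 });
```

With these made-up counts, `"500"` sorts first because it accounts for nearly all selection spans and almost none of the baseline.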

2. Per-operation Spotlight expansion

Each Operations table row gets a chevron (▶). Clicking it expands a Spotlight strip scoped to that operation. Useful for drilling into a single op when its error rate or p95 stands out.

Click any value in either Spotlight to drill into the Traces page pre-seeded with the service / operation / lookback / filter.
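One way to pre-seed the Traces page is to carry the drill-down state in the query string. The sketch below is an assumption about the shape of that link, not the routing code from this PR: the `/traces` path, the parameter names, and the `attribute=value` filter format are all hypothetical.

```typescript
// Hypothetical drill-down state carried from a Spotlight value click.
interface TraceDrilldown {
  service: string;
  operation?: string; // only set when clicking inside a per-operation Spotlight
  lookback: string;   // e.g. "1h"
  attribute: string;  // e.g. "http.status_code"
  value: string;      // e.g. "500"
}

// Builds a Traces link; URLSearchParams handles the percent-encoding.
function buildTracesHref(d: TraceDrilldown): string {
  const params = new URLSearchParams();
  params.set("service", d.service);
  if (d.operation !== undefined) {
    params.set("operation", d.operation);
  }
  params.set("lookback", d.lookback);
  params.set("filter", `${d.attribute}=${d.value}`);
  return `/traces?${params.toString()}`;
}

// Service-level click: no operation in the link.
const serviceHref = buildTracesHref({
  service: "frontend",
  lookback: "1h",
  attribute: "http.status_code",
  value: "500",
});

// Per-operation click: operation is included alongside the filter.
const opHref = buildTracesHref({
  service: "frontend",
  operation: "HTTP GET",
  lookback: "1h",
  attribute: "k8s.pod.name",
  value: "frontend-abc123",
});
```

Keeping the state in the URL means the drill-down target stays shareable and survives a page reload.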

Engineering wins

  • Concurrency limit on streaming queries. Service Detail already fires ~15 queries of its own (RED metrics, time series, status mix, ops, instances, dependencies). Unconditionally fanning out 22 more Spotlight queries blew past the cluster's 20-concurrent-job ceiling and the tail returned 429s. The streaming helpers in `api/search.ts` now cap parallelism at 4 via a small semaphore. Per-attr streaming UX is unchanged — attrs still appear one by one, just paced.

  • Curated attribute subset prop on `<SpotlightSection>`. Embedded surfaces can pass an `attributes` prop with a tighter 8-attr list instead of the full 22 from `SPOTLIGHT_ATTRIBUTES`. Service Detail uses `http.response.status_code`, `http.status_code`, `http.request.method`, `http.route`, `rpc.method`, `rpc.grpc.status_code`, `k8s.pod.name`, and `response_flags`; these are the attributes most likely to differentiate failing vs healthy spans on a single service.
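The two changes above can be illustrated together: a small promise limiter capped at 4, applied to the curated 8-attribute list. This is a sketch under stated assumptions rather than the code in `api/search.ts`. `createLimiter`, `streamAttributes`, and `SERVICE_DETAIL_ATTRIBUTES` are hypothetical names, and the `query` callback stands in for whatever runs a single Spotlight query.

```typescript
// Curated list from the PR description; the constant name is hypothetical.
const SERVICE_DETAIL_ATTRIBUTES: readonly string[] = [
  "http.response.status_code",
  "http.status_code",
  "http.request.method",
  "http.route",
  "rpc.method",
  "rpc.grpc.status_code",
  "k8s.pod.name",
  "response_flags",
];

// Minimal semaphore: at most `limit` tasks run at once, the rest queue in FIFO order.
function createLimiter(limit: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active < limit) {
      active++;
    } else {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot straight to the next waiter; `active` is unchanged
      } else {
        active--;
      }
    }
  };
}

// Runs one query per attribute, reporting each result as soon as it arrives.
async function streamAttributes<T>(
  attrs: readonly string[],
  query: (attr: string) => Promise<T>,
  onResult: (attr: string, result: T) => void,
  limit = 4,
): Promise<void> {
  const run = createLimiter(limit);
  await Promise.all(
    attrs.map((attr) => run(() => query(attr)).then((result) => onResult(attr, result))),
  );
}
```

Handing the slot directly to the next waiter, instead of decrementing and re-incrementing the counter, prevents a newly arriving caller from slipping in ahead of a queued one and briefly exceeding the cap. Results still arrive one attribute at a time, so the streaming behavior is preserved.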

Screenshots

Service-level Spotlight on the frontend service (no 429s after concurrency fix):

`http.status_code 500` ranks at +99.1% in the failing-span selection.

Per-operation expansion with rpc.method, http.route, k8s.pod.name differentials:

The `k8s.pod.name` differential is the killer signal here — if one pod dominates the failure spans, that's the smoking gun.

Test plan

  • `npx tsc --noEmit` — clean
  • `npm run lint` — 0 errors
  • `npm test` — 101/101 passing
  • `npm run deploy` — packed + uploaded + provisioned on staging
  • Playwright capture spec succeeds (service-level + per-op)
  • Health charts populate cleanly (concurrency limit verified)

Validate on staging

  1. Open Services → click any service (try `frontend` or `checkoutservice` for richest signal)
  2. Verify the Spotlight section appears between the health charts and the Operations table
  3. Confirm health charts populate (no 429 errors visible)
  4. Click any value in the service-level Spotlight → navigates to Traces pre-filtered
  5. In the Operations table, click any row's chevron — Spotlight expands inline
  6. Click a value in the per-op Spotlight — Traces opens with service + operation + filter

v0.9.0 readiness

After this PR lands, the v0.9.0 faceted-nav thread is complete.

Tag v0.9.0 once this merges and CI is green.

Session log

`docs/sessions/2026-05-28-service-spotlight.md`

🤖 Generated with Claude Code

PR I, last of the v0.9.0 faceted-nav thread.

- Service-level Spotlight section between the health charts and the
  Operations table. Selection = error spans of this service.
  Baseline = rest of the time window. Answers "why is this service
  unhealthy?" by surfacing what's overrepresented in the failing
  spans — status codes, pods, RPC methods, routes.

- Per-operation Spotlight expansion. Each Operations table row gets
  a chevron; clicking expands a Spotlight strip scoped to that op
  (service.name + name). Useful for drilling into a single op when
  its error rate or p95 stands out.

- Concurrency limit on the streaming Spotlight queries. Service
  Detail already fires ~15 queries (RED metrics, time series,
  status mix, ops, instances, dependencies). Unconditionally
  fanning out 22 more Spotlight queries blew past the cluster's
  20-concurrent-job ceiling and the tail returned 429s. The
  streaming helpers in api/search.ts now cap parallelism at 4 via
  a small semaphore. Per-attr streaming UX is unchanged.

- Curated attribute subset prop on <SpotlightSection>. Embedded
  surfaces can pass an `attributes` prop with a tighter 8-attr
  list instead of the full 22 from SPOTLIGHT_ATTRIBUTES. Service
  Detail uses {http.response.status_code, http.status_code,
  http.request.method, http.route, rpc.method, rpc.grpc.status_code,
  k8s.pod.name, response_flags} — the attributes most likely to
  differentiate failing vs healthy spans on a single service.

Validated on staging: frontend service detail loads without 429s,
service-level Spotlight surfaces http.status_code 500 at +99.1%
in the failing selection. Per-op expansion shows k8s.pod.name
differential — the killer signal for "is one pod broken?"

Pre-merge: tsc clean, lint 0 errors (1 pre-existing warning),
101/101 unit tests, build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coccyx coccyx merged commit 49d6dfd into master May 29, 2026
3 checks passed
@coccyx coccyx deleted the feat/spotlight-service-detail branch May 29, 2026 07:06
coccyx added a commit that referenced this pull request May 29, 2026
Faceted trace search + Spotlight + Settings page reorganization.

- Faceted-nav data layer + UI primitives (#46, #47)
- Search-page integration with Spotlight rail (#48, #49)
- Spotlight on Errors page + Service Detail (#50, #51, #53)
- Small-multiples / readable-card / rate-bar Spotlight redesigns
  driven by manual validation feedback (#52, #54, #55)
- Settings page reorganization with Setup status card, sticky
  nav, and grouped sections (#56)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>