feat(runtime-capabilities): capability retry endpoint + unavailable-capability UI#213
Conversation
Extended backend and frontend to support retrying runtime capabilities (e.g., `SLOWLOG`) when disabled due to server restrictions. Added `/connections/:id/capabilities/:capability/retry` endpoint, React component `CapabilityUnavailableBanner` for inline warnings, and enhanced state handling in `useCapabilities`. Includes tests and better surfacing of disabled reasons in the UI.
|
@BugBot review |
Bugbot couldn't runBugbot is not enabled for your user on this team. Ask your team administrator to increase your team's hard limit for Bugbot seats or add you to the allowlist in the Cursor dashboard. |
Retry now actually re-runs the capability's command instead of just clearing
the disabled flag, and the SlowLog page surfaces the verdict in stacked
banners over always-rendered content.
- POST /connections/:id/capabilities/:capability/retry now executes the
capability's cheapest test command (SLOWLOG LEN, LATENCY LATEST, etc.)
against the live adapter and returns { available, reason }. Success
re-enables the capability; a blocked-pattern failure re-records the
rejection with the fresh reason; transient errors leave previous state.
- Per-capability test-command map exported as CAPABILITY_TEST_COMMAND from
RuntimeCapabilityTracker for the controller to consume.
- New CapabilityStatusBanner: warn banner (description) + error banner
(verbatim ERR) + left-aligned Retry button. During retry the error
banner shows "Force-retrying X…" for a 3 s window so the action is
perceptible before the (typically sub-100 ms) probe resolves.
- SlowLog page now always renders the table; the unavailable state is
expressed as banners stacked above it instead of a centered overlay
card. Both the full-block-unavailable and per-capability partial cases
use the same banner.
- Extract UnavailableMessage from UnavailableOverlay so other pages
(Clients, AuditTrail, VectorAi, ClientAnalytics) keep the overlay
visual unchanged via composition.
- Remove the now-unused CapabilityUnavailableBanner.
- Fix a stray missing-paren JSX expression in start-session-modal.tsx
that broke type-check.
CLUSTER SLOT-STATS without ORDERBY/LIMIT errors with "wrong number of arguments" — that's not a blocked-command pattern, so the probe was returning available='unknown' on capable servers (false negative). Match the shape used by the live poller (ORDERBY key-count LIMIT 1) so the probe and the polled call exercise the same server path. CLIENT LIST as a probe is O(N) over the connection table. CLIENT GETNAME exercises the same ACL gate at O(1). Also recognize "cluster support disabled" as a definitive rejection so standalone servers correctly report canClusterSlotStats=false instead of 'unknown' on the new probe.
Covers all five outcomes of POST /connections/:id/capabilities/:capability/retry:
- 400 on unknown capability key, 404 on missing connection.
- Success path: returns { available: true } and re-enables a previously-
disabled capability via the tracker.
- Blocked-pattern rejection (Upstash "Command is not available", cluster
support disabled): returns { available: false, reason } and records the
failure with the fresh reason.
- Transient/non-blocked rejection (ECONNRESET, Connection timeout):
returns { available: 'unknown', reason } and leaves prior tracker state
intact — including not re-enabling a previously-disabled capability.
- String (non-Error) rejection values are surfaced through the reason field.
Also asserts the probe call actually sends the command + args from
CAPABILITY_TEST_COMMAND so the map stays in sync with the controller.
…tone The probe can return three verdicts (true / false / 'unknown') but the banner previously rendered the same red "blocked" error regardless. An operator on a flaky connection couldn't tell "server refused MONITOR" from "network blip swallowed the probe" — the former requires a server config change, the latter just another click. - Add CapabilityRetryVerdict to the shared health types and wire it through controller → fetchApi → useCapabilities.retryCapability so callers get the structured verdict back, not just void. - CapabilityStatusBanner tracks the most recent verdict locally. When it was 'unknown' the error slot becomes an amber "Couldn't verify — transient error. Try again." block (HelpCircle icon) with the FRESH reason from the verdict (an ECONNRESET, timeout, etc. — not the stale prop-level disabled-because reason). Definitive 'false' verdicts and initial-render state keep the red error tone. - Extract the 3s perceptible delay into PERCEPTIBLE_RETRY_DELAY_MS. - During retry the banner now shows a dedicated amber spinner row instead of overloading the error banner with the "Force-retrying X…" string. 6 vitest cases cover initial render, in-progress state, the three verdict paths (true unmounts via parent, false stays red, unknown turns amber), and the verdict-reason-vs-prop-reason fallback. All 62 backend tests (tracker + controller + registry) still green.
These changes were IDE auto-format noise picked up earlier in the branch's history; they're not part of the runtime-capability work this PR ships. Restoring the master version keeps the PR diff focused.
…t pin
iovalkey's commandTimeout normally bounds a Redis call, but adapters that
lack one — or that hang BEFORE a command is even queued (TLS handshake on
a dead server, etc.) — would otherwise keep the retry endpoint open
forever. The HTTP client times out, the user retries, and the orphaned
probe is still using up a connection slot.
- New CAPABILITY_PROBE_TIMEOUT_MS = 5000.
- Wrap adapter.call(...) in a small Promise-race helper. On timeout the
rejection ("Probe timed out after 5000ms") flows through the existing
catch branch: pattern doesn't match a blocked-command rule, so the
tracker stays untouched and the verdict is { available: 'unknown',
reason }. Operator gets the amber "couldn't verify" UI added in the
previous commit and can try again.
- New controller test (jest fake timers) covers the hung-probe path and
asserts the prior disabled state is preserved across timeout.
|
@BugBot review |
|
Should be fixed now. Bugbot's vibecoded API had a new issue |
- Derive RUNTIME_CAPABILITY_KEYS from CAPABILITY_TEST_COMMAND so new capabilities are automatically retryable instead of silently missing from the controller's allowlist. - Render a success state in CapabilityStatusBanner when retry yields available: true, so the red error branch no longer briefly flashes the stale disabled-reason while parent refreshCapabilities is in flight.
|
@BugBot review |
When both canSlowLog and canCommandLog were unavailable the page rendered a single SLOWLOG/COMMANDLOG banner whose retry only re-probed canSlowLog, so operators who fixed COMMANDLOG on the server could not re-enable that half from the UI. Split into two stacked banners so each retry targets the correct capability.
|
@BugBot review |
Reset capabilities/reasons synchronously when the active connection changes so SlowLog and other capability banners cannot briefly show another database's disable reason while /health is still in flight.
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
2 issues from previous reviews remain unresolved.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 16f2821. Configure here.
- canSlowLog probe now runs SLOWLOG GET 1 (the same subcommand the live poller/UI uses) so a server permitting LEN but blocking GET cannot pass the retry and then fail on the next poll. - retryCapability awaits refreshCapabilities so context flags clear before the verdict resolves, preventing CapabilityStatusBanner from showing success while old disabled flags still gate the page.
|
@BugBot review |
- canCommandLog probe now runs COMMANDLOG GET 1 slow (the poller path) instead of LEN, so a server permitting LEN but blocking GET cannot pass retry and then fail on the next poll. - When both SLOWLOG and COMMANDLOG are unavailable but neither has a probe reason, surface an informational banner (no retry) so the Slow Log page never silently renders empty tables. Single-capability absences without a reason stay silent so Redis-only servers don't see noisy "COMMANDLOG not exposed" banners on every visit.
|
@BugBot review |
Key each CapabilityStatusBanner by the active connection ID so React discards the local lastVerdict state when the user switches connections. Without the key, a banner could carry a prior connection's success or transient retry message into the new connection's UI.
|
@BugBot review |
CapabilityStatusBanner now records the upstream reason alongside each retry verdict and ignores the verdict once /health delivers a different reason. Without this, an amber 'Couldn't verify' state could persist even after health refreshed the definitive disabled reason — masking the fresh detail with the prior transient error.
|
@BugBot review |
…ility) The 5s timeout on withTimeout cannot cancel the underlying adapter.call; without deduplication, repeated retry requests stack new commands on the same Redis connection while earlier probes are still hung. Track each in-flight probe in a per-(connection, capability) map and reuse it for overlapping requests so only one outstanding command exists at a time. Entries clear when the underlying promise settles, regardless of whether the caller already returned 'unknown' on timeout.
|
@BugBot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit fb509f2. Configure here.

Summary
POST /connections/:id/capabilities/retryendpoint that re-probes a previously-disabled runtime capability (e.g. SLOWLOG on Upstash) and clears the cached "unavailable" state on success.RuntimeCapabilityTrackerwith per-capability disabled-reason + timestamp metadata, surfaced through the health endpoint and sharedhealth.tstypes.CapabilityUnavailableBanner+UnavailableOverlaydriven byuseCapabilities, with a retry action wired into Slow Log and other capability-gated pages.Why draft
Pushed to keep the work alive after master picked up the related
MonitorSupportProbework (#207). This branch is the other half — capability retry / UI affordances — and hasn't been reviewed yet.Test plan
pnpm test --filter @betterdb/api(newruntime-capability-tracker.service.spec.ts)pnpm lintcleandisabledReasonsshape matchingpackages/shared/src/types/health.tsNote
Medium Risk
Touches live connection command execution and health/capability state used across polling and the UI; behavior is well-tested but misclassified errors could flip feature availability.
Overview
Adds manual runtime capability retry and clearer operator UI when Redis/Valkey commands are blocked (e.g. Upstash, ElastiCache).
API: New
POST /connections/:id/capabilities/:capability/retryruns a cheap probe fromCAPABILITY_TEST_COMMAND, dedupes in-flight probes, and applies a 5s timeout so hung calls returnavailable: 'unknown'without changing prior state. Definitive server rejections re-record failure and returnavailable: falsewith the error text; success callsresetCapability.RuntimeCapabilityTrackernow stores per-capability disable reasons + timestamps, recognizes more blocked-error patterns (Upstash “not available”, cluster disabled, etc.), and exposesgetDisabledReasons/resetCapability. Health responses includeruntimeCapabilityReasons(shared types inpackages/shared).Web:
Apploads reasons from health and exposesretryCapabilityvia context. Slow Log replaces the full-page unavailable overlay withCapabilityStatusBanner(verbatim ERR, retry, transient vs blocked styling).UnavailableMessage/UnavailableOverlaygain optional reason + retry for other gated pages.Tests: Jest specs for the controller retry path and tracker; Vitest for the banner.
Reviewed by Cursor Bugbot for commit fb509f2. Bugbot is set up for automated code reviews on this repo. Configure here.