Replies: 1 comment 2 replies
-
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment

Uh oh!
There was an error while loading. Please reload this page.
-
Following up on @Yicong-Huang's suggestion on PR #5098. This post documents the entire profiler implementation from the hackathon submission, at the level of detail needed to decide what to land and in what order.
What it does
A feedback layer over Texera's existing per-operator runtime stats. While a workflow runs:
Design principle: read-only consumer of data we already produce. No new event types, no engine changes, no new HTTP endpoints in the critical path. Removing the profiler touches zero backend code paths.
Architecture
OperatorStatisticsUpdateEvent (existing)
│
▼
WorkflowStatusService (existing)
│
▼
ProfilerService ── reactive state (enabled, view, threshold, scores, baseline)
│
├── profiler-config (round-trip via WorkflowContent.profilerConfig)
├── profiler-hints (pure 6-rule engine)
├── profiler-delta (baseline diffing)
├── profiler-history (server-fetched past runs → BaselineReport)
├── profiler-report (MD + JSON export)
└── profiler-suggestions (optional ghost overlays)
│
▼
joint-ui (canvas color) · menu (controls) · property-panel (metrics + hints)
Components
Test totals on the branch: ~250 Vitest specs (all green), tsc --noEmit clean, ScalaTest covers the scoring helper.
Things I'd like input on
Per-workflow config storage: one optional field on WorkflowContent. OK, or prefer a side table / user setting?
Backend scoring helper: include ProfilerScoring.scala now for parity, or drop until there's a call site?
Ghost suggestions vs agent: they cover overlapping use cases (the agent integration, deferred to a separate RFC, has similar Apply/Reject cards). Land both as complementary, or pick one?
Schema versioning: baseline JSON = exported report JSON. Add an explicit schemaVersion before merging, or trust the defensive parser?
Hint i18n: messages are English-only. Route through i18n now or defer?
Recompute scale: tested up to ~80 operators with no issue, no formal benchmark. Upper bound we care about?
Proposed merge order
7 independent PRs, each useful on its own:
Heatmap foundation: ProfilerService + config + canvas coloring + Execution-Settings controls + legend. (~1.2k LOC w/ tests)
Hints + property-panel metrics + threshold slider.
UX polish: displayName threading, hover tooltip, MD/JSON report download.
Per-workflow config round-trip.
Compare-across-runs + delta heatmap.
(optional) Ghost suggestions.
(optional, backend) ProfilerScoring.scala.
Agent integration -> separate RFC once 1–5 are in.
Happy to open PR 1 once there's directional agreement here. Specific yes/no on "Things I'd like input on" would unblock the most.
Beta Was this translation helpful? Give feedback.
All reactions