feat(data_explorer): add no-code visual query builder (alpha) #9
Open
albertogrande wants to merge 12 commits into
Note for reviewers: the red checks are fork-infra, not test failures. The paths-filter gate jobs need PostHog-internal GitHub App secrets that forks don't have, so their dependent "… Tests Pass" checks fail without running any tests. The backend, Jest, Playwright, tach, and OpenAPI suites all pass locally.
A column-less dataset compiled a placeholder projection from the supplied `primary_key`, which AI-built metadata defaults to `"id"` for every source kind. But groups are keyed by `key` and events by `uuid`, so switching sources via the `explore_data` tool surfaced a "Preview failed" toast over an empty canvas.

- compiler: project the kind's canonical row identifier (`KIND_PRIMARY_KEYS`) instead of trusting the supplied key
- max_tools: stamp the canonical key on a source change or blank state
- scratchViewLogic: skip the preview request entirely when there are no columns (the grid never renders) and dismiss the stale error toast

Matches the fixes already verified on the live demo build.
https://claude.ai/code/session_01SPyW9ZdmmZ62DEQSGP5sDN
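A minimal sketch of the compiler fix described above, assuming `KIND_PRIMARY_KEYS` is a plain dict from source kind to row identifier. Only the `groups` → `key` and `events` → `uuid` entries come from the comment; `placeholder_projection` and its fallback for other kinds are hypothetical.

```python
from typing import Optional

# Hypothetical sketch: only the groups/events entries come from the PR comment.
KIND_PRIMARY_KEYS = {
    "groups": "key",
    "events": "uuid",
}


def placeholder_projection(kind: str, supplied_primary_key: Optional[str]) -> str:
    """Column to project for a column-less dataset.

    The canonical key wins over the supplied one, because AI-built metadata
    may default the supplied key to "id" for every kind.
    """
    canonical = KIND_PRIMARY_KEYS.get(kind)
    if canonical is not None:
        return canonical
    if supplied_primary_key:
        # Hypothetical fallback for kinds without a canonical entry.
        return supplied_primary_key
    raise ValueError(f"no row identifier known for source kind {kind!r}")


print(placeholder_projection("groups", "id"))  # key
print(placeholder_projection("events", "id"))  # uuid
```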
Data explorer is a no-code builder for tables that span PostHog and warehouse data: joins, rollups, and filters, all by clicking, saved as a queryable warehouse view. No SQL.
▶ Try it live: https://posthog-demo.fly.dev/ (demo@posthog-demo.fly.dev / PostHogDemo)
▶ Demo video: 1080-30.mp4
Problem
"Which accounts stopped paying but still use the product?" is a standard growth-review question. Answering it means joining group data, a warehouse `invoices` table, and usage events, per account. Today that's hand-written HogQL, out of reach for the people who actually run those reviews, who think in tables and filters, not SQL.

Knowing SQL doesn't make it safe either. Join two one-to-many sources in one query and the rows fan out: every aggregate comes back inflated, and nothing tells you it's wrong.
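A toy illustration of the fan-out, in plain Python rather than HogQL, with made-up rows: one account with two invoices and three usage events. Joining both one-to-many sources onto the account in a single pass produces 2 × 3 = 6 rows, so every invoice is counted three times and every event twice.

```python
# One account with two invoices and three usage events (made-up numbers).
invoices = [
    {"account_key": "acme", "amount_usd": 100},
    {"account_key": "acme", "amount_usd": 50},
]
events = [
    {"account_key": "acme", "event": "pageview"},
    {"account_key": "acme", "event": "pageview"},
    {"account_key": "acme", "event": "$exception"},
]

# Naive approach: join both one-to-many sources onto the account in one pass.
joined = [
    (inv, ev)
    for inv in invoices
    for ev in events
    if inv["account_key"] == ev["account_key"]
]

naive_revenue = sum(inv["amount_usd"] for inv, _ in joined)
naive_event_count = len(joined)

true_revenue = sum(inv["amount_usd"] for inv in invoices)
true_event_count = len(events)

print(len(joined))                          # 6
print(naive_revenue, true_revenue)          # 450 150
print(naive_event_count, true_event_count)  # 6 3
```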
Changes
A new alpha product in `products/data_explorer/`, behind `FEATURE_FLAGS.DATA_EXPLORER`. You pick a primary source, add columns from it and from related sources, connect warehouse tables on a join key, add filters and per-column time windows, preview live, and save the result as a view.

There's no new model. A saved view is a plain `DataWarehouseSavedQuery` with the builder state stored in `builder_metadata`, so it inherits the warehouse's materialization, permissions, and tenancy, and is queryable anywhere HogQL works.

Walkthrough: tracking monthly churn
Built entirely by clicking; the demo video above shows this exact flow.
1. Pick what each row is. Churn is an account-level question, so pick groups: one row per account.
2. Connect the data. Join `invoices` (many rows per account → rollups) and `salesforce` (one row per account → details) on `key = account_key`. Events are already a related source for groups.
3. Add columns. Detail columns join 1:1. Rollups carry a math chip (Sum, Count, Avg, …) and a time window.

   | Source field | Column |
   | --- | --- |
   | `name` (company), `owner`, `icp_score` | details |
   | `amount_usd` | `mrr_this_month` |
   | `amount_usd` | `mrr_last_month` |
   | `exceptions_captured_in_period` | `exceptions_this_month` |
   | `days_with_exceptions` | `days_this_month` |
   | `timestamp` | `last_active` |

4. Filter to the churn segment. Two rules, ANDed: `mrr_this_month = 0` and `mrr_last_month > 0`, i.e. paid last month, nothing this month. Sort by active days descending so still-active accounts float to the top.
5. Read the result. Every row churned on billing; the usage columns say why. Active days near zero: they disengaged, then left. Active days near 30 with a high exception count: still using it daily without paying. That's the win-back list that billing data alone can't show you.
6. Save it. The result is a queryable `DataWarehouseSavedQuery`: build a cohort from it, chart it in an insight, materialize it. Reopening it lands you back in the builder.

…or just ask PostHog AI
The `explore_data` tool lets Max drive the same builder. Describe the table in plain English and it picks fields, aggregations, filters, and sort from your connected sources, then hands back the same editable config you'd build by hand, not SQL dropped into chat. For example:

Max adds `owner` and `icp_score` as details and a Sum rollup of `exceptions_captured_in_period`, sorted descending. Hand-built and AI-built columns go through the same column-assembly module, so you can keep editing where it left off. In the flag-gated builder mode, Max is restricted to this tool: analytical prompts land in the builder, not as SQL in chat.

v1 scope: the AI works over sources already connected in the rail. It doesn't author new joins or time windows yet; you set those up once by hand (steps 2–3), then iterate with the AI on top.
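The "same editable config" claim above rests on the model choosing from a catalog instead of writing columns itself. A minimal sketch of that idea, assuming a simplified column shape: `ColumnConfig`, `assemble_column`, `CATALOG`, and `resolve_ai_selection` are hypothetical names, and the catalog entries are illustrative, not the real contents of `max_tools.py`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnConfig:
    """Hypothetical shape of one builder column."""

    source: str
    field: str
    math: Optional[str] = None  # None marks a detail column; e.g. "sum" marks a rollup


def assemble_column(source: str, field: str, math: Optional[str] = None) -> ColumnConfig:
    """Shared entry point: the click path and the AI path both build columns here."""
    return ColumnConfig(source=source, field=field, math=math)


# What the model sees: reference -> ready-made column config (illustrative entries).
CATALOG = {
    "salesforce.owner": assemble_column("salesforce", "owner"),
    "salesforce.icp_score": assemble_column("salesforce", "icp_score"),
    "invoices.amount_usd": assemble_column("invoices", "amount_usd", math="sum"),
}


def resolve_ai_selection(refs: List[str]) -> List[ColumnConfig]:
    """Map the model's picks to catalog entries; unknown references are declined."""
    unknown = [ref for ref in refs if ref not in CATALOG]
    if unknown:
        raise ValueError(f"declined, not in catalog: {unknown}")
    return [CATALOG[ref] for ref in refs]


ai_columns = resolve_ai_selection(["salesforce.owner", "invoices.amount_usd"])
hand_built = assemble_column("salesforce", "owner")  # same column, added by clicking
print(ai_columns[0] == hand_built)  # True
```

Because both paths go through one assembly function, an AI-picked column compares equal to the hand-built one, which is what lets editing continue where the AI left off.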
Under the hood
Compiler. Pure and ORM-free: `builder_metadata` in, HogQL `SelectQuery` out. Each related source compiles to its own subquery reduced to one row per join key (`any()` for details, the chosen aggregate for rollups), which then `LEFT JOIN`s back onto the entity. Reducing before joining is what prevents the fan-out: two rollups over two sources can't multiply each other's rows. Time windows compile to conditional aggregation (`sumIf`/`countIf`), so "MRR this month" and "MRR last month" share one subquery.

Preview and persistence. `POST /api/environments/:id/data_explorer_preview/` compiles and executes the config for the capped live grid; with `compile_only` it returns the uncapped HogQL that's persisted on the view. Connected joins live in the view's own `builder_metadata`, validated and size-bounded on write, so building never touches the global warehouse schema.

PostHog AI. The tool (`products/data_explorer/backend/max_tools.py`) receives the current builder state plus a catalog of pickable fields, each with a ready-made column config. The model selects by reference; the backend resolves the selection into `builder_metadata` and compile-validates it before the frontend applies it. The model never writes HogQL, which is what keeps every AI-built query valid and editable.

Security. All user and AI input is lowered to typed AST constants and identifiers, resolved against the team's own schema, and escaped by the HogQL printer. No string interpolation into SQL anywhere in the path.
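The compiler paragraph says each related source is reduced to one row per join key before the `LEFT JOIN`, with time windows as conditional aggregation. Below is an in-memory Python model of those semantics, not the compiler itself (which emits a HogQL AST). All rows, dates, and helper names are made up for illustration, and filling unmatched accounts with zeros is an assumption.

```python
from collections import defaultdict
from datetime import date

# Made-up rows. Entity: one row per account (groups, keyed by `key`).
accounts = [{"key": "acme"}, {"key": "globex"}, {"key": "initech"}]

# Two one-to-many related sources, both joined on key = account_key.
invoices = [
    {"account_key": "acme", "amount_usd": 100, "day": date(2024, 4, 3)},
    {"account_key": "acme", "amount_usd": 120, "day": date(2024, 5, 3)},
    {"account_key": "globex", "amount_usd": 80, "day": date(2024, 4, 9)},
]
events = [
    {"account_key": "globex", "day": date(2024, 5, 2)},
    {"account_key": "globex", "day": date(2024, 5, 2)},
    {"account_key": "globex", "day": date(2024, 5, 7)},
]

THIS_MONTH = (2024, 5)  # hypothetical "now"
LAST_MONTH = (2024, 4)


def in_month(row: dict, month: tuple) -> bool:
    return (row["day"].year, row["day"].month) == month


def reduce_invoices(rows: list) -> dict:
    """One scan, one output row per join key; each window is a condition,
    like sumIf(amount_usd, <window>) inside a single subquery."""
    out = defaultdict(lambda: {"mrr_this_month": 0, "mrr_last_month": 0})
    for row in rows:
        acc = out[row["account_key"]]
        if in_month(row, THIS_MONTH):
            acc["mrr_this_month"] += row["amount_usd"]
        if in_month(row, LAST_MONTH):
            acc["mrr_last_month"] += row["amount_usd"]
    return dict(out)


def reduce_active_days(rows: list) -> dict:
    """Distinct days with events this month, one output row per join key."""
    days = defaultdict(set)
    for row in rows:
        if in_month(row, THIS_MONTH):
            days[row["account_key"]].add(row["day"])
    return {key: {"days_this_month": len(d)} for key, d in days.items()}


def left_join(entities: list, reduced: dict, defaults: dict) -> list:
    """Keep every entity; unmatched keys get `defaults` (zeros here, an assumption)."""
    return [{**e, **reduced.get(e["key"], defaults)} for e in entities]


rows = left_join(accounts, reduce_invoices(invoices), {"mrr_this_month": 0, "mrr_last_month": 0})
rows = left_join(rows, reduce_active_days(events), {"days_this_month": 0})

# Walkthrough step 4: paid last month, nothing this month; most active first.
churned = sorted(
    (r for r in rows if r["mrr_this_month"] == 0 and r["mrr_last_month"] > 0),
    key=lambda r: r["days_this_month"],
    reverse=True,
)
print(churned)  # [{'key': 'globex', 'mrr_this_month': 0, 'mrr_last_month': 80, 'days_this_month': 2}]
```

Globex's three events did not multiply its `mrr_last_month`: each source was reduced to one row per key before joining, so the two rollups never see each other's rows.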
How did you test this code?
Manually. Built the walkthrough above end-to-end on the live build (posthog-demo.fly.dev), sanity-checked the numbers, saved the view, and reopened it in the builder. An agent browser QA pass over the same build found no P0/P1 defects.
The claims a reviewer will care about, and how each is verified:
- … `sumIf` subquery.
- … `id) OR 1=1 --` resolves as a schema identifier and returns a 400, never executed; this is covered in the preview API tests and reproduced against the live endpoint. The AI tool likewise refuses injection and declines nonexistent fields.

In CI: backend tests for the compiler, validation, preview API, and AI tool; Jest for the builder logics and grid; and a Playwright end-to-end click-through.
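Why the `id) OR 1=1 --` probe fails closed can be shown with a minimal sketch of the "resolve, don't interpolate" approach from the Security paragraph. It assumes a per-team schema given as a dict; `Field`, `Constant`, `TEAM_SCHEMA`, `resolve_field`, and `lower_filter` are hypothetical stand-ins, not the real `posthog.hogql` AST classes or resolver. The whole string is looked up as one identifier, so it can only succeed by matching a real column name.

```python
from dataclasses import dataclass
from typing import Union


class UnknownField(Exception):
    """Hypothetical error; an API layer would map this to HTTP 400."""


@dataclass(frozen=True)
class Field:
    """Identifier node: a resolved column, never raw text."""

    source: str
    name: str


@dataclass(frozen=True)
class Constant:
    """Constant node: user values stay data and are escaped at print time."""

    value: Union[str, int, float, bool, None]


# Hypothetical per-team schema: source -> known columns.
TEAM_SCHEMA = {
    "invoices": {"account_key", "amount_usd"},
    "salesforce": {"account_key", "owner", "icp_score"},
}


def resolve_field(source: str, name: str) -> Field:
    """Look the whole string up as one identifier; no parsing, no splicing."""
    if name not in TEAM_SCHEMA.get(source, set()):
        raise UnknownField(f"{source}.{name!r} is not a column in this team's schema")
    return Field(source, name)


def lower_filter(source: str, name: str, value) -> tuple:
    """Lower a UI/AI filter rule into typed nodes instead of SQL text."""
    return (resolve_field(source, name), Constant(value))


try:
    lower_filter("invoices", "id) OR 1=1 --", 0)
except UnknownField as err:
    print("rejected:", err)

# A hostile *value* on a real column is kept as a constant, not as SQL.
field, const = lower_filter("salesforce", "owner", "x' OR '1'='1")
print(field, const)
```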
Publish to changelog?
No — alpha, behind a flag.
Docs update
None in this PR (alpha). Add the `skip-inkeep-docs` label.

🤖 Agent context
Built with Claude Code; I drove the design and own the code, the compiler most of all. Decisions along the way:
- Builder state lives in `DataWarehouseSavedQuery.builder_metadata` instead of a parallel model, so views get materialization, tenancy, and permissions for free.
- The `explore_data` tool picks from a catalog of real fields, the backend compile-validates the result, and one shared module assembles hand-built and AI-built columns alike.

Human review required: do not self-merge.