feat(data_explorer): add no-code visual query builder (alpha) by albertogrande · Pull Request #9 · albertogrande/posthog

albertogrande · 2026-06-10T13:25:23Z

Data explorer is a no-code builder for tables that span PostHog and warehouse data — joins, rollups, and filters by clicking, saved as a queryable warehouse view. No SQL.

▶ Try it live: https://posthog-demo.fly.dev/ — demo@posthog-demo.fly.dev / PostHogDemo

▶ Demo video

1080-30.mp4

Problem

"Which accounts stopped paying but still use the product?" is a standard growth-review question. Answering it means joining group data, a warehouse invoices table, and usage events, per account. Today that's hand-written HogQL — out of reach for the people who actually run those reviews, who think in tables and filters, not SQL.

Knowing SQL doesn't make it safe either. Join two one-to-many sources in one query and the rows fan out: every aggregate comes back inflated, and nothing tells you it's wrong.
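The fan-out is easy to reproduce on a toy dataset. The sketch below uses SQLite as a stand-in for the warehouse; the table names and rows are made up for illustration and are not taken from this PR. One account has two invoices (true total 300) and three events, and a naive double join inflates both aggregates without raising any error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_key TEXT PRIMARY KEY);
    CREATE TABLE invoices (account_key TEXT, amount_usd REAL);
    CREATE TABLE events   (account_key TEXT, name TEXT);

    INSERT INTO accounts VALUES ('acme');
    -- Two invoices: the true revenue is 300.
    INSERT INTO invoices VALUES ('acme', 100), ('acme', 200);
    -- Three events: the true event count is 3.
    INSERT INTO events VALUES ('acme', 'login'), ('acme', 'login'), ('acme', 'export');
    """
)

# Naive query: both one-to-many sources joined in a single SELECT.
# Each invoice row pairs with each event row, giving 2 x 3 = 6 joined rows.
naive = conn.execute(
    """
    SELECT a.account_key,
           SUM(i.amount_usd) AS revenue,
           COUNT(e.name)     AS event_count
    FROM accounts a
    LEFT JOIN invoices i ON i.account_key = a.account_key
    LEFT JOIN events   e ON e.account_key = a.account_key
    GROUP BY a.account_key
    """
).fetchone()

print(naive)  # ('acme', 900.0, 6)
```

Revenue comes back tripled and the event count doubled, and the query still succeeds.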

Changes

A new alpha product in products/data_explorer/, behind FEATURE_FLAGS.DATA_EXPLORER. You pick a primary source, add columns from it and from related sources, connect warehouse tables on a join key, add filters and per-column time windows, preview live, and save the result as a view.

There's no new model. A saved view is a plain DataWarehouseSavedQuery with the builder state stored in builder_metadata, so it inherits the warehouse's materialization, permissions, and tenancy, and is queryable anywhere HogQL works.

Walkthrough — tracking monthly churn

Built entirely by clicking — the demo video above is this exact flow.

1. Pick what each row is. Churn is an account-level question, so pick groups: one row per account.

2. Connect the data. Join invoices (many rows per account → rollups) and salesforce (one row per account → details) on key = account_key. Events are already a related source for groups.

3. Add columns. Detail columns join 1:1. Rollups carry a math chip (Sum, Count, Avg, …) and a time window.

| Source | Field | Added as |
| --- | --- | --- |
| Account (primary) | `name` (company) | detail |
| Salesforce (1:1) | `owner`, `icp_score` | details |
| Invoices (1:many) | `amount_usd` | Sum · this month → `mrr_this_month` |
| Invoices (1:many) | `amount_usd` | Sum · last month → `mrr_last_month` |
| Events (1:many) | `exceptions_captured_in_period` | Sum · this month → `exceptions_this_month` |
| Events (1:many) | `days_with_exceptions` | Sum · this month → `days_this_month` |
| Events (1:many) | `timestamp` | Latest · all time → `last_active` |

4. Filter to the churn segment. Two rules, ANDed: mrr_this_month = 0 and mrr_last_month > 0 — paid last month, nothing this month. Sort by active days descending so still-active accounts float to the top.

5. Read the result. Every row churned on billing; the usage columns say why. Active days near zero: they disengaged, then left. Active days near 30 with a high exception count: still using it daily without paying — the win-back list billing data alone can't show you.

6. Save it. The result is a queryable DataWarehouseSavedQuery — build a cohort from it, chart it in an insight, materialize it. Reopening it lands you back in the builder.
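For readers who prefer code to clicks, the filter and sort from step 4 can be sketched in plain Python. The column names come from the table above; the rows and account names are invented for illustration, and this is not how the builder evaluates anything (it compiles to HogQL).

```python
from dataclasses import dataclass


@dataclass
class AccountRow:
    name: str
    mrr_this_month: float
    mrr_last_month: float
    days_this_month: int


# Made-up rows, one per account.
rows = [
    AccountRow("Acme", 0.0, 500.0, 28),      # stopped paying, still active daily
    AccountRow("Globex", 120.0, 120.0, 12),  # still paying: not churned
    AccountRow("Initech", 0.0, 90.0, 1),     # stopped paying, disengaged
    AccountRow("Umbrella", 0.0, 0.0, 0),     # never paid: not churned
]

# Step 4: two rules, ANDed, then sort by active days descending.
churned = sorted(
    (r for r in rows if r.mrr_this_month == 0 and r.mrr_last_month > 0),
    key=lambda r: r.days_this_month,
    reverse=True,
)

print([r.name for r in churned])  # ['Acme', 'Initech']
```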

…or just ask PostHog AI

The explore_data tool lets Max drive the same builder. Describe the table in plain English and it picks fields, aggregations, filters, and sort from your connected sources, then hands back the same editable config you'd build by hand — not SQL dropped into chat. For example:

"Add each account's CS owner and ICP score, plus the total exceptions they've captured, and sort by total exceptions — highest first."

Max adds owner and icp_score as details and a Sum rollup of exceptions_captured_in_period, sorted descending. Hand-built and AI-built columns go through the same column-assembly module, so you can keep editing where it left off. In the flag-gated builder mode, Max is restricted to this tool: analytical prompts land in the builder, not as SQL in chat.

v1 scope: the AI works over sources already connected in the rail. It doesn't author new joins or time windows yet — you set those up once by hand (steps 2–3), then iterate with the AI on top.

Under the hood

Compiler. Pure and ORM-free: builder_metadata in, HogQL SelectQuery out. Each related source compiles to its own subquery reduced to one row per join key — any() for details, the chosen aggregate for rollups — then LEFT JOINs back onto the entity. Reducing before joining is what prevents the fan-out: two rollups over two sources can't multiply each other's rows. Time windows compile to conditional aggregation (sumIf/countIf), so "MRR this month" and "MRR last month" share one subquery.
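The shape of the compiled query can be illustrated with a small sketch. It uses SQLite rather than ClickHouse, so `SUM(CASE WHEN ...)` stands in for `sumIf`, and the tables, the `month` column, and the rows are assumptions made for the example rather than the compiler's actual output. Each one-to-many source is reduced to one row per join key in its own subquery, and only then joined back onto the entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_key TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (account_key TEXT, amount_usd REAL, month TEXT);
    CREATE TABLE events   (account_key TEXT, exceptions INTEGER);

    INSERT INTO accounts VALUES ('acme', 'Acme');
    INSERT INTO invoices VALUES
        ('acme', 100, '2026-05'), ('acme', 200, '2026-05'), ('acme', 50, '2026-06');
    INSERT INTO events VALUES ('acme', 4), ('acme', 1), ('acme', 2);
    """
)

row = conn.execute(
    """
    SELECT a.name, inv.mrr_this_month, inv.mrr_last_month, ev.exceptions_total
    FROM accounts a
    -- Invoices reduced to one row per account; both time windows share this subquery.
    LEFT JOIN (
        SELECT account_key,
               SUM(CASE WHEN month = '2026-06' THEN amount_usd ELSE 0 END) AS mrr_this_month,
               SUM(CASE WHEN month = '2026-05' THEN amount_usd ELSE 0 END) AS mrr_last_month
        FROM invoices
        GROUP BY account_key
    ) inv ON inv.account_key = a.account_key
    -- Events reduced to one row per account, independently of invoices.
    LEFT JOIN (
        SELECT account_key, SUM(exceptions) AS exceptions_total
        FROM events
        GROUP BY account_key
    ) ev ON ev.account_key = a.account_key
    """
).fetchone()

print(row)  # ('Acme', 50.0, 300.0, 7)
```

Because every joined subquery already has at most one row per `account_key`, the outer join cannot multiply rows, and each aggregate matches what its source table says on its own.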

Preview and persistence. POST /api/environments/:id/data_explorer_preview/ compiles and executes the config for the capped live grid; with compile_only it returns the uncapped HogQL that's persisted on the view. Connected joins live in the view's own builder_metadata, validated and size-bounded on write — building never touches the global warehouse schema.
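A request to the preview endpoint might be built as in the sketch below. Only the path and the `compile_only` flag come from the description above. The host, the `builder_metadata` body key, the shape of the config, and the bearer-token header are assumptions for illustration; check the generated OpenAPI types for the real schema. The request is constructed but not sent.

```python
import json
import urllib.request


def build_preview_request(host, environment_id, api_key, builder_metadata, compile_only=False):
    """Build (but do not send) a POST to the data explorer preview endpoint."""
    url = f"{host}/api/environments/{environment_id}/data_explorer_preview/"
    # Assumption: the body key "builder_metadata" is hypothetical.
    body = {"builder_metadata": builder_metadata, "compile_only": compile_only}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth with a placeholder key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# compile_only=True asks for the uncapped HogQL instead of executing the capped preview.
request = build_preview_request(
    "https://posthog.example.com", 1, "phx_placeholder", {"columns": []}, compile_only=True
)
print(request.get_method(), request.full_url)
```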

PostHog AI. The tool (products/data_explorer/backend/max_tools.py) receives the current builder state plus a catalog of pickable fields, each with a ready-made column config. The model selects by reference; the backend resolves the selection into builder_metadata and compile-validates it before the frontend applies it. The model never writes HogQL — that's what keeps every AI-built query valid and editable.
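The select-by-reference idea can be sketched as follows. The catalog keys, the column-config fields, and the `resolve_selection` helper are all hypothetical names invented for this example; the real tool lives in max_tools.py and also compile-validates the result. The point is only that the model returns references, and anything outside the catalog is rejected before it can reach the builder state.

```python
# Hypothetical catalog: each pickable field maps to a ready-made column config.
CATALOG = {
    "salesforce.owner": {"source": "salesforce", "field": "owner", "kind": "detail"},
    "salesforce.icp_score": {"source": "salesforce", "field": "icp_score", "kind": "detail"},
    "events.exceptions_captured_in_period": {
        "source": "events",
        "field": "exceptions_captured_in_period",
        "kind": "rollup",
        "math": "sum",
    },
}


def resolve_selection(refs, catalog):
    """Turn model-selected references into column configs, rejecting unknown ones."""
    unknown = [ref for ref in refs if ref not in catalog]
    if unknown:
        raise ValueError(f"unknown catalog references: {unknown}")
    # Copy each config so later edits in the builder do not mutate the catalog.
    return {"columns": [dict(catalog[ref]) for ref in refs]}


metadata = resolve_selection(["salesforce.owner", "salesforce.icp_score"], CATALOG)
print([c["field"] for c in metadata["columns"]])  # ['owner', 'icp_score']

try:
    resolve_selection(["salesforce.revenue_forecast"], CATALOG)
except ValueError as exc:
    rejected = str(exc)
```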

Security. All user and AI input is lowered to typed AST constants and identifiers, resolved against the team's own schema, and escaped by the HogQL printer. No string interpolation into SQL anywhere in the path.
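As a rough illustration of that lowering step, the toy sketch below models identifiers and constants as typed nodes, resolves field names against a schema, and leaves escaping to a printer. The `Field` and `Constant` classes, the schema, and the escaping rules are simplified assumptions for the example, not the HogQL AST or printer.

```python
from dataclasses import dataclass

# Hypothetical schema for one team: the only field names that can resolve.
SCHEMA = {"key", "name"}


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Constant:
    value: object


def resolve_field(name, schema):
    """Accept a field name only if it exists in the schema."""
    if name not in schema:
        raise ValueError(f"unknown field: {name!r}")
    return Field(name)


def print_node(node):
    """Toy printer: identifiers are quoted, string constants are escaped."""
    if isinstance(node, Field):
        return f"`{node.name}`"
    if isinstance(node, Constant):
        if isinstance(node.value, bool):
            raise TypeError("booleans are not supported in this sketch")
        if isinstance(node.value, (int, float)):
            return repr(node.value)
        if isinstance(node.value, str):
            escaped = node.value.replace("\\", "\\\\").replace("'", "\\'")
            return f"'{escaped}'"
    raise TypeError(f"unsupported node: {node!r}")


# A valid field and a user-supplied value containing a quote.
clause = f"{print_node(resolve_field('name', SCHEMA))} = {print_node(Constant(chr(39)))}"

# The injection attempt never becomes SQL text: it fails identifier resolution.
try:
    resolve_field("id) OR 1=1 --", SCHEMA)
    injection_rejected = False
except ValueError:
    injection_rejected = True

print(injection_rejected)  # True
```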

How did you test this code?

Manually. Built the walkthrough above end-to-end on the live build (posthog-demo.fly.dev), sanity-checked the numbers, saved the view, and reopened it in the builder. An agent browser QA pass over the same build found no P0/P1 defects.

The claims a reviewer will care about, and how each is verified:

No fan-out. Compiler tests assert each one-to-many source reduces to one row per key before joining (including a live ClickHouse execution); confirmed on the live build's compiled HogQL, where the two MRR windows share one sumIf subquery.
Injection neutralized. The input `id) OR 1=1 --` is resolved as a schema identifier, fails that lookup, and returns a 400 without being executed; this is covered in the preview API tests and reproduced against the live endpoint. The AI tool likewise refuses injection and declines nonexistent fields.
Tenancy and gating. Preview API tests cover cross-tenant isolation, feature flag gating, row caps, and malformed-config rejection.

In CI: backend tests for the compiler, validation, preview API, and AI tool; Jest for the builder logics and grid; a Playwright end-to-end click-through.

Publish to changelog?

No — alpha, behind a flag.

Docs update

None in this PR (alpha). Add the skip-inkeep-docs label.

🤖 Agent context

Built with Claude Code; I drove the design and own the code, the compiler most of all. Decisions along the way:

No new model. Builder state rides on DataWarehouseSavedQuery.builder_metadata instead of a parallel model, so views get materialization, tenancy, and permissions for free.
Pre-aggregate, then join. Rejected naive multi-joins early: two one-to-many joins in one query double-count. Each related source reduces to one row per key before joining.
AI selects, never writes SQL. The explore_data tool picks from a catalog of real fields, the backend compile-validates the result, and one shared module assembles hand-built and AI-built columns alike.

Human review required — do not self-merge.

albertogrande · 2026-06-10T13:25:34Z

Note for reviewers: the red checks are fork-infra, not test failures — the paths-filter gate jobs need PostHog-internal GitHub App secrets that forks don't have, so their dependent "… Tests Pass" checks fail without running any tests. The backend, Jest, Playwright, tach, and OpenAPI suites all pass locally.

A column-less dataset compiled a placeholder projection from the supplied primary_key, which AI-built metadata defaults to "id" for every source kind. Groups is keyed by `key` and events by `uuid`, so switching sources via the explore_data tool surfaced a "Preview failed" toast over an empty canvas.

- compiler: project the kind's canonical row identifier (KIND_PRIMARY_KEYS) instead of trusting the supplied key
- max_tools: stamp the canonical key on source change or blank state
- scratchViewLogic: skip the preview request entirely when there are no columns (the grid never renders) and dismiss the stale error toast

Matches the fixes already verified on the live demo build.

https://claude.ai/code/session_01SPyW9ZdmmZ62DEQSGP5sDN
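A minimal sketch of the compiler-side fix, assuming KIND_PRIMARY_KEYS is a plain mapping from source kind to identifier column. Only the two entries named above (groups and events) are shown, and the `placeholder_projection` helper is a hypothetical name for illustration.

```python
# Canonical row identifier per source kind; only the kinds named above are listed.
KIND_PRIMARY_KEYS = {
    "groups": "key",
    "events": "uuid",
}


def placeholder_projection(kind, supplied_primary_key=None):
    """Pick the column to project when a dataset has no columns yet.

    The supplied key is deliberately ignored: AI-built metadata may default it
    to "id", which does not exist on every source kind.
    """
    try:
        return KIND_PRIMARY_KEYS[kind]
    except KeyError:
        raise ValueError(f"unknown source kind: {kind!r}") from None


print(placeholder_projection("groups", supplied_primary_key="id"))  # key
print(placeholder_projection("events", supplied_primary_key="id"))  # uuid
```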

github-actions · 2026-06-18T09:00:24Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, please remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

albertogrande added 11 commits June 10, 2026 15:01

feat(data_warehouse): store builder metadata on saved queries

cfef8f3

feat(data_explorer): scaffold product, scopes, and registration

7f66ac9

feat(data_explorer): cardinality-aware HogQL compiler

a3f8476

feat(data_explorer): live preview API endpoint

215d9e9

feat(lemon-ui): add LemonSpreadsheet results grid

b186046

feat(data_explorer): builder scene and kea logics

87500c5

feat(data_explorer): column picker and Connect data flow

0d13ad9

feat(data_explorer): PostHog AI explore tool and context panel

2a4e598

chore(data_explorer): generate API, MCP, and assistant schema types

afd830c

test(data_explorer): Playwright end-to-end coverage

d94ddf1

chore(data_explorer): seed growth & churn demo

3c52289

albertogrande mentioned this pull request Jun 10, 2026

RFC: Make PostHog the context layer for AI agents #8

Open

github-actions Bot added the stale label Jun 18, 2026

albertogrande added the enhancement label (New feature or request) and removed the stale label Jun 22, 2026

albertogrande mentioned this pull request Jun 23, 2026

[Launch plan] Data explorer (closed beta) #11

Open

15 tasks

albertogrande added the waiting label (Exempt from stale bot; keep open) Jun 24, 2026

albertogrande wants to merge 12 commits into dev/clean from pr/data-explorer