Fix filter injection: extend source instead of refining run output#723
Fix filter injection: extend source instead of refining run output#723jswir wants to merge 3 commits into
Conversation
The server-driven #(filter) injection from malloydata#673 appended `+ {where: ...}` to the end of the query string. Two failure modes: 1. Inline multi-stage pipelines that drop the filter dimension after the first stage compile-fail with "field is not defined", because the trailing refinement attaches to the last stage's output schema rather than the source. 2. Refinements after a top-level `limit` clause are invalid syntax. Both are sidestepped by splicing `extend {where: <clause>}` immediately after the source token (e.g. `run: orders extend {where: ...} -> ...`), which pushes the predicate to source level. Falls back to the previous tail-refinement when no source name is available — that path still works for named queries because the compiler hoists the where down. Adds filter_compile.spec.ts: an end-to-end matrix that compiles and executes each query shape against duckdb (named view, internal limit:, +{limit:N}, inline 2-stage pipeline that drops the filter field, named pipeline view, calculate: window functions). The two pipeline-drop cases hard-fail without the source-extend splice; everything else already worked and continues to.
Drop the |$ branch from the lookahead. It was unreachable in practice:
matching `<source>` at end-of-string would produce `run: <source> extend
{where:...}` which isn't a valid Malloy run target anyway. Lookahead now
plainly requires `->`, matching the actual contract of the splice path.
|
Consolidating some feedback from our Slack discussion. My comments are mainly related to the source filters feature as a whole, not just this PR:
IMO, all 3 of these would be best addressed upstream in the language itself. If source filters were a primitive of Malloy, then the compile check and handling of joins and concerns about the output space could be solved. As it stands, I don't think source filters could reliably be used for row level access control (which I believe was an intended use case, though I could be wrong). I've also been following the progress on malloy 'givens' and was thinking there seems to be overlap between source filters and the capabilities of 'givens'. If you could make a 'given' required, do you end up with required source filters? https://docs.malloydata.dev/documentation/experiments/givens |
Summary
The server-driven
#(filter)injection added in #673 appends+ {where: ...}to the end of the query string. That breaks for inline multi-stage pipelines that drop the filtered dimension — the trailing refinement attaches to the last stage's output schema, so when the filtered field has been projected away, compilation fails with'<field>' is not defined.This change splices
extend {where: <clause>}immediately after the source token (run: orders extend {where: ...} -> ...), pushing the predicate to source level so it survives any pipeline shape and any limit position.When no source name is available (e.g.
run: someNamedQuery), it falls back to the previous tail+ {where: ...}behavior. The fallback only succeeds when the filter dimension is present in the named query's output schema — the Malloy compiler does not auto-hoistwheredown to the source. Cases where the filter dimension is dropped before the named query's output remain broken on the fallback path; thegetQueryResults(undefined, "queryName", ...)route is the one place this matters in practice and is called out as out of scope.Empirical findings
I ran the failure shapes through
malloy-cliagainst the auto_recalls fixture to scope the bug.+ {where:})extend {where:})run: src -> view + {limit:N}run: src -> { ...; limit:N }run: src -> namedView(where dim is in view's output schema)run: src -> namedView(where dim is dropped from view's output)'<field>' is not definedrun: src -> { ... } -> { select: x }where filter dim is dropped after stage 1run: src -> { ...; calculate: rk is rank() } -> { select: rk }filter dim droppedThe failing common case is filtering on a dimension that doesn't survive into the last stage's output schema — whether that's an inline
->pipeline or a named view. The new source-extend splice fixes all of these for therun: src -> ...shape; therun: queryName(no source) shape still has the same gap as before.Changes
packages/server/src/service/filter.ts—injectFilterRefinementnow takes an optionalsourceNameand splicesS extend {where: <clause>}after the source token (regex\\b<S>\\b(?=\\s*->)). Falls back to the original tail behavior when no source name is supplied or the source token is not found.packages/server/src/service/model.ts— both call sites (getQueryResults,executeNotebookCell) pass the already-computedeffectiveSourcethrough.packages/server/src/service/filter.spec.ts— rewrites the existinginjectFilterRefinementunit tests + adds cases for inline pipelines, leading whitespace, and source-name-not-found fallback.packages/server/src/service/filter_compile.spec.ts(new) — end-to-end matrix that loads a realModel, runs each query shape against a duckdb fixture, and asserts the rows come back filtered. The two pipeline-drop cases hard-fail without the source-extend splice; with the fix all 7 cases pass.Test plan
bun test packages/server/src/service/filter.spec.ts— 47 passbun test packages/server/src/service/filter_integration.spec.ts— 28 pass (no behavioral changes for named-view paths)bun test packages/server/src/service/filter_compile.spec.ts— 7 pass; verified the two pipeline-drop cases fail with the splice disabled ('status' is not defined) and pass with it enabledmalloy-samples/auto_recallsviamalloy-cli: source-extend syntax compiles and applies the filter; tail-refinement form fails on the pipeline-drop cases with'Major Recall' is not definedbun run start:dev, opened a notebook against anauto_recallsmodel with#(filter)annotations, executed each query shape from the matrix in the SDK notebook UI; filter widgets at the top correctly drive every cell.Out of scope
getQueryResults(undefined, "queryName", undefined, ...)path (top-level named query, no source in the run target) still doesn't apply filters becauseextractSourceNamecan't find a source there — pre-existing gap, unchanged here.injectWhereClauseexported frompackages/sdk/src/components/filter/utils.ts(no internal callers since Add server-driven #(source_filter) annotation system #673).