Skip to content

Fix filter injection: extend source instead of refining run output#723

Open
jswir wants to merge 3 commits into
malloydata:mainfrom
jswir:jswir/filter-source-extend
Open

Fix filter injection: extend source instead of refining run output#723
jswir wants to merge 3 commits into
malloydata:mainfrom
jswir:jswir/filter-source-extend

Conversation

@jswir
Copy link
Copy Markdown
Collaborator

@jswir jswir commented May 9, 2026

Summary

The server-driven #(filter) injection added in #673 appends + {where: ...} to the end of the query string. That breaks for inline multi-stage pipelines that drop the filtered dimension — the trailing refinement attaches to the last stage's output schema, so when the filtered field has been projected away, compilation fails with '<field>' is not defined.

This change splices extend {where: <clause>} immediately after the source token (run: orders extend {where: ...} -> ...), pushing the predicate to source level so it survives any pipeline shape and any limit position.

When no source name is available (e.g. run: someNamedQuery), it falls back to the previous tail + {where: ...} behavior. The fallback only succeeds when the filter dimension is present in the named query's output schema — the Malloy compiler does not auto-hoist where down to the source. Cases where the filter dimension is dropped before the named query's output remain broken on the fallback path; the getQueryResults(undefined, "queryName", ...) route is the one place this matters in practice and is called out as out of scope.

Empirical findings

I ran the failure shapes through malloy-cli against the auto_recalls fixture to scope the bug.

Query shape Old (tail + {where:}) New (source extend {where:})
run: src -> view + {limit:N}
run: src -> { ...; limit:N }
run: src -> namedView (where dim is in view's output schema)
run: src -> namedView (where dim is dropped from view's output) '<field>' is not defined
run: src -> { ... } -> { select: x } where filter dim is dropped after stage 1 ❌ same
run: src -> { ...; calculate: rk is rank() } -> { select: rk } filter dim dropped ❌ same

The failing common case is filtering on a dimension that doesn't survive into the last stage's output schema — whether that's an inline -> pipeline or a named view. The new source-extend splice fixes all of these for the run: src -> ... shape; the run: queryName (no source) shape still has the same gap as before.

Changes

  • packages/server/src/service/filter.tsinjectFilterRefinement now takes an optional sourceName and splices S extend {where: <clause>} after the source token (regex \\b<S>\\b(?=\\s*->)). Falls back to the original tail behavior when no source name is supplied or the source token is not found.
  • packages/server/src/service/model.ts — both call sites (getQueryResults, executeNotebookCell) pass the already-computed effectiveSource through.
  • packages/server/src/service/filter.spec.ts — rewrites the existing injectFilterRefinement unit tests + adds cases for inline pipelines, leading whitespace, and source-name-not-found fallback.
  • packages/server/src/service/filter_compile.spec.ts (new) — end-to-end matrix that loads a real Model, runs each query shape against a duckdb fixture, and asserts the rows come back filtered. The two pipeline-drop cases hard-fail without the source-extend splice; with the fix all 7 cases pass.

Test plan

  • bun test packages/server/src/service/filter.spec.ts — 47 pass
  • bun test packages/server/src/service/filter_integration.spec.ts — 28 pass (no behavioral changes for named-view paths)
  • bun test packages/server/src/service/filter_compile.spec.ts — 7 pass; verified the two pipeline-drop cases fail with the splice disabled ('status' is not defined) and pass with it enabled
  • Validated against malloy-samples/auto_recalls via malloy-cli: source-extend syntax compiles and applies the filter; tail-refinement form fails on the pipeline-drop cases with 'Major Recall' is not defined
  • Manual: spun up bun run start:dev, opened a notebook against an auto_recalls model with #(filter) annotations, executed each query shape from the matrix in the SDK notebook UI; filter widgets at the top correctly drive every cell.

Out of scope

  • The getQueryResults(undefined, "queryName", undefined, ...) path (top-level named query, no source in the run target) still doesn't apply filters because extractSourceName can't find a source there — pre-existing gap, unchanged here.
  • Removing the unused legacy injectWhereClause exported from packages/sdk/src/components/filter/utils.ts (no internal callers since Add server-driven #(source_filter) annotation system #673).

jswir added 2 commits May 9, 2026 15:03
The server-driven #(filter) injection from malloydata#673 appended `+ {where: ...}`
to the end of the query string. Two failure modes:

  1. Inline multi-stage pipelines that drop the filter dimension after
     the first stage compile-fail with "field is not defined", because
     the trailing refinement attaches to the last stage's output schema
     rather than the source.
  2. Refinements after a top-level `limit` clause are invalid syntax.

Both are sidestepped by splicing `extend {where: <clause>}` immediately
after the source token (e.g. `run: orders extend {where: ...} -> ...`),
which pushes the predicate to source level. Falls back to the previous
tail-refinement when no source name is available — that path still
works for named queries because the compiler hoists the where down.

Adds filter_compile.spec.ts: an end-to-end matrix that compiles and
executes each query shape against duckdb (named view, internal limit:,
+{limit:N}, inline 2-stage pipeline that drops the filter field, named
pipeline view, calculate: window functions). The two pipeline-drop
cases hard-fail without the source-extend splice; everything else
already worked and continues to.
Drop the |$ branch from the lookahead. It was unreachable in practice:
matching `<source>` at end-of-string would produce `run: <source> extend
{where:...}` which isn't a valid Malloy run target anyway. Lookahead now
plainly requires `->`, matching the actual contract of the splice path.
@jswir jswir requested review from mhwilder and quirkyllama May 9, 2026 21:56
I, James Swirhun <james@credibledata.com>, hereby add my Signed-off-by to this commit: c9cf7d0
I, James Swirhun <james@credibledata.com>, hereby add my Signed-off-by to this commit: f4ff5f7

Signed-off-by: James Swirhun <james@credibledata.com>
@adamribaudo-ff
Copy link
Copy Markdown
Contributor

adamribaudo-ff commented May 10, 2026

Consolidating some feedback from our Slack discussion. My comments are mainly related to the source filters feature as a whole, not just this PR:

  • A query that omits a require source filter will pass the /compile API check. Ideally, the compile check is the only check necessary to understand if a query will successfully execute.
  • Required source filters are not enforced for joined sources.
  • The branching implementations of "refinement" for certain queries and "source extension" for others makes this seem a bit brittle. Same concern about parsing query text with regex and handling pre-defined queries as a special case.

IMO, all 3 of these would be best addressed upstream in the language itself. If source filters were a primitive of Malloy, then the compile check and handling of joins and concerns about the output space could be solved.

As it stands, I don't think source filters could reliably be used for row level access control (which I believe was an intended use case, though I could be wrong).

I've also been following the progress on malloy 'givens' and was thinking there seems to be overlap between source filters and the capabilities of 'givens'. If you could make a 'given' required, do you end up with required source filters? https://docs.malloydata.dev/documentation/experiments/givens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants