Implement UNION / UNION ALL by mlell · Pull Request #281 · beancount/beanquery

mlell · 2026-05-13T21:57:35Z

This PR allows combining SELECT queries with UNION ALL (concatenate) or UNION (concatenate and deduplicate).

This is the first part of reworking #265: it provides a union on top of which GROUP BY GROUPING SETS, GROUP BY ROLLUP, etc. can be implemented, as suggested in that PR.

Two commits only refactor, without changing functionality, to prepare for the third commit, which adds the UNION clause. The refactoring mainly introduces "Query" as a new top-level AST entity that wraps either a single SELECT or a UNION of multiple SELECTs.

EvalQuery is renamed to EvalSelect for clearer terminology, as it relates to ast.Select. A new EvalQuery node is introduced which takes over responsibility for ORDER BY, LIMIT, and PIVOT BY from EvalSelect (the old EvalQuery). The reason is that the final ORDER BY, LIMIT, etc. of a UNION apply to the end result of the union. To apply those operators to a single SELECT operand inside a UNION, use a subquery, like `(SELECT ... ORDER BY ...) UNION ...`.
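The split between the SELECT body and the result-set modifiers can be pictured with a pair of small dataclasses. This is only an illustrative sketch, not the code in `ast.py`: the field names `selects`, `from_clause`, and `where_clause` are assumptions made for the example, and the real nodes carry more fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Select:
    # Simplified stand-in: a pure table expression with no
    # ordering or paging fields. Field names are hypothetical.
    targets: List[str]
    from_clause: Optional[str] = None
    where_clause: Optional[str] = None


@dataclass
class Query:
    # Simplified stand-in for the top-level node: it wraps the SELECT
    # operands and owns the modifiers that act on the final result.
    selects: List[Select]
    set_operators: List[str] = field(default_factory=list)  # "UNION" or "UNION ALL"
    order_by: Optional[List[str]] = None
    limit: Optional[int] = None
    pivot_by: Optional[List[str]] = None

    def __post_init__(self):
        # One operator sits between each pair of neighbouring operands.
        if len(self.set_operators) != len(self.selects) - 1:
            raise ValueError("need exactly len(selects) - 1 set operators")


# Even the simplest query is wrapped in a Query node.
single = Query(selects=[Select(targets=["*"])], limit=10)

# In a union, ORDER BY belongs to the Query, not to either operand.
union = Query(
    selects=[Select(targets=["account"]), Select(targets=["account"])],
    set_operators=["UNION ALL"],
    order_by=["account"],
)
```

Because `Select` has no `order_by` or `limit` in this sketch, an operand can only be sorted or limited by nesting a whole `Query` as a subquery, which mirrors the behaviour described above.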

The BQL `interval()` function pre-filtered its input string with a regex that accepted only day, month, or year, although the function itself accepts more inputs. The regex now only splits the number from the word.
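As a rough illustration of splitting only the number from the word, the sketch below uses a pattern that does not enumerate units at all. The pattern and the helper name `split_interval` are assumptions for this example and are not taken from the beanquery source; validating the unit is left to whatever consumes the pair.

```python
import re

# Hypothetical pattern: optional sign, digits, optional whitespace, a unit word.
# It deliberately does not list the allowed units.
_INTERVAL_RE = re.compile(r"\s*([+-]?\d+)\s*([A-Za-z]+)\s*")


def split_interval(text):
    """Split an interval string such as '3 months' into (3, 'months')."""
    match = _INTERVAL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse interval: {text!r}")
    number, unit = match.groups()
    return int(number), unit.lower()
```

With this shape, adding a new unit to the function does not require touching the regex again.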

Previously the grammar rule `select` owned ORDER BY, LIMIT and PIVOT BY directly, and the parser returned a bare `ast.Select`. This conflated the data extraction (defining columns, a source, filters and grouping) with result-set modifiers (ORDER BY, LIMIT and PIVOT BY) that act on whatever comes out of that expression. Splitting the two prepares for a future UNION chain.

The new `ast.Query` node separates these concerns. The grammar rule `query::Query` wraps one (as of now) SELECT body and claims ORDER BY, LIMIT, and PIVOT BY for itself; `ast.Select` is now a pure table expression with no sorting or paging fields. `parse()` always returns `ast.Query`, even for the simplest `SELECT *`.

**ast.py**: `Select` loses `order_by`, `limit`, and `pivot_by`; the new `Query` node carries those fields and wraps a list of `Select` nodes.

**bql.ebnf**: The `select` rule no longer contains ORDER BY / LIMIT / PIVOT BY; a new `query::Query` rule wraps `select` and owns those clauses. `subselect` is renamed to `subquery`, reflecting the change of the top-level rule from `select` to `query`. It delegates to `query`, so parenthesised sub-queries may carry their own result-set modifiers. The `any` and `all` rules are updated to avoid double parentheses when used with subselects. The `expression` rule requires `subquery` (formerly `select`) to avoid ambiguities like `SELECT SELECT x FROM y WHERE z`.

**query_compile.py**: `EvalQuery` is renamed to `EvalSelect`. The dataclass holds the compiled SELECT body (table, targets, where, group_indexes, having_index, distinct). A new `EvalQuery` now wraps `EvalSelect` and owns `order_spec` and `limit`. The `EvalQuery` properties `columns` and `c_targets` are retained and forwarded from the nested SELECT. In the future, this will only be possible for single-SELECT queries (not, e.g., UNION chains).

**compiler.py**: A new `_query` dispatch handler is extracted from `_select`. `_select` compiles the inner SELECT body up to GROUP BY. `_query` then compiles ORDER BY, performs the aggregate coverage check, and finally compiles LIMIT and PIVOT BY. In the function `_compile_from`, the subquery detection is updated from `ast.Select` to `ast.Query`. A new check rejects `SELECT DISTINCT ... ORDER BY <col>` when `<col>` is not in the SELECT list, since this would produce non-deterministic results. This avoids handling DISTINCT at the Query level.

**query_execute.py**: A new `execute_query()` wraps `execute_select()`, which changes the control flow.

Before:

    execute_select(query)
    ├── Compute result_types (visible columns only)
    ├── Compute result_indexes (visible column indices)
    ├── Execute query (non-aggregated or aggregated path)
    ├── ORDER BY (on full rows)
    ├── Extract visible columns into result tuples
    ├── DISTINCT (on extracted rows)
    ├── LIMIT
    └── Return (result_types, rows)

After:

    execute_query(query)                 ← New entry point
    ├── query.select()                   ← Delegates to EvalQuery.select()
    │   └── execute_select(query)        ← Returns ALL columns + visibility mask
    │       ├── Compute result_types (ALL columns)
    │       ├── Compute visible_mask
    │       ├── Execute query (non-aggregated or aggregated)
    │       ├── DISTINCT (on visible columns, but keeps full rows)
    │       └── Return (result_types, rows, visible_mask)
    │
    ├── ORDER BY (on full rows)
    ├── Extract visible columns
    ├── LIMIT
    └── Return (result_types, rows)

**transform_journal / transform_balances**: These template-based desugaring functions now return `ast.Query` wrapping the constructed `ast.Select`, so ORDER BY from the BALANCES template reaches the `_query` handler through the normal path.

**Tests**: Updated to expect `ast.Query` from the parser, access `query.select` for inner fields, and construct `EvalQuery(select=EvalSelect(...), ...)`.
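The reordered control flow can be sketched on plain tuples. This is a simplified model under stated assumptions, not the code in `query_execute.py`: the helper names `distinct_on_visible` and `finish_query` are hypothetical, rows are plain tuples, and `order_spec` is modelled as `(column_index, descending)` pairs.

```python
from itertools import compress
from operator import itemgetter


def distinct_on_visible(rows, visible_mask):
    """Drop rows whose visible columns repeat, keeping the full first row."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(compress(row, visible_mask))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def finish_query(rows, visible_mask, order_spec=(), limit=None):
    """Apply ORDER BY on full rows, then extract visible columns, then LIMIT."""
    rows = list(rows)
    # Sort by the least significant key first; list.sort is stable, so the
    # earlier keys in order_spec end up taking precedence.
    for index, descending in reversed(list(order_spec)):
        rows.sort(key=itemgetter(index), reverse=descending)
    rows = [tuple(compress(row, visible_mask)) for row in rows]
    return rows if limit is None else rows[:limit]


# Columns: account, amount, and a hidden column used only for sorting.
rows = [
    ("Assets", 10, 2),
    ("Income", 5, 1),
    ("Assets", 10, 3),
    ("Expenses", 7, 0),
]
mask = (True, True, False)

deduplicated = distinct_on_visible(rows, mask)
ordered = finish_query(rows, mask, order_spec=[(2, False)], limit=2)
```

The example applies DISTINCT and the hidden sort column to separate calls on purpose: as noted above, the compiler rejects `SELECT DISTINCT ... ORDER BY <col>` when `<col>` is not in the SELECT list, because which full row survives deduplication would then affect the order.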

Type coercion for numeric operands will be needed in multiple contexts: binary operators (existing) and UNION type compatibility checking (upcoming). Currently, the coercion logic is duplicated in _binaryop. Extracting it into a reusable helper enables both contexts to apply consistent type coercion rules, particularly the int→Decimal promotion that avoids information loss.

Changes:

- Add _try_coerce_operand(operand, target_type) helper method to Compiler.
  Returns the coerced operand, or None if coercion is not possible.
  Encapsulates: type equality check, int→Decimal promotion, function lookup.
- Refactor _binaryop to use _try_coerce_operand
- Add unit tests for _try_coerce_operand
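The contract of such a helper (coerced operand or None) might look like the following standalone sketch. It is not the `Compiler._try_coerce_operand` method itself: the `Constant` class is a hypothetical stand-in for a compiled operand, and the function-lookup step mentioned in the list above is left out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Constant:
    # Hypothetical stand-in for a compiled operand carrying a value and a type.
    value: object
    dtype: type


def try_coerce_operand(operand, target_type):
    """Return an operand of target_type, or None if coercion is not possible."""
    if operand.dtype is target_type:
        # Types already agree: nothing to do.
        return operand
    if operand.dtype is int and target_type is Decimal:
        # Every int is exactly representable as a Decimal, so this
        # promotion does not lose information.
        return Constant(Decimal(operand.value), Decimal)
    return None


two = Constant(2, int)
promoted = try_coerce_operand(two, Decimal)
rejected = try_coerce_operand(Constant("2", str), Decimal)
```

Returning None instead of raising lets each caller decide how to report the mismatch, which suits a helper shared between binary operators and UNION column checks.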

BQL previously supported only single SELECT statements. This change introduces UNION and UNION ALL set operators so that multiple SELECT operands can be combined into one result set, with optional ORDER BY, LIMIT, and PIVOT BY applied to the combined output.

Grammar (bql.ebnf): extend the query rule to accept a chain of SELECT operands separated by UNION or UNION ALL tokens; the resulting AST carries a parallel set_operators list (length = number of operands − 1).

AST (ast.py): add the set_operators field to the Query node and document its semantics.

Compiler (compiler.py): compile each operand independently against the original table context, validate that all operands have the same column count and compatible types, and auto-coerce int/Decimal mismatches to Decimal using the existing _try_coerce_operand helper. Make `_query()` a top-level dispatcher for two flows:

* Simple query with only one SELECT: `_compile_single_select_query`
* Query that joins multiple SELECTs using UNION: `_compile_union_query`

Runtime (query_compile.py): introduce EvalUnion, a new dataclass that accumulates rows across sub-queries and applies deduplication on UNION boundaries while preserving insertion order. EvalUnion has the same interface as EvalSelect (returns result_types, rows, visible_mask), and is wrapped by EvalQuery which handles ORDER BY, LIMIT, and visible column extraction uniformly for both single SELECTs and UNIONs.
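The row accumulation described for EvalUnion can be approximated with a short function over lists of tuples. This is a sketch under assumptions, not the EvalUnion implementation: the function name `combine_rows` is hypothetical, operators are assumed to apply left to right, and a UNION boundary is assumed to deduplicate everything accumulated so far.

```python
def combine_rows(operands, set_operators):
    """Combine row lists with "UNION" / "UNION ALL", preserving insertion order."""
    if len(set_operators) != len(operands) - 1:
        raise ValueError("need exactly len(operands) - 1 set operators")
    widths = {len(row) for rows in operands for row in rows}
    if len(widths) > 1:
        raise ValueError("all operands must have the same column count")

    result = list(operands[0])
    for operator, rows in zip(set_operators, operands[1:]):
        result.extend(rows)
        if operator == "UNION":
            # dict keys keep first-seen order, so this removes duplicates
            # without reordering the surviving rows.
            result = list(dict.fromkeys(result))
        elif operator != "UNION ALL":
            raise ValueError(f"unknown set operator: {operator!r}")
    return result


a = [(1,), (1,), (2,)]
b = [(2,), (3,)]

concatenated = combine_rows([a, b], ["UNION ALL"])
deduplicated = combine_rows([a, b], ["UNION"])
mixed = combine_rows([a, b, a], ["UNION", "UNION ALL"])
```

In the mixed chain, only the rows gathered up to the UNION boundary are deduplicated; the rows appended by the trailing UNION ALL are kept as they are. ORDER BY and LIMIT are not handled here because, as described above, they belong to the wrapping query.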

mlell added 4 commits May 11, 2026 15:44

Fix outdated regex in BQL interval() function

d7679a1

mlell mentioned this pull request May 13, 2026

GROUP BY CUBE / ROLLUP / GROUPING SETS #265

Closed

mlell wants to merge 4 commits into beancount:master from mlell:dev-union
