Add powermglot: Power Query M parser + transpiler; integrate into pbi2dbr #1

Merged
brookpatten merged 5 commits into main from copilot/add-powermglot-library on Apr 1, 2026

Copilot AI commented Mar 30, 2026

Implements a proper Power Query M parser (powermglot) as a new workspace library and replaces pbi2dbr's regex-based M expression handling with it.

powermglot — new workspace library

A recursive-descent M parser and sqlglot-backed SQL transpiler for let…in chain expressions.

Architecture:

  • lexer.py — full M tokeniser: keywords, #"quoted identifiers", "" escaped strings, all operators
  • ast_nodes.py — dataclass AST (LetExpr, CallExpr, NavExpr, EachExpr, BinaryOpExpr, …)
  • parser.py — recursive-descent parser: nested let, each, if/then/else, dotted function names, {[Name=…]}[Data] navigation chains
  • transpiler.py — walks the let binding graph (following variable references) and emits a sqlglot SELECT
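
To make the lexer bullet concrete, here is a minimal, standalone sketch of how an M tokeniser might handle #"quoted identifiers" and "" escaped strings. It is not the code in lexer.py: the `Token` type, the token kinds, the keyword set, and the operator list are all assumptions chosen for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):  # hypothetical token shape, not powermglot's
    kind: str
    text: str


# Order matters: the quoted-identifier rule must come before the string rule,
# and multi-character operators before single-character ones.
_TOKEN_SPEC = [
    ("QUOTED_IDENT", r'#"(?:[^"]|"")*"'),
    ("STRING", r'"(?:[^"]|"")*"'),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*"),
    ("OP", r"<>|<=|>=|=>|[=<>+\-*/&(){}\[\],]"),
    ("WS", r"\s+"),
]
_MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in _TOKEN_SPEC))
# Assumed keyword subset; the full M keyword list is longer.
_KEYWORDS = {"let", "in", "each", "if", "then", "else", "and", "or", "not"}


def tokenize(src: str) -> Iterator[Token]:
    """Yield tokens, unescaping "" to " inside strings and quoted identifiers."""
    pos = 0
    while pos < len(src):
        m = _MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue
        if kind == "IDENT" and text in _KEYWORDS:
            kind = "KEYWORD"
        elif kind == "STRING":
            text = text[1:-1].replace('""', '"')  # drop quotes, unescape
        elif kind == "QUOTED_IDENT":
            text = text[2:-1].replace('""', '"')  # drop #" and ", unescape
        yield Token(kind, text)
```

Treating dotted names such as Table.SelectRows as a single IDENT token is one possible design; a parser could equally assemble them from separate tokens.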

Supported M → SQL mappings:

| M pattern | SQL |
| --- | --- |
| Connector + {[Name=…]}[Data] chain (2–3 levels) | FROM catalog.schema.table |
| Table.SelectRows(t, each predicate) | WHERE (and/or/not) |
| Table.SelectColumns / RenameColumns / AddColumn / RemoveColumns | Column projection + aliases |
| Table.Group | GROUP BY + aggregations |
| Table.NestedJoin / Table.Join | JOIN |
| Value.NativeQuery(src, "sql") | Subquery passthrough |

```python
from powermglot import m_to_sql, parse_m_source

sql = m_to_sql("""
    let
        Source = Databricks.Catalogs("host", "443", [Catalog="prod"]),
        db = Source{[Name="pbi"]}[Data],
        orders = db{[Name="orders"]}[Data],
        filtered = Table.SelectRows(orders, each [status] = "Active")
    in filtered
""", dialect="spark")
# SELECT * FROM prod.pbi.orders WHERE status = 'Active'
```
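
The binding-graph walk behind that result can be sketched roughly as follows. This is an illustration only: the node classes (`Connector`, `Ref`, `Nav`, `SelectRows`) and the `resolve`/`to_sql` helpers are hypothetical stand-ins rather than powermglot's AST or API, the predicate is assumed to be already rendered as SQL, and plain string concatenation stands in for the sqlglot expression building.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Connector:  # e.g. Databricks.Catalogs(..., [Catalog="prod"])
    catalog: Optional[str]


@dataclass
class Ref:  # reference to an earlier let binding
    name: str


@dataclass
class Nav:  # target{[Name="..."]}[Data]
    target: "Node"
    name: str


@dataclass
class SelectRows:  # Table.SelectRows(table, each <predicate>)
    table: "Node"
    predicate_sql: str  # assumed to be translated to SQL already


Node = Union[Connector, Ref, Nav, SelectRows]


def resolve(node: Node, bindings: Dict[str, Node],
            parts: List[str], filters: List[str]) -> None:
    """Follow variable references from the result back to the connector."""
    if isinstance(node, Ref):
        resolve(bindings[node.name], bindings, parts, filters)
    elif isinstance(node, SelectRows):
        resolve(node.table, bindings, parts, filters)
        filters.append(node.predicate_sql)
    elif isinstance(node, Nav):
        resolve(node.target, bindings, parts, filters)
        parts.append(node.name)  # appended after the parent, so order is outer to inner
    elif isinstance(node, Connector):
        if node.catalog:
            parts.append(node.catalog)


def to_sql(bindings: Dict[str, Node], result: str) -> str:
    parts: List[str] = []
    filters: List[str] = []
    resolve(Ref(result), bindings, parts, filters)
    sql = "SELECT * FROM " + ".".join(parts)
    if len(filters) == 1:
        sql += " WHERE " + filters[0]
    elif filters:
        # Parenthesise each predicate so OR inside one filter keeps its meaning.
        sql += " WHERE " + " AND ".join(f"({f})" for f in filters)
    return sql


bindings: Dict[str, Node] = {
    "Source": Connector(catalog="prod"),
    "db": Nav(Ref("Source"), "pbi"),
    "orders": Nav(Ref("db"), "orders"),
    "filtered": SelectRows(Ref("orders"), "status = 'Active'"),
}
print(to_sql(bindings, "filtered"))
# SELECT * FROM prod.pbi.orders WHERE status = 'Active'
```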

parse_m_source — extraction API for embedders

A safe, never-raising entry point that returns an MSourceInfo(source_ref, native_sql, filter_sql) without generating SQL. It is designed for use by data-source extractors.

```python
from powermglot import parse_m_source

info = parse_m_source(m_expr)
# info.source_ref  → "prod.pbi.orders"
# info.filter_sql  → "status = 'Active'"
# info.native_sql  → None
```
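
One way a never-raising contract like this could be built is to wrap a strict extractor and return an empty result on any failure. The sketch below is a guess at the pattern, not powermglot's implementation: the field names come from the description above, but the `None` defaults, the `never_raising` decorator, and the toy `strict_extract` function are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MSourceInfo:
    # Field names follow the PR description; the None defaults are assumed.
    source_ref: Optional[str] = None
    native_sql: Optional[str] = None
    filter_sql: Optional[str] = None


def never_raising(extract: Callable[[str], MSourceInfo]) -> Callable[[str], MSourceInfo]:
    """Wrap a strict extractor so callers always get an MSourceInfo back."""
    def wrapper(m_expr: str) -> MSourceInfo:
        try:
            return extract(m_expr)
        except Exception:
            # Assumed behaviour: an all-None result signals "could not parse".
            return MSourceInfo()
    return wrapper


@never_raising
def strict_extract(m_expr: str) -> MSourceInfo:
    # Hypothetical stand-in for a real parse; rejects anything without `let`.
    if "let" not in m_expr:
        raise ValueError("not a let expression")
    return MSourceInfo(source_ref="prod.pbi.orders")
```

With this shape, an embedder can test `info.source_ref is None` and move on to a fallback instead of wrapping every call in its own try/except.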

pbi2dbr integration

pbi2dbr now depends on powermglot (workspace dependency). The three M-resolution functions in extractor.py call parse_m_source first and fall back to the existing regex logic for patterns powermglot cannot yet parse (e.g. the {[Item="x"][Kind="Table"]} navigation style):

  • PbixExtractor._resolve_uc_ref() — powermglot resolves the table ref; applies default catalog/schema to fill in missing parts
  • _extract_filter_expr() — powermglot extracts Table.SelectRows predicates via the AST; handles nested let chains and variable references the regex cannot
  • _extract_native_query_sql() — powermglot extracts Value.NativeQuery SQL directly from the parsed AST
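
The primary-then-fallback flow and the default catalog/schema fill-in described above could look roughly like this. Everything here is a sketch under assumptions: `qualify` and `resolve_ref` are hypothetical helpers, not functions from extractor.py, and the two resolvers are passed in as plain callables so the example stays self-contained.

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def qualify(ref: str, default_catalog: str, default_schema: str) -> str:
    """Pad a one- or two-part table ref to catalog.schema.table."""
    # Naive split: assumes no dots inside quoted identifier parts.
    parts = ref.split(".")
    if len(parts) == 1:
        parts = [default_catalog, default_schema] + parts
    elif len(parts) == 2:
        parts = [default_catalog] + parts
    return ".".join(parts)


def resolve_ref(m_expr: str, primary: Resolver, fallback: Resolver,
                default_catalog: str, default_schema: str) -> Optional[str]:
    """Try the parser-based resolver first, then the regex-based one."""
    ref = primary(m_expr) or fallback(m_expr)
    if ref is None:
        return None
    return qualify(ref, default_catalog, default_schema)
```

Keeping the regex path behind the same signature means it can be removed later without touching callers, once the parser covers the remaining navigation styles.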

Copilot AI and others added 2 commits March 30, 2026 04:33
Copilot AI changed the title from "[WIP] Add powermglot library for powerM parser" to "Add powermglot: Power Query M parser and SQL transpiler" Mar 30, 2026
Copilot AI requested a review from brookpatten March 30, 2026 04:44
Copilot AI changed the title from "Add powermglot: Power Query M parser and SQL transpiler" to "Add powermglot: Power Query M parser + transpiler; integrate into pbi2dbr" Mar 31, 2026
@brookpatten brookpatten marked this pull request as ready for review April 1, 2026 01:38
@brookpatten brookpatten merged commit 24639de into main Apr 1, 2026
1 of 3 checks passed
2 participants