
feat: Add Polaris SQL shell prototype #229

Open
bbejeck wants to merge 1 commit into apache:main from bbejeck:add-polaris-shell

Conversation

bbejeck (Member) commented May 16, 2026

I realize this is unrealistically large at 3K lines. I started working on this with the entire idea in view rather than planning to add it over multiple PRs. So I'll ask reviewers to look at the different parts, and if we agree to pursue this idea and direction, I'll break it into smaller PRs of at most 1K lines each.

This PR adds polaris-shell, an interactive SQL shell for exploring Iceberg tables and catalog metadata through Polaris via its REST catalog API.

Motivation

Getting quick answers about your catalog — table counts, snapshot stats, storage location, small-file diagnostics — currently requires switching to Trino, Spark, or pyiceberg. Polaris Shell provides a lightweight SQL interface for these tasks without spinning up a heavy query engine.

How it works

The shell connects to Polaris using the Iceberg RESTCatalog with OAuth2 client credentials. SQL statements are parsed with an ANTLR 4 grammar, converted to a query plan, and executed directly through the Iceberg Java library, with no JDBC driver and no query engine.

SQL input → ANTLR parser → QueryPlan → Iceberg REST catalog API → results
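
The flow above can be sketched in a few lines of plain Java. This is not the code in this PR: a regular expression stands in for the ANTLR grammar, an in-memory map stands in for the Iceberg REST catalog, and the names `ShellSketch`, `QueryPlan`, `parse`, and `execute` are hypothetical. It only illustrates the parse, plan, execute shape for a single command.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex replaces the ANTLR parser and a Map replaces the REST catalog.
final class ShellSketch {

    // Hypothetical plan type: the recognized command plus its target namespace.
    record QueryPlan(String command, String namespace) {}

    private static final Pattern SHOW_TABLES = Pattern.compile(
        "^\\s*SHOW\\s+TABLES\\s+IN\\s+([A-Za-z0-9_.]+)\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE);

    // SQL text -> QueryPlan. Anything other than SHOW TABLES IN <namespace> is rejected.
    static QueryPlan parse(String sql) {
        Matcher m = SHOW_TABLES.matcher(sql);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported statement: " + sql);
        }
        return new QueryPlan("SHOW_TABLES", m.group(1));
    }

    // QueryPlan -> result rows, looked up in the stand-in catalog.
    static List<String> execute(QueryPlan plan, Map<String, List<String>> catalog) {
        return catalog.getOrDefault(plan.namespace(), List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = Map.of("sales", List.of("orders", "customers"));
        List<String> tables = execute(parse("show tables in sales;"), catalog);
        tables.forEach(System.out::println);
        System.out.println(tables.size() + " table(s)");
    }
}
```

Keeping the plan as a small immutable value between parsing and execution means the executor never sees raw SQL text, which is the main point of the intermediate step.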

Supported commands

| Command | Purpose |
| --- | --- |
| `SELECT` | Sample table data with predicate pushdown, column projection, ORDER BY, LIMIT |
| `SHOW TABLES IN <namespace>` | List the tables under a namespace, with a count |
| `DESCRIBE STATS <table>` | Snapshot count, current snapshot ID, partition spec, schema |
| `SHOW TABLE LOCATION <table>` | Storage location |
| `SHOW TABLE POLICIES <table>` | Effective Polaris policies |
| `DIAGNOSE TABLE <table>` | Small-file count vs. 128 MiB threshold |
| `EXPLAIN SELECT ...` | Scan plan: manifest pruning, files eliminated, estimated bytes, warnings |
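
As a rough illustration of the `DIAGNOSE TABLE` idea, the check reduces to counting data files below the 128 MiB threshold. The sketch below is not taken from the PR: `SmallFileDiagnosis`, `Result`, and `diagnose` are hypothetical names, the file sizes are passed in as a plain list rather than read from Iceberg metadata, and treating "strictly below the threshold" as small is an assumption.

```java
import java.util.List;

// Illustrative only: sizes come from a list here, not from Iceberg table metadata.
final class SmallFileDiagnosis {

    // 128 MiB, the threshold named in the command table.
    static final long THRESHOLD_BYTES = 128L * 1024 * 1024;

    // Hypothetical result type: how many files are small, out of how many in total.
    record Result(long smallFiles, long totalFiles) {}

    // Assumption: a file counts as small when it is strictly below the threshold.
    static Result diagnose(List<Long> fileSizesInBytes) {
        long small = fileSizesInBytes.stream()
            .filter(size -> size < THRESHOLD_BYTES)
            .count();
        return new Result(small, fileSizesInBytes.size());
    }

    public static void main(String[] args) {
        Result r = diagnose(List.of(4L * 1024 * 1024, THRESHOLD_BYTES, 512L * 1024 * 1024));
        System.out.println(r.smallFiles() + " of " + r.totalFiles() + " files are below 128 MiB");
    }
}
```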

SELECT queries are intended for sampling and exploration, not production workloads.

Demo

A fully self-contained demo runs locally via Docker Compose + MinIO — no AWS account or external Polaris server required.

See polaris-shell/README.md for full documentation, sample output, configuration reference, and demo instructions.

snazy (Member) left a comment

Thanks for putting this together. I think this is a useful proof of concept, and I like the general direction of having a lightweight Polaris CLI/shell for catalog exploration. Also, I’m fine with a big draft PR that contains the whole idea end-to-end, as long as we treat it as a design/prototype discussion and later split it into smaller, actually reviewable PRs.

I do have a few bigger design concerns though, mostly around where this would go if we turned it into a real user-facing CLI.

First, I wonder whether we should look more closely at existing tools, for example the Nessie CLI, before going too far down this path. It already has quite a bit of REPL/script execution/terminal/completion infrastructure, and the CongoCC-based grammar is pretty well suited for completion. I’m a bit hesitant about adding ANTLR here, partly because of the extra runtime jar and possible dependency conflicts, and partly because completion seems to be an important part of the UX for this kind of tool.

Second, I’m not sure generic SELECT support is the right starting point. It can very quickly turn this from a catalog/admin shell into a small query engine, with all the semantics and expectations that come with that. I’d feel more comfortable starting with catalog-oriented commands like listing namespaces/tables, describing schemas/properties/snapshots/locations, diagnostics, etc., and being very explicit that this is not a SQL execution engine.

Related to that, I’m also not sure about EXPLAIN SELECT. The current implementation seems more like an Iceberg/table scan diagnostic than a query-engine explain plan. Maybe that’s still useful, but I’d probably frame it as a table diagnostic command instead of tying it to SQL EXPLAIN.
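
To make the "table scan diagnostic" framing concrete, here is a minimal sketch of file-level pruning on min/max column bounds, reporting files eliminated and estimated bytes. It is an assumption about the general technique, not the PR's implementation: `ScanDiagnosticSketch`, `DataFile`, `Report`, and `diagnose` are hypothetical, the statistics are hard-coded rather than read from Iceberg manifests, and only a single `column >= lowerBound` predicate is handled.

```java
import java.util.List;

// Illustrative only: stats are supplied directly instead of being read from manifests.
final class ScanDiagnosticSketch {

    // Hypothetical per-file stats: size plus the min/max values of one numeric column.
    record DataFile(String path, long sizeBytes, long min, long max) {}

    // Hypothetical report mirroring the "files eliminated, estimated bytes" output.
    record Report(int filesKept, int filesEliminated, long estimatedBytes) {}

    // Evaluates "column >= lowerBound" against file-level bounds.
    // A file can be skipped only when its max value is below the bound.
    static Report diagnose(List<DataFile> files, long lowerBound) {
        int kept = 0;
        long bytes = 0;
        for (DataFile f : files) {
            if (f.max() >= lowerBound) {
                kept++;
                bytes += f.sizeBytes();
            }
        }
        return new Report(kept, files.size() - kept, bytes);
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
            new DataFile("a.parquet", 100, 0, 9),
            new DataFile("b.parquet", 200, 10, 19),
            new DataFile("c.parquet", 300, 20, 29));
        System.out.println(diagnose(files, 15));
    }
}
```

Nothing here depends on SQL syntax, which is why a command name tied to the table rather than to `EXPLAIN` may fit better.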

The other big one for me is credentials. If this is intended to become a real CLI for users, I’m strongly against documenting a new plaintext properties file with any client or object storage secrets. For local demos that’s one thing, but for actual usage we should have a better config story from the beginning. SmallRye Config might be worth considering here: it gives us typed Java config mapping, environment variable support, and mechanisms for encrypted values / secrets managers. At minimum I’d want env-var support, clear guidance around file permissions, and examples that don’t encourage putting long-lived secrets in a checked-in or casually copied properties file.
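
As a minimal sketch of the "at minimum" option, the lookup below prefers an environment variable over a properties file and flags a config file that is readable by anyone other than its owner. It uses only the JDK, not SmallRye Config, and every name in it is hypothetical: `CredentialSketch`, the `POLARIS_CLIENT_SECRET` variable, and the `polaris.client-secret` property key are illustrations, not settings the shell defines.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.Set;

// Illustrative only: names of the variable and property key are assumptions.
final class CredentialSketch {

    static final String ENV_SECRET = "POLARIS_CLIENT_SECRET";   // hypothetical
    static final String PROP_SECRET = "polaris.client-secret";  // hypothetical

    private static final Set<PosixFilePermission> OWNER_ONLY = EnumSet.of(
        PosixFilePermission.OWNER_READ,
        PosixFilePermission.OWNER_WRITE,
        PosixFilePermission.OWNER_EXECUTE);

    // Environment variable first; the properties file is only a fallback.
    static Optional<String> resolveSecret(Map<String, String> env, Properties fileProps) {
        String fromEnv = env.get(ENV_SECRET);
        if (fromEnv != null && !fromEnv.isBlank()) {
            return Optional.of(fromEnv);
        }
        return Optional.ofNullable(fileProps.getProperty(PROP_SECRET));
    }

    // True when group or others hold any permission on the file (POSIX file systems only).
    static boolean isTooPermissive(Set<PosixFilePermission> permissions) {
        return !OWNER_ONLY.containsAll(permissions);
    }

    public static void main(String[] args) {
        // Report only whether a secret was found; never print the value itself.
        boolean found = resolveSecret(System.getenv(), new Properties()).isPresent();
        System.out.println("secret configured: " + found);
    }
}
```

On a POSIX file system the permission set could come from `Files.getPosixFilePermissions(path)`, and the shell could refuse to start, or at least warn, when `isTooPermissive` returns true.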

So overall: I think this is a good PoC and useful for discussing the shape of the tool, but before merging something like this I’d like us to agree on the CLI scope and architecture first, especially around parser/completion, whether we want any data-querying at all, and credential handling.

bbejeck (Member, Author) commented May 23, 2026

Thanks @snazy for the detailed response! I'm taking a look now.
