daxis-io/axon


Axon

Axon is a Rust workspace for a hybrid query engine over Delta Lake tables. The same QueryRequest runs inside a browser tab against cloud-hosted Parquet files, or falls back to a native DataFusion runtime when the query needs more than the browser slice supports.

Status: early. The browser path is narrow today. It ships a deterministic planner and a small executor over a curated SQL subset. The native path is the correctness oracle. See Scope and status.

Why it exists

Most analytics stacks round-trip every query through a query service. The browser asks the service, the service reads object storage, the service returns rows. Every dashboard needs that service running, and interactive exploration is bottlenecked by it.

Axon takes a different approach:

  • If a query is safe to run in a browser tab, fetch only the Parquet byte ranges it needs over signed URLs and run it there.
  • Otherwise, route the same QueryRequest to the native DataFusion runtime.
  • Share one query contract, one Delta snapshot resolver, and one fallback taxonomy across both tiers.
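To make the first bullet concrete, here is a minimal sketch of how a reader could work out which byte ranges to request before it touches any row data. It relies only on the standard layout of an unencrypted Parquet file, which ends with the footer metadata, a 4-byte little-endian metadata length, and the magic bytes PAR1. The function names are illustrative assumptions and are not taken from wasm-parquet-engine.

```rust
use std::ops::Range;

/// Length of the Parquet trailer: 4-byte little-endian metadata length + "PAR1".
const TRAILER_LEN: u64 = 8;
const MAGIC: [u8; 4] = *b"PAR1";

/// Byte range of the 8-byte trailer, or None if the object is too small.
/// (Hypothetical helper name.)
pub fn trailer_range(file_size: u64) -> Option<Range<u64>> {
    let start = file_size.checked_sub(TRAILER_LEN)?;
    Some(start..file_size)
}

/// Byte range of the footer metadata, derived from the trailer bytes.
/// Returns None on a bad magic or a length that does not fit in the file.
pub fn footer_range(file_size: u64, trailer: &[u8; 8]) -> Option<Range<u64>> {
    if trailer[4..8] != MAGIC[..] {
        return None;
    }
    let len = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as u64;
    let end = file_size.checked_sub(TRAILER_LEN)?;
    let start = end.checked_sub(len)?;
    Some(start..end)
}

/// Value for an HTTP Range request header. HTTP byte ranges are inclusive,
/// so the exclusive end is decremented. Empty ranges yield None.
pub fn range_header(range: &Range<u64>) -> Option<String> {
    if range.start >= range.end {
        return None;
    }
    Some(format!("bytes={}-{}", range.start, range.end - 1))
}
```

A caller would issue one small request for the trailer, a second for the footer, and then request only the column chunks the plan selects.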

How it works

            ┌──────────────────────────────┐
            │       Caller (browser)       │
            │   QueryRequest → axon_table  │
            └──────────────┬───────────────┘
                           │
                           ▼
            ┌──────────────────────────────┐
            │   query-router: browser?     │
            │   (capabilities + policy)    │
            └────────┬───────────┬─────────┘
            yes      │           │     no / fallback
                     ▼           ▼
   ┌──────────────────────┐   ┌──────────────────────┐
   │  Browser runtime     │   │  Native runtime      │
   │  (WASM)              │   │  (DataFusion +       │
   │   HTTP range reads   │   │   delta-rs)          │
   │   Parquet planning   │   │   Full SQL           │
   │   Narrow executor    │   │   Oracle for tests   │
   └──────────┬───────────┘   └──────────┬───────────┘
              │                          │
              └────────────┬─────────────┘
                           ▼
            ┌──────────────────────────────┐
            │   delta-control-plane        │
            │   (snapshot resolution +     │
            │    table policy)             │
            └──────────────────────────────┘

Three rules keep the tiers consistent:

  1. One contract. query-contract defines the request, response, capability flags, and structured fallback reasons that both runtimes return.
  2. Native is the oracle. Every browser SQL case has a native counterpart in the test corpus. Results must match or the browser path fails closed to the native runtime.
  3. No silent capability drift. Anything the browser cannot do (unsupported aggregate, partition without a known type, multi-partition execution, missing footer stats, identity drift between bootstrap and read) returns a structured fallback instead of a wrong answer.
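The third rule can be sketched as a routing function that never guesses: it either says the browser can run the query or names the reason it cannot. Every type and field name below is a hypothetical stand-in for whatever query-contract and query-router actually define. Only the list of fallback cases is taken from the rule above.

```rust
/// Hypothetical fallback taxonomy mirroring the cases listed in rule 3.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FallbackReason {
    UnsupportedAggregate(String),
    UntypedPartition(String),
    MultiPartitionExecution,
    MissingFooterStats,
    IdentityDrift,
}

/// Where the request should run. Native always carries a reason.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Route {
    Browser,
    Native(FallbackReason),
}

/// Hypothetical summary of what a planned query needs.
#[derive(Debug, Clone)]
pub struct QueryShape {
    pub aggregates: Vec<String>,
    pub untyped_partition: Option<String>,
    pub partitions_touched: usize,
    pub has_footer_stats: bool,
    pub identity_matches_bootstrap: bool,
}

/// Check each capability in turn and return the first structured reason
/// that rules out the browser tier.
pub fn route(shape: &QueryShape, supported_aggregates: &[&str]) -> Route {
    use FallbackReason::*;
    if let Some(agg) = shape
        .aggregates
        .iter()
        .find(|a| !supported_aggregates.contains(&a.as_str()))
    {
        return Route::Native(UnsupportedAggregate(agg.clone()));
    }
    if let Some(column) = &shape.untyped_partition {
        return Route::Native(UntypedPartition(column.clone()));
    }
    if shape.partitions_touched > 1 {
        return Route::Native(MultiPartitionExecution);
    }
    if !shape.has_footer_stats {
        return Route::Native(MissingFooterStats);
    }
    if !shape.identity_matches_bootstrap {
        return Route::Native(IdentityDrift);
    }
    Route::Browser
}
```

Because the native variant cannot be built without a reason, a caller can log or display why a query left the browser tier.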

Repo tour

The Rust workspace lives in crates/, grouped by role.

Shared contract

  • query-contract. Request and response types, capability flags, fallback reasons.
  • query-router. Decides browser vs. native and produces structured fallback decisions.

Native tier

Browser tier (compiles to wasm32-unknown-unknown)

  • wasm-http-object-store. Validated HTTP and browser-local byte-range reads with memory and OPFS extent cache adapters. Redacts URL secrets in errors.
  • wasm-parquet-engine. Browser-side Parquet planning and async footer plus scan primitives.
  • wasm-delta-snapshot. Browser-safe Delta snapshot reconstruction (log replay plus checkpoints).
  • wasm-query-runtime. Constrained browser runtime envelope. Bootstraps snapshots, plans, prunes, and runs the supported SQL subset.
  • wasm-query-session. Legacy narrow in-memory session shell. Caches materialized and bootstrapped snapshots across queries with a memory budget while staying isolated for removal.
  • wasm-datafusion-session. Dedicated DataFusion-backed browser session for UI/runtime builds. Owns DataFusion table registration, SQL scope checks, budgets, metrics, and Arrow IPC while keeping the legacy narrow session out of the production UI DataFusion path.
  • browser-sdk. Embedding surface. Worker request envelopes, Arrow IPC results, fallback propagation.
  • browser-engine-worker. Linked worker artifact used to measure WASM size, cold start, and memory footprint.
  • apps/axon-web. Production browser runtime: SQL editor, catalog connect workflow, and the WASM crate (axon-web-wasm) that vends the in-browser DataFusion session.
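The wasm-http-object-store entry above says URL secrets are redacted in errors. The sketch below shows one way to get that property, on the assumption that the secret part of a signed URL sits in its query string. The error type and helper are hypothetical and are not the crate's actual API.

```rust
use std::fmt;

/// Drop the query string, where signed URL credentials normally live.
/// (Hypothetical helper; the placeholder text is an arbitrary choice.)
pub fn redact_url(url: &str) -> String {
    match url.find('?') {
        Some(idx) => format!("{}?<redacted>", &url[..idx]),
        None => url.to_string(),
    }
}

/// Hypothetical error for a failed range read. The URL is redacted at
/// construction time, so no later formatting path can leak the signature.
#[derive(Debug)]
pub struct RangeReadError {
    url: String,
    status: u16,
}

impl RangeReadError {
    pub fn new(url: &str, status: u16) -> Self {
        Self { url: redact_url(url), status }
    }
}

impl fmt::Display for RangeReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "range read of {} failed with HTTP status {}", self.url, self.status)
    }
}

impl std::error::Error for RangeReadError {}
```

Redacting in the constructor, not in Display, means the Debug output and any wrapped error chain are covered as well.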

Trusted control plane

  • delta-control-plane. Snapshot resolution and table policy enforcement. Mints the descriptor seam that a (not yet shipped) signing service will fill in with per-file URLs.
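Snapshot resolution in Delta Lake comes down to replaying the transaction log. Start from the file set in the most recent checkpoint, then apply each later commit's add and remove actions in version order. The sketch below shows that idea under simplifying assumptions: paths are the only file metadata, and the types are hypothetical, not those of delta-control-plane or wasm-delta-snapshot.

```rust
use std::collections::BTreeSet;

/// Simplified Delta log action. Real actions carry sizes, stats,
/// partition values, and more. (Hypothetical type.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Add(String),
    Remove(String),
}

/// Rebuild the active file set for a snapshot.
///
/// `checkpoint_files` is the state captured by the latest checkpoint
/// (empty when replaying from version 0). `commits` holds the actions of
/// each later commit, ordered by ascending version.
pub fn replay(
    checkpoint_files: impl IntoIterator<Item = String>,
    commits: &[Vec<Action>],
) -> BTreeSet<String> {
    let mut active: BTreeSet<String> = checkpoint_files.into_iter().collect();
    for commit in commits {
        for action in commit {
            match action {
                Action::Add(path) => {
                    active.insert(path.clone());
                }
                Action::Remove(path) => {
                    active.remove(path);
                }
            }
        }
    }
    active
}
```

Pinning a snapshot then means fixing the last commit version that is replayed, so both tiers see the same file set.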

Scaffolds (not yet wired up)

Scope and status

What works in the repo today

  • Native SQL over Delta tables, with snapshot pinning, partition pruning, and execution-derived metrics.
  • A browser runtime that bootstraps a snapshot, plans a candidate file set, prunes partitions and integer footer stats, and executes a curated SQL subset (filter, project, group, the common aggregates, output-aligned ORDER BY / LIMIT).
  • Delta snapshot reconstruction is already repo-owned in crates/wasm-delta-snapshot. The shipped worker remains the narrow runtime plus streaming scan plus in-memory session shell, while the browser sandbox UI production runtime uses the dedicated wasm-datafusion-session path.
  • The browser-facing TypeScript SDK has a manifest-based bundle selector. The current baseline is single-threaded; SIMD and threaded bundle tiers are represented for future deployments but are not assumed.
  • wasm-http-object-store has a first OPFS persistent extent cache adapter for indexed validated extents, bounded per object identity, plus a fail-open cache contract so persistence errors become cache misses.
  • A query router that returns structured fallback decisions instead of guessing.
  • CI gates for wasm32 build, host tests, WASM smoke tests, a real browser-engine-worker.wasm size budget, and dependency guardrails that prevent cloud SDKs from leaking into browser bundles.
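Pruning on integer footer stats, mentioned in the browser runtime bullet above, can be illustrated with a small min/max check. A file is skipped only when its statistics prove that no row can match. When statistics are absent the sketch reports that a fallback is needed, in line with the "missing footer stats" case described earlier. The types and the predicate set are assumptions made for illustration.

```rust
/// Min and max of an integer column, as recorded in a Parquet footer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IntStats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical predicate forms over one integer column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IntPredicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
    /// Inclusive on both ends.
    Between(i64, i64),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PruneDecision {
    /// Statistics prove no row can match.
    Skip,
    /// Some row might match, so the file must be scanned.
    Scan,
    /// No statistics available; the caller should report a fallback.
    NeedsFallback,
}

pub fn prune(stats: Option<IntStats>, predicate: IntPredicate) -> PruneDecision {
    let stats = match stats {
        Some(stats) => stats,
        None => return PruneDecision::NeedsFallback,
    };
    let impossible = match predicate {
        IntPredicate::Eq(v) => v < stats.min || v > stats.max,
        IntPredicate::Lt(v) => stats.min >= v,
        IntPredicate::Gt(v) => stats.max <= v,
        IntPredicate::Between(lo, hi) => hi < stats.min || lo > stats.max,
    };
    if impossible {
        PruneDecision::Skip
    } else {
        PruneDecision::Scan
    }
}
```

The check is deliberately one-sided: Scan means "cannot rule out", not "contains a match", so pruning never changes query results.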

What is not in this repo yet

  • A services/query-api HTTP service. Signed URL issuance, proxy-mode request issuance, audit logging, request correlation, and CORS/origin validation are external blockers.
  • OPFS / IndexedDB session-level persistent caches. The OPFS adapter exists lower in the object-store stack, but wasm-query-session remains in-memory only.
  • A shipped worker artifact with broad browser DataFusion. DataFusion is available to the browser sandbox UI through wasm-datafusion-session; the default worker intentionally reports browser_datafusion = false.

The full launch checklist lives in docs/release-gates/browser-wasm-delta-gcs-launch-checklist.md. External dependencies are tracked in docs/release-gates/browser-wasm-delta-gcs-external-blockers.md.

Quick start

# Build the workspace.
cargo check --workspace

# Run the host side tests for the contract and the two runtimes.
cargo test -p query-contract
cargo test -p native-query-runtime
cargo test -p wasm-query-runtime

# Confirm the browser crates still compile to wasm.
cargo check \
  -p wasm-query-runtime -p wasm-http-object-store \
  -p wasm-parquet-engine -p wasm-delta-snapshot \
  -p browser-sdk -p browser-engine-worker \
  --target wasm32-unknown-unknown

For the WASM smoke suites and the worker artifact gates, see Development.

Going deeper

Development

WASM smoke suites

cargo install wasm-bindgen-cli --version 0.2.114 --locked

cargo test -p browser-sdk            --target wasm32-unknown-unknown --locked --test wasm_smoke
cargo test -p wasm-parquet-engine    --target wasm32-unknown-unknown --locked --test wasm_smoke
cargo test -p wasm-delta-snapshot    --target wasm32-unknown-unknown --locked --test wasm_smoke
cargo test -p wasm-query-runtime     --target wasm32-unknown-unknown --locked --test wasm_smoke
cargo test -p wasm-http-object-store --target wasm32-unknown-unknown --locked --test wasm_smoke
cargo test -p browser-engine-worker  --target wasm32-unknown-unknown --locked --test wasm_smoke -- --nocapture

Worker artifact and security gates

cargo test -p browser-engine-worker --locked
bash tests/perf/report_browser_worker_artifact.sh
bash tests/security/verify_browser_dependency_guardrails.sh

See tests/perf/README.md and tests/security/README.md for the size, startup, footprint, and dependency guardrail gates.

Optional GCS smokes

The native runtime ships env-gated smokes against real Google Cloud Storage Delta tables. They assume Application Default Credentials are already available and that the configured tables match a narrow fixture contract (partition column, expected snapshot versions, and so on). See crates/native-query-runtime for the full set. The most common starting points:

AXON_GCS_TEST_TABLE_URI=gs://your-bucket/your-table \
  cargo test -p native-query-runtime --locked \
  bootstrap_table_supports_env_gated_gcs_smoke -- --exact --nocapture

AXON_GCS_TEST_TABLE_URI=gs://your-bucket/your-table \
  cargo test -p native-query-runtime --locked \
  execute_query_supports_env_gated_gcs_smoke -- --exact --nocapture

CI runs the same commands behind google-github-actions/auth when AXON_GCP_CREDENTIALS_JSON is configured. Negative smokes (forbidden, not found, stale history, missing object) and the partitioned pruning and snapshot version smokes use additional AXON_GCS_TEST_* variables documented alongside their tests.
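For readers unfamiliar with the pattern, an env-gated test is one that passes trivially when its configuration is absent, so an ordinary cargo test run stays offline. The sketch below shows the general shape under stated assumptions. It is not the code in crates/native-query-runtime. The helper names and the gs:// prefix check are illustrative, and only the AXON_GCS_TEST_TABLE_URI variable name comes from the commands above.

```rust
use std::env;

/// Variable name taken from the commands above.
pub const TABLE_URI_VAR: &str = "AXON_GCS_TEST_TABLE_URI";

/// Resolve the table URI through an injectable lookup so the gating logic
/// can be unit tested without touching the process environment.
/// Blank values and non-gs:// values are treated as "not configured"
/// (an assumption of this sketch).
pub fn gated_table_uri(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    lookup(TABLE_URI_VAR)
        .map(|value| value.trim().to_string())
        .filter(|value| value.starts_with("gs://"))
}

/// Same check against the real environment.
pub fn table_uri_from_env() -> Option<String> {
    gated_table_uri(|key| env::var(key).ok())
}

#[test]
fn example_env_gated_smoke() {
    let uri = match table_uri_from_env() {
        Some(uri) => uri,
        None => {
            eprintln!("skipping: {} is not set to a gs:// URI", TABLE_URI_VAR);
            return;
        }
    };
    // A real smoke would bootstrap the table and run a query here.
    assert!(uri.starts_with("gs://"));
}
```

Running such a test with -- --nocapture, as in the commands above, makes the skip message visible.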

Repository layout

  • crates/. Rust workspace packages. See Repo tour.
  • tests/conformance/. Native SQL corpora that double as the oracle for the browser planner and executor.
  • tests/perf/. Performance budgets, the browser-engine-worker.wasm size gate, and benchmark scaffolding.
  • tests/security/. Security reporting guidance, browser dependency and bundle guardrails.
  • docs/. Program, ADR, epic, plan, and release gate documentation.
  • .github/workflows/ci.yml. CI configuration.

About

A WASM native query engine for Delta Lake
