
Arco

File-native lakehouse catalog and orchestration. A catalog and metastore for open table formats, with Delta Lake as the first-class managed format.


Arco is incubating and not yet production-ready. APIs may change, and early feedback is welcome.

What is Arco?

Arco stores catalog and orchestration metadata as Parquet files on object storage, with no always-on database and no proprietary catalog service. Query your metadata with SQL the same way you query your data.

At the catalog layer, Arco manages table identity, locations, schemas, lineage, and operational metadata for open lakehouse table formats. New Arco table registrations default to Delta Lake; Iceberg and plain Parquet are explicit catalog surfaces with compatibility and governance support growing over time.
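The "Delta Lake by default, Iceberg and Parquet only when asked for" rule can be pictured as a format enum whose default variant is Delta. This is a minimal sketch: `TableFormat`, `RegisterTable`, and `resolve_format` are hypothetical names used for illustration, not Arco's actual catalog API.

```rust
use std::fmt;

/// Hypothetical model of the open table formats a catalog entry can carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TableFormat {
    /// New registrations fall back to Delta Lake when no format is given.
    #[default]
    Delta,
    /// Must be requested explicitly.
    Iceberg,
    /// Plain Parquet files; must be requested explicitly.
    Parquet,
}

impl fmt::Display for TableFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            TableFormat::Delta => "delta",
            TableFormat::Iceberg => "iceberg",
            TableFormat::Parquet => "parquet",
        };
        f.write_str(s)
    }
}

/// Hypothetical registration request; `format: None` means "use the default".
pub struct RegisterTable<'a> {
    pub name: &'a str,
    pub location: &'a str,
    pub format: Option<TableFormat>,
}

/// Returns the format that would be recorded for a new table.
pub fn resolve_format(req: &RegisterTable<'_>) -> TableFormat {
    req.format.unwrap_or_default()
}
```

Keeping the default in one `Default` impl means a caller that omits the format and a caller that names Delta explicitly end up with the same catalog entry.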

Why Arco?

  • No catalog server to operate - metadata lives in object storage; engines read it directly via signed URLs.
  • Query metadata with SQL - catalog, lineage, and run history are exposed as system.* tables.
  • Real lineage - captured from actual runs, not guessed from SQL parsing.
  • Multi-tenant by design - isolation is enforced at storage layout, service boundaries, and test gates.
  • Open table formats - Delta Lake is the primary managed format; Iceberg and plain Parquet are modeled in catalog contracts.
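One way storage-layout isolation can work is to build every object key under a per-tenant prefix and reject any segment that could climb out of it. The sketch below assumes a hypothetical `tenants/<tenant>/<area>/...` layout; the prefix scheme and the `tenant_object_key` helper are illustrative only and are not taken from Arco's code.

```rust
/// Errors for a hypothetical tenant-scoped key builder.
#[derive(Debug, PartialEq, Eq)]
pub enum LayoutError {
    EmptySegment,
    InvalidSegment(String),
}

/// Accepts one path segment made of lowercase ASCII letters, digits, '-', '_' or '.'.
/// '/' is rejected, as are the special segments "." and "..", so a segment
/// can never point outside the directory it is appended to.
fn validate_segment(seg: &str) -> Result<&str, LayoutError> {
    if seg.is_empty() {
        return Err(LayoutError::EmptySegment);
    }
    if seg == "." || seg == ".." {
        return Err(LayoutError::InvalidSegment(seg.to_string()));
    }
    let ok = seg
        .bytes()
        .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || matches!(b, b'-' | b'_' | b'.'));
    if ok {
        Ok(seg)
    } else {
        Err(LayoutError::InvalidSegment(seg.to_string()))
    }
}

/// Builds `tenants/<tenant>/<area>/<part>/...` (hypothetical layout).
pub fn tenant_object_key(tenant: &str, area: &str, parts: &[&str]) -> Result<String, LayoutError> {
    let mut key = format!("tenants/{}/{}", validate_segment(tenant)?, validate_segment(area)?);
    for part in parts {
        key.push('/');
        key.push_str(validate_segment(part)?);
    }
    Ok(key)
}
```

Because every key is derived from a validated tenant segment, a storage policy or signed-URL scope on `tenants/<tenant>/` is enough to keep one tenant's reads away from another's objects.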

Get Started

Prerequisites

  • Rust 1.85+ (Edition 2024)
  • Protocol Buffers compiler (protoc)

Build and test

git clone https://github.com/daxis-io/arco.git
cd arco

cargo build --workspace
cargo test --workspace

Browse the docs

cargo install mdbook --version 0.4.52 --locked
cd docs/guide && mdbook build --open


How it fits together

arco-api        HTTP/gRPC entry point (read-only SQL via DataFusion)
arco-catalog    Table format catalog, lineage, Parquet metadata snapshots
arco-flow       Planning, scheduling, run state
arco-compactor  Tier-2 event consolidation
arco-proto      Cross-language protobuf contracts
arco-core       Shared primitives (tenant context, IDs, errors)
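The run state that arco-flow tracks can be thought of as a small state machine with an explicit table of legal transitions. The states and transitions below are assumptions chosen for illustration; they are not Arco's actual run model.

```rust
/// Hypothetical lifecycle of a run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunState {
    Pending,
    Running,
    Succeeded,
    Failed,
    Cancelled,
}

impl RunState {
    /// Terminal states have no outgoing transitions.
    pub fn is_terminal(self) -> bool {
        matches!(self, RunState::Succeeded | RunState::Failed | RunState::Cancelled)
    }

    /// The full transition table: anything not listed here is rejected.
    pub fn can_transition_to(self, next: RunState) -> bool {
        use RunState::*;
        matches!(
            (self, next),
            (Pending, Running)
                | (Pending, Cancelled)
                | (Running, Succeeded)
                | (Running, Failed)
                | (Running, Cancelled)
        )
    }
}

/// Applies a transition, returning the new state or a description of the illegal move.
pub fn apply(current: RunState, next: RunState) -> Result<RunState, String> {
    if current.can_transition_to(next) {
        Ok(next)
    } else {
        Err(format!("illegal transition {current:?} -> {next:?}"))
    }
}
```

Keeping the transition table in one `matches!` makes it easy to audit that terminal states never move again, which matters when run history is persisted as append-only metadata.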

Task execution runs in external workers via a canonical dispatch envelope. The browser query path uses DuckDB-WASM against signed URLs - no always-on infrastructure required.
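A dispatch envelope is useful to a worker only if redelivery is harmless. The sketch below assumes a hypothetical envelope carrying tenant, run, task, attempt number, and pre-signed input URLs, plus an idempotency key derived from those fields. The field names, the key format, and the `Deduper` type are illustrative assumptions, not Arco's canonical envelope.

```rust
use std::collections::HashSet;

/// Hypothetical shape of an envelope handed to an external worker.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DispatchEnvelope {
    pub tenant_id: String,
    pub run_id: String,
    pub task_key: String,
    pub attempt: u32,
    /// Pre-signed URLs the worker may read; it gets no storage credentials.
    pub input_urls: Vec<String>,
}

impl DispatchEnvelope {
    /// Stable key for one attempt of one task. A redelivered envelope yields
    /// the same key. Assumes the IDs themselves contain no '/'.
    pub fn idempotency_key(&self) -> String {
        format!("{}/{}/{}/{}", self.tenant_id, self.run_id, self.task_key, self.attempt)
    }
}

/// Worker-side deduplication kept in memory for the sketch.
#[derive(Default)]
pub struct Deduper {
    seen: HashSet<String>,
}

impl Deduper {
    /// Returns true only the first time a given attempt is seen.
    pub fn first_delivery(&mut self, env: &DispatchEnvelope) -> bool {
        self.seen.insert(env.idempotency_key())
    }
}
```

Including the attempt number in the key lets a deliberate retry through while a duplicate delivery of the same attempt is dropped.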

Proto compatibility

The pre-freeze hard cut is complete. The current arco.*.v1 packages, as captured in the frozen post-cut baseline, are the durable public proto surface. New v1 changes must be additive and must preserve both binary and ProtoJSON compatibility. Run cargo xtask proto-breaking-check before merging proto changes.
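The "additive only, binary and ProtoJSON compatible" rule comes down to a few checks per message: existing field numbers must not disappear, change type, or change name. Number and type protect the binary wire format, while the name matters for ProtoJSON, whose JSON keys derive from field names. The following is a simplified, hypothetical model of that comparison; it is not how `cargo xtask proto-breaking-check` is implemented.

```rust
use std::collections::BTreeMap;

/// Minimal description of one message field (hypothetical, not a protobuf descriptor).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    pub ty: String,
}

/// Field number -> field, for a single message.
pub type Message = BTreeMap<u32, Field>;

/// Lists changes that violate an additive-only policy. Fields that exist only
/// in `new` are additions and are therefore not reported.
pub fn breaking_changes(old: &Message, new: &Message) -> Vec<String> {
    let mut problems = Vec::new();
    for (num, old_field) in old {
        match new.get(num) {
            // Removal breaks clients that still rely on the field.
            None => problems.push(format!("field {num} ({}) removed", old_field.name)),
            // Treated as breaking: readers may decode the bytes differently.
            Some(f) if f.ty != old_field.ty => {
                problems.push(format!("field {num} type changed {} -> {}", old_field.ty, f.ty))
            }
            // Wire-compatible, but ProtoJSON keys change with the name.
            Some(f) if f.name != old_field.name => {
                problems.push(format!("field {num} renamed {} -> {}", old_field.name, f.name))
            }
            Some(_) => {}
        }
    }
    problems
}
```

A rename is the case that a binary-only check would miss, which is why the ProtoJSON requirement is stated separately.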


License

Apache License 2.0 - see LICENSE and NOTICE.

About

Serverless lakehouse infrastructure - file-native catalog and execution-first orchestration
