itslavrov/lakekit

Automated Full-Stack Data Lakehouse

One-command data stack: local, cloud, or Kubernetes.

Iceberg tables stored on MinIO, a Nessie catalog with git-like versioning, the Trino SQL engine, Airflow orchestration, dbt transformations across landing → staging → curated layers, and automated deployment to VK Cloud with Terraform and Packer.

Quick start

Local deployment

git clone https://github.com/itslavrov/lakekit.git && cd lakekit
cd stack
cp .env.example .env            # edit credentials or use auto-generation:
# ../scripts/03-generate-env.sh  # generates random secure passwords

docker compose -f docker-compose-airflow.yaml build
./manage.sh start
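
The idea behind credential auto-generation can be sketched in a few lines of Bash. This is not the content of scripts/03-generate-env.sh: the variable names (MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, POSTGRES_PASSWORD, AIRFLOW_ADMIN_PASSWORD) are hypothetical and would need to match the keys actually listed in .env.example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a random 32-character hex string (16 bytes read from /dev/urandom).
rand_hex() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write a fresh .env file with random secrets.
# NOTE: the variable names below are hypothetical placeholders.
generate_env() {
  local out="$1"
  # Never clobber an existing file that may hold real credentials.
  if [[ -e "$out" ]]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  # Create the file readable by the owner only.
  (
    umask 077
    {
      echo "MINIO_ROOT_USER=minioadmin"
      echo "MINIO_ROOT_PASSWORD=$(rand_hex)"
      echo "POSTGRES_PASSWORD=$(rand_hex)"
      echo "AIRFLOW_ADMIN_PASSWORD=$(rand_hex)"
    } > "$out"
  )
}
```

Usage, from the repository root: generate_env stack/.env. Refusing to overwrite is a deliberate choice, so rerunning the command cannot silently rotate passwords that Postgres or MinIO have already persisted.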

VK Cloud deployment

# 1. Build the VM image (run from the repository root)
cd image
source your-openrc.sh     # OpenStack credentials file from your VK Cloud project
packer init . && packer build .

# 2. Deploy
cd ../terraform
cp terraform.tfvars.example terraform.tfvars   # fill in image_id, network, etc.
terraform init && terraform apply

See Terraform deployment for details.
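
Both Packer and Terraform read their cloud credentials from the environment set up by the openrc file, so it can help to fail fast when that step was skipped. A minimal sketch, assuming the standard OpenStack variable names; which of them the Packer template and Terraform configuration in this repository actually require is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return non-zero and list every named variable that is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Standard OpenStack credential variables normally exported by an openrc file.
# ASSUMPTION: this exact set is what the image build and Terraform need.
OPENSTACK_VARS=(OS_AUTH_URL OS_PROJECT_ID OS_USERNAME OS_PASSWORD)
```

Usage: require_env "${OPENSTACK_VARS[@]}" && packer build . (and likewise before terraform apply). Checking every variable before returning reports all missing names at once instead of one per run.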

Kubernetes deployment

helm install lakehouse k8s/lakekit/ \
  --namespace lakehouse --create-namespace

See Kubernetes deployment for details.
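
helm install fails if the release already exists. For repeated deployments, one option is helm upgrade --install, which installs on the first run and upgrades afterwards. A sketch that keeps the release name, chart path and namespace from the command above; the --wait/--timeout 10m choice and the DRY_RUN switch are illustrative assumptions, not part of the chart's documentation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Install or upgrade the lakehouse release and wait for its resources to be ready.
# Extra arguments (for example "-f my-values.yaml") are passed through to helm.
# Set DRY_RUN=1 to print the command instead of running it.
deploy_lakehouse() {
  local cmd=(helm upgrade --install lakehouse k8s/lakekit/
    --namespace lakehouse --create-namespace
    --wait --timeout 10m "$@")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: deploy_lakehouse -f my-values.yaml, where my-values.yaml is a hypothetical override file. Building the command as an array keeps arguments with spaces intact when it is executed.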

Endpoints

Service         URL                      Default user
MinIO Console   http://localhost:9001    minioadmin
Airflow         http://localhost:8081    airflow
Trino           http://localhost:8080    trino
Nessie          http://localhost:19120   -
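
Containers can take a while to become reachable after ./manage.sh start. The following polling helper is a sketch that retries a URL with curl until it answers with a 2xx or 3xx status; the attempt count and delay defaults are arbitrary, and manage.sh may already perform a similar check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Poll a URL until curl succeeds (HTTP 2xx/3xx; -f fails on 4xx/5xx),
# giving up after the given number of attempts.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Usage: wait_for_http http://localhost:8080/v1/info for Trino and wait_for_http http://localhost:19120/api/v2/config for Nessie. Trino's /v1/info can respond before the server has finished starting (it reports "starting": true), so a successful poll means "reachable" rather than "ready for queries".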

Repository structure

├── stack/              Docker Compose stack, Trino config, Airflow DAGs, dbt project
├── k8s/                Helm chart for Kubernetes deployment
├── scripts/            Automation for install, env generation, lifecycle management
├── image/              Packer + Ansible for building VM images (VK Cloud)
└── terraform/          Infrastructure as Code for cloud deployment

Stack

Component    Version               Role
MinIO        RELEASE.2024-11-07    S3-compatible object storage
Nessie       0.76.6                Git-like Iceberg catalog
Trino        476                   Distributed SQL query engine
Airflow      3.0.6                 Workflow orchestration (CeleryExecutor)
dbt          1.10.10               SQL transformations (Trino adapter)
PostgreSQL   13                    Airflow metadata database
Redis        7.2                   Celery message broker
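
Trino reaches the Iceberg tables through a catalog properties file, and the catalog name is taken from the file name. The function below writes a minimal sketch of such a file, assuming Trino's Iceberg connector with its Nessie catalog type and the native S3 file system pointed at MinIO. The hostnames nessie and minio, the bucket warehouse, the catalog name iceberg, and the MINIO_ROOT_* variable names are assumptions; the real Trino configuration lives under stack/ and may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write <dir>/iceberg.properties, exposing a Trino catalog named "iceberg".
# ASSUMPTIONS: service hostnames "nessie"/"minio", bucket "warehouse",
# and credentials taken from MINIO_ROOT_USER / MINIO_ROOT_PASSWORD.
write_iceberg_catalog() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/iceberg.properties" <<EOF
connector.name=iceberg
iceberg.catalog.type=nessie
iceberg.nessie-catalog.uri=http://nessie:19120/api/v2
iceberg.nessie-catalog.ref=main
iceberg.nessie-catalog.default-warehouse-dir=s3://warehouse
fs.native-s3.enabled=true
s3.endpoint=http://minio:9000
s3.path-style-access=true
s3.region=us-east-1
s3.aws-access-key=${MINIO_ROOT_USER:-minioadmin}
s3.aws-secret-key=${MINIO_ROOT_PASSWORD:?MINIO_ROOT_PASSWORD must be set}
EOF
}
```

Path-style access is enabled because MinIO is usually addressed by a plain hostname rather than per-bucket subdomains. Once Trino loads the file, SHOW SCHEMAS FROM iceberg; should list the namespaces on the Nessie main branch.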

Documentation

License

MIT