Minilake is a lightweight, Python-based data lake. It aims to provide simple components for data storage, ingestion, and querying, with a focus on Delta Lake integration and S3 compatibility.
Note: This project is in early development. The architecture, APIs, and features are subject to change as the project evolves.
The project currently provides basic building blocks for:
- Storage with S3/MinIO and Delta Lake support
- Data ingestion for CSV and Parquet files
- Data querying via DuckDB 🦆

Requirements:
- Python 3.12 or higher
- Docker (optional, for containerized deployment with MinIO)

Installation:
- Clone the repository:
    git clone https://github.com/anquev/minilake.git
    cd minilake
- Create and activate a virtual environment:
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies:
    pip install -e ".[dev]"

Create a .env file in the project root with the following variables:
# MinIO Configuration
MINIO_ENDPOINT=localhost:9000
MINIO_ROOT_USER=your_access_key
MINIO_ROOT_PASSWORD=your_secret_key
MINIO_DEFAULT_BUCKETS=your_bucket

The following features are planned for future development:
- Unified client interface (probably built on the DuckDB UI)
- Additional ingestion formats (Excel, JSON)
- Enhanced FastAPI endpoints for data retrieval
- Enhanced query capabilities
- Iceberg table support ...
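
As an illustration of the .env configuration described earlier, the sketch below reads the four MINIO_* variables with the standard library only and maps them to a plain settings dictionary. It does not show Minilake's own configuration code: the function names `load_env_file` and `minio_settings`, the shape of the returned dictionary, and the assumption that `MINIO_DEFAULT_BUCKETS` may hold a comma-separated list are all hypothetical.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def minio_settings(env):
    """Map the MINIO_* variables to a settings dict (hypothetical shape)."""
    required = [
        "MINIO_ENDPOINT",
        "MINIO_ROOT_USER",
        "MINIO_ROOT_PASSWORD",
        "MINIO_DEFAULT_BUCKETS",
    ]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing .env variables: {', '.join(missing)}")
    return {
        "endpoint": env["MINIO_ENDPOINT"],
        "access_key": env["MINIO_ROOT_USER"],
        "secret_key": env["MINIO_ROOT_PASSWORD"],
        # Assumption: several buckets may be given as a comma-separated list.
        "buckets": [
            name.strip()
            for name in env["MINIO_DEFAULT_BUCKETS"].split(",")
            if name.strip()
        ],
    }


# Demonstration with the placeholder values from the README, written to a
# temporary directory so no real .env file is touched.
SAMPLE_ENV = """# MinIO Configuration
MINIO_ENDPOINT=localhost:9000
MINIO_ROOT_USER=your_access_key
MINIO_ROOT_PASSWORD=your_secret_key
MINIO_DEFAULT_BUCKETS=your_bucket
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(SAMPLE_ENV, encoding="utf-8")
    settings = minio_settings(load_env_file(env_path))

print(settings["endpoint"])
```

Failing early with a list of missing variable names is a deliberate choice here: a misconfigured .env file is then reported before any connection to MinIO is attempted.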