Download satellite imagery from Google Earth Engine directly to your local disk. No Google Cloud Storage. No Google Drive. No export tasks. Resumable, crash-safe, fully YAML-driven.
geedl is a high-throughput, local-first command-line tool for downloading
Sentinel-1, Sentinel-2, Landsat 7, Landsat 8, and Landsat 9 imagery from
Google Earth Engine. Point it at a shapefile and a date range — it tiles your
ROI, composites scenes, computes spectral indices (NDVI, EVI, NDWI, NBR, RVI…),
and writes Cloud-Optimized GeoTIFFs straight to your machine.
If you've ever tried to download Earth Engine imagery for a large area, you know the pain: export tasks that take hours, files trapped in Google Drive, Cloud Storage buckets you have to pay for, and no way to resume when something fails. geedl skips all of that.
| | Traditional EE export | geedl |
|---|---|---|
| Destination | Google Drive / GCS bucket | Local disk |
| Throughput | Single export task | Parallel async tiles (default 16) |
| Resume after crash | Re-export from scratch | SQLite checkpoint — pick up exactly where you stopped |
| Tile size tuning | Manual | Auto-calculated from pixel budget |
| Output format | GeoTIFF | Cloud-Optimized GeoTIFF + STAC sidecar + GeoParquet catalog |
| Configuration | Python script per job | One YAML file |
- Direct download from Earth Engine via `ee.data.computePixels()`: no GCS, no Drive.
- Shapefile ROI support: auto-projects to UTM, simplifies for upload, tiles intelligently.
- Smart tiling: classifies tiles as `inside`, `partial`, or `outside`; skips empty space; orders by Hilbert curve for cache-warm EE requests.
- Time-windowed compositing: fixed-day, calendar-month, calendar-year, full-range, or single-scene modes. Scene mode suggests nearest available dates when the requested day has no imagery.
- Per-window tile merging: tiles stream to a temp staging dir, then merge into one COG per ROI/window before the job finalises.
- Live progress bars: overall + per-window tqdm bars track every tile through download, validation, and write.
- Spectral indices: built-in NDVI, EVI, NDWI, NDMI, NBR, NDSI, SAVI, BSI, RVI, VV/VH ratio. Add your own with one decorated function.
- Crash-safe & resumable: every tile is checkpointed; atomic writes (`.tmp` → `os.rename`) guarantee no corrupt files on disk.
- Cloud masking built in: Sentinel-2 SCL, Landsat C2 `QA_PIXEL`. Cloud + shadow + snow toggles per job.
- Landsat 7 SLC-off handled by multi-temporal compositing: no focal blur, no broken indices.
- COG output: natively readable by QGIS, GDAL, stackstac, and STAC browsers.
- AI-agent friendly: the YAML config is the single source of truth. Swap datasets, indices, output shapes, or pipeline behavior without touching Python.
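
To make the Hilbert-curve ordering in the smart-tiling bullet concrete, here is a minimal sketch. It is not geedl's actual implementation: the names `hilbert_index` and `order_tiles`, and the representation of a tile as a `(col, row)` grid position, are assumptions made for illustration. The index computation is the standard iterative xy-to-distance mapping.

```python
def hilbert_index(order, x, y):
    """Distance of cell (x, y) along a Hilbert curve covering a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def order_tiles(tiles):
    """Sort hypothetical (col, row) tile positions along the Hilbert curve."""
    if not tiles:
        return []
    largest = max(max(col, row) for col, row in tiles)
    order = max(1, largest.bit_length())  # smallest grid that contains every tile
    return sorted(tiles, key=lambda t: hilbert_index(order, t[0], t[1]))
```

Consecutive tiles in the sorted list are spatial neighbours, which is the property that keeps successive requests close together on the ground.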
geedl uses a conda environment for Python + system-level geospatial
libraries (GDAL, PROJ, GEOS) and uv for fast Python dependency resolution
inside that environment.
# 1. Create and activate the conda environment
conda create -n geedl python=3.12 -y
conda activate geedl
# 2. Install uv inside the env
conda install -c conda-forge uv -y
# 3. Install geedl with uv (editable)
uv pip install -e .

Or, with development tooling (pytest, ruff, mypy):

uv pip install -e ".[dev]"

You'll also need to authenticate with Earth Engine once:

earthengine authenticate

Every subsequent shell session needs `conda activate geedl` before running the `geedl` CLI.
Create job.yaml:
job_name: tuscany_ndvi_q1_2023
roi:
path: data/tuscany.shp
dataset:
name: sentinel-2
bands:
select: [B2, B3, B4, B8]
indices:
- {name: NDVI}
- {name: EVI}
date:
start: "2023-01-01"
end: "2023-03-31"
composite:
strategy: median
window:
type: fixed_days
size: 30
step: 30
label_format: "%Y-%m-%d"
output:
dir: ./output
format: COG
dtype: float32
asset:
project: my-ee-project
base_path: users/me/geedl_assets

geedl validate -c job.yaml # check config (no EE calls)
geedl plan -c job.yaml # preview windows + tile count
geedl run -c job.yaml # download to ./output

To resume an interrupted job, just re-run the same command. Completed tiles are skipped automatically.

geedl run -c job.yaml # resume
geedl run -c job.yaml --retry-failed # also retry tiles that failed
geedl status -c job.yaml # check progress

geedl was designed so the entire job (sensor, indices, time windows, output
shape, concurrency, auth) lives in one declarative YAML file. That makes
it the ideal target for an LLM coding agent: there's no Python to write, no
SDK to learn, no notebook to debug. You describe what you want, the agent
emits a config, you run it.
Drop this README and CLAUDE.md into your agent's context (Claude Code,
Cursor, Codex, Aider, ChatGPT — any of them), then ask in plain English.
You:
I need a monthly NDVI and NDWI time series over my farm in Tuscany for all of 2023. Shapefile is at `data/tuscany.shp`. I want Sentinel-2 with clouds and shadows masked, output as COGs, and just the indices; drop the source bands. Run 24 tiles in parallel.
Agent (one shot, produces job.yaml):
job_name: tuscany_ndvi_ndwi_2023
roi:
path: data/tuscany.shp
dataset:
name: sentinel-2
bands:
select: [] # drop source bands, keep only indices
indices:
- {name: NDVI}
- {name: NDWI}
cloud_mask:
enabled: true
mask_shadow: true
mask_snow: false
date:
start: "2023-01-01"
end: "2023-12-31"
composite:
strategy: median
window:
type: calendar_month
label_format: "%Y-%m"
output:
dir: ./output
format: COG
dtype: float32
compression: DEFLATE
structure:
separate_indices: true # one GeoTIFF per index
pipeline:
concurrency: 24
asset:
project: my-ee-project
base_path: projects/my-ee-project/assets
auth:
method: browser

You:

geedl plan -c job.yaml && geedl run -c job.yaml

That's the whole loop. Need to change sensor? "Switch to Landsat 8 with the
same indices." Need a different window? "Make it 16-day composites anchored
on the start of the window." Need urban detection from SAR instead?
"Give me the Sentinel-1 SAR_URBAN false-color over the same ROI for the
first week of May." The agent edits the YAML, and you re-run geedl.
See examples/ for ten concrete configs (NDVI, NDWI, RGB,
S1 RTC, oil-spill OSFC, S1 SAR urban false-color, scene-mode, …) that
double as few-shot prompts for any LLM.
Why this works:
`CLAUDE.md` documents every config field, every module boundary, and every non-negotiable constraint (atomic writes, plugin-only indices, S1-must-be-mosaic, …). An agent reading it has the full schema and the full set of rules, so it doesn't hallucinate fields or pick physically wrong composite strategies.
Authentication supports two methods, selected in YAML:
# Browser flow (default) — uses your `earthengine authenticate` credentials.
auth:
method: browser
# Service account — for headless / CI use.
auth:
method: service_account
service_account_email: bot@my-proj.iam.gserviceaccount.com
key_file: /etc/secrets/ee-key.json

Sentinel-2: monthly NDVI/EVI composites
dataset:
name: sentinel-2
bands: {select: [B2, B3, B4, B8, B11]}
indices: [{name: NDVI}, {name: EVI}, {name: NDMI}]
cloud_mask: {enabled: true, mask_shadow: true, mask_snow: false}
composite:
strategy: median
window: {type: calendar_month}

Sentinel-1: VV/VH backscatter mosaics
dataset:
name: sentinel-1
bands: {select: [VV, VH]}
indices: [{name: RVI}]
composite:
strategy: median # ignored — S1 always forces mosaic (see CLAUDE.md §7)
window: {type: fixed_days, size: 12, step: 12}

Landsat 8/9: quarterly composites
dataset:
name: landsat-8
bands: {select: [SR_B2, SR_B3, SR_B4, SR_B5, SR_B6, SR_B7]}
indices: [{name: NDVI}, {name: NBR}, {name: SAVI}]
composite:
strategy: median
window: {type: fixed_days, size: 90, step: 90, anchor: center}

Single-date scene mode: grab the nearest available Sentinel-1 acquisition
dataset:
name: sentinel-1
bands: {select: [VV, VH]}
date:
start: "2024-06-15"
end: "2024-06-15"
composite:
strategy: none
window: {type: scene} # one output per intersecting scene; suggests nearby dates if empty

Indices-only output: drop the source bands, keep only the computed index
dataset:
name: sentinel-2
bands:
select: [] # [] = no source bands; null = all registry bands; list = those bands
indices:
- {name: NDVI}
output:
structure:
separate_indices: true # each index gets its own GeoTIFF

`bands.select` is a tri-state:
- `null` (omitted): keep all bands defined in `registry.yaml` for the dataset.
- `[]`: keep no source bands. The job is rejected at validation time unless at least one index is requested.
- `[B4, B8, ...]`: keep exactly those bands.

In all three cases, indices are computed from the native source bands regardless of `select` (the expressions reference NIR/RED/etc. directly), so `select: []` still produces valid NDVI/EVI/etc. output.
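
The tri-state rule can be expressed as a small function. This is a sketch of the rule as described here, not geedl's validation code: the function name `resolve_output_bands` and its arguments are hypothetical.

```python
def resolve_output_bands(select, registry_bands, indices):
    """Return the source bands to keep in the output for a given `bands.select`.

    select         -- None (omitted), [] or a list of band names
    registry_bands -- all bands the registry defines for the dataset
    indices        -- names of the requested indices
    """
    if select is None:
        # Omitted / null: keep every band the registry defines.
        return list(registry_bands)
    if len(select) == 0:
        # Empty list: indices-only output, which needs at least one index.
        if not indices:
            raise ValueError("bands.select is [] but no indices were requested")
        return []
    # Explicit list: keep exactly those bands.
    return list(select)
```

Note that the return value only controls which source bands are written; the index expressions still read the native bands they need.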
Landsat 7 — SLC-off recovery via long compositing window
dataset:
name: landsat-7
bands: {select: [SR_B3, SR_B4, SR_B5]}
indices: [{name: NDVI}]
slc_off: {strategy: multi_temporal, min_scenes_warning: 5}
composite:
strategy: median
window: {type: calendar_year} # wide enough to fill SLC gaps

pipeline:
concurrency: 16 # parallel async tiles
max_retries: 6
retry_base_delay: 1.0
timeout_per_tile: 120
tiling:
max_tile_bytes: null # null = auto, derived from EE's 50 MB request budget
overlap_px: 2 # request buffer to avoid seam artifacts
skip_coverage_threshold: 0.05 # tiles <5% inside ROI are skipped

Run user code at three lifecycle points (format: `module.path:function_name`):
hooks:
pre_download: my_pkg.hooks:before_tile
post_tile: my_pkg.hooks:after_tile
post_job: my_pkg.hooks:on_finish

pytest # full unit + integration suite (~2s, no EE calls)
pytest --live # also runs the opt-in EE smoke tests
# requires: GEEDL_TEST_EE_PROJECT=ee-tat3 \
# GEEDL_TEST_EE_KEY=/credentials.json \
# pytest tests/test_live_smoke.py --live
GEEDL_TEST_EE_PROJECT=ee-tat3 GEEDL_TEST_EE_KEY=ee-tat3-835f3bd207eb.json pytest --live
pytest tests/test_indices_matrix.py -v # one module

| Slug | Collection | Native resolution |
|---|---|---|
| sentinel-2 | COPERNICUS/S2_SR_HARMONIZED | 10 m |
| sentinel-1 | COPERNICUS/S1_GRD (IW, DESC) | 10 m |
| landsat-7 | LANDSAT/LE07/C02/T1_L2 | 30 m |
| landsat-8 | LANDSAT/LC08/C02/T1_L2 | 30 m |
| landsat-9 | LANDSAT/LC09/C02/T1_L2 | 30 m |
Add new datasets by editing geedl/datasets/registry.yaml — no Python changes required.
geedl datasets # list available datasets
geedl indices --dataset sentinel-2 # list compatible indices

Out of the box: NDVI, NDWI, NDMI, NBR, NDSI, EVI, SAVI, BSI (optical) and RVI, VV/VH ratio (SAR).
Adding a new index takes one function:
# geedl/indices/optical.py
from . import index


@index("CIRE", datasets=["sentinel-2"])
def cire(img, ds):
    # Red-edge chlorophyll index: NIR / red-edge - 1
    return img.expression("NIR/RED_EDGE - 1", {
        "NIR": img.select("B8"),
        "RED_EDGE": img.select("B5"),
    }).rename("CIRE")

Reference it from any YAML config; no other code changes needed.
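
For readers curious how a decorator-based plugin registry of this kind can work, here is a minimal self-contained sketch. It is an illustration only: the internals of geedl's `index` decorator are not shown in this README, so `INDEX_REGISTRY`, `indices_for`, and the toy index below are all assumptions, and the toy index operates on a plain dict rather than an Earth Engine image.

```python
# Hypothetical registry: index name -> function and compatible datasets.
INDEX_REGISTRY = {}


def index(name, datasets):
    """Register the decorated function as an index for the given datasets."""
    def decorator(fn):
        if name in INDEX_REGISTRY:
            raise ValueError(f"index {name!r} is already registered")
        INDEX_REGISTRY[name] = {"fn": fn, "datasets": frozenset(datasets)}
        return fn  # leave the function itself unchanged
    return decorator


def indices_for(dataset):
    """List the names of indices registered as compatible with a dataset."""
    return sorted(
        name for name, entry in INDEX_REGISTRY.items()
        if dataset in entry["datasets"]
    )


@index("TOY_RATIO", datasets=["sentinel-2"])
def toy_ratio(img, ds):
    # Stand-in for an EE expression: img is a dict of band name -> value here.
    return img["B8"] / img["B5"] - 1
```

Because registration happens at import time, a config only needs to name the index; a lookup such as `indices_for("sentinel-2")` is enough to discover what is available.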
output/
sentinel-2/
2023-01-01/
tile_A00_2023-01-01.tif # Cloud-Optimized GeoTIFF
tile_A00_2023-01-01.json # STAC Item sidecar
tile_B01_2023-01-01.tif
...
2023-02-01/
...
catalog.parquet # GeoParquet spatial index of all tiles
job.yaml # frozen copy of the config used
checkpoint.db # SQLite resume state
Read the catalog back with any GeoParquet-aware tool:
import geopandas as gpd
gdf = gpd.read_parquet("output/catalog.parquet")
gdf[gdf.datetime.str.startswith("2023-01")].plot()

- ROI prep: the shapefile is loaded, auto-projected to UTM, simplified, and uploaded once as an EE asset (deterministic hash-based ID, so the same ROI is reused across runs).
- Tiling: the bounding box is tiled into fixed-size squares whose dimensions are derived from EE's per-request pixel budget. Tiles outside the ROI are skipped; tiles on the edge are tagged `partial` and get both a server-side `img.clip()` and a local rasterio mask.
- Windowing: the date range is split into compositing windows (fixed days, calendar months, etc.).
- Async download: each (tile × window) is fetched concurrently via `ee.data.computePixels()` in NPY format. Failures are retried with exponential backoff + full jitter.
- Validation: each array is checked for shape, all-nodata, and plausible value range before being written.
- Atomic write: data is written to `{path}.tmp.tif`, internally tiled at 256×256, overviews are built, then `os.rename()` swaps it into place.
- Checkpoint: only after the rename succeeds is the tile marked `done` in the SQLite checkpoint. Crash recovery resets `in_flight` tiles to `pending` and deletes any stragglers on the next launch.
- Merge & catalog: once every tile in a window is `done`, partial tiles are merged into one COG per ROI/window. The STAC sidecars and `catalog.parquet` are written from the merged outputs at the end of the job.
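
The ordering of the atomic-write and checkpoint steps is what makes a crash harmless, so it is worth sketching. The code below is a simplified illustration under stated assumptions, not geedl's implementation: the `tiles(tile_id, status)` table, the function names, and the treatment of leftover `.tmp.tif` files as the "stragglers" are all hypothetical, and it uses `os.replace` (the overwrite-safe variant of `os.rename`) on raw bytes rather than a real GeoTIFF writer.

```python
import os
import sqlite3


def write_tile(db, tile_id, final_path, payload):
    """Write payload to a temp file, swap it into place, then checkpoint."""
    tmp_path = final_path + ".tmp.tif"
    with open(tmp_path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the swap
    os.replace(tmp_path, final_path)  # atomic on the same filesystem
    # Only now is the tile recorded as done (assumed schema).
    with db:
        db.execute("UPDATE tiles SET status = 'done' WHERE tile_id = ?", (tile_id,))


def recover(db, output_dir):
    """Reset interrupted tiles and remove leftover temp files after a crash."""
    with db:
        db.execute("UPDATE tiles SET status = 'pending' WHERE status = 'in_flight'")
    for name in os.listdir(output_dir):
        if name.endswith(".tmp.tif"):
            os.remove(os.path.join(output_dir, name))
```

If the process dies before the swap, only a temp file exists and the tile is still `in_flight`; if it dies after the swap but before the update, the tile is simply downloaded again. In neither case does a partial file sit at the final path.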
See ARCH.md for the full design rationale and decision log.
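
The "exponential backoff + full jitter" retry policy named in the async download step can be sketched as follows. This is a generic, synchronous illustration rather than geedl's code: the function names and the `max_delay` cap are assumptions, and treating `max_retries: 6` as six retries after the first attempt is one possible reading of that setting.

```python
import random
import time


def backoff_delay(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(fn, max_retries=6, base_delay=1.0, sleep=time.sleep, rng=random):
    """Call fn(), retrying failures up to max_retries times with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(backoff_delay(attempt, base_delay, rng=rng))
```

Drawing the whole delay at random, rather than adding a small random offset to a fixed delay, spreads retries from many parallel tiles over time so they do not hit the service in synchronized bursts.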
geedl run -c job.yaml [--fresh] [--retry-failed] # run or resume
geedl validate -c job.yaml # check config
geedl plan -c job.yaml # dry-run preview
geedl status -c job.yaml # tile counts
geedl datasets # list datasets
geedl indices --dataset sentinel-2 # list indices
geedl cleanup -c job.yaml # delete EE asset

geedl v0.1 is pre-1.0 software. Core pipeline works end-to-end on the
listed datasets, including scene-mode for single-date jobs with nearest-date
fallback suggestions. Known gaps: single-process only, no GUI.
See ARCH.md §17 for the full caveat list.
Issues and PRs welcome. The codebase has a strict layered dependency graph
(utils → datasets → indices → io/roi → pipeline → cli) and a plugin-only
index engine — see CLAUDE.md for module contracts and
testing conventions before opening a PR.
The project leans hard on minimalism and anti-overengineering: prefer
editing existing modules over adding new ones, no speculative abstractions, no
backwards-compat shims, no defensive scaffolding inside trust boundaries, and
comments only where the why is non-obvious. See the "Engineering ethos"
section of CLAUDE.md for the full rules.
MIT