AETHER is a lightweight multimodal alignment framework that aligns AlphaEarth (AE) embeddings with POI-derived urban semantics through contrastive learning. The aligned representations support both urban downstream tasks and natural language–conditioned spatial retrieval.
Paper:
Note: The current preprint is an early version of this work; the complete paper will be released in a forthcoming version.
AlphaEarth embeddings were introduced by Google DeepMind in:
https://deepmind.google/blog/alphaearth-foundations-helps-map-our-planet-in-unprecedented-detail/
Clone the repository:
git clone <repo-url>
cd AETHER
Create a Python environment:
python3 -m venv .venv
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Download AlphaEarth embeddings and place them in:
data/AE/AlphaEarth_embedding.tif
The raster should contain 64-dimensional AlphaEarth embeddings.
Prepare a POI dataset containing:
| Field | Description |
|---|---|
| id | POI id |
| description | textual description used for embedding |
| geometry | point geometry |
Example file:
data/poi/poi_example.geojson
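As a rough illustration of the fields listed above, the following sketch builds a small GeoJSON FeatureCollection using only the standard library. The POI records are invented, and storing `id` and `description` under `properties` with WGS84 longitude/latitude coordinates is an assumption about the expected layout, not something confirmed by the repository.

```python
import json

# Hypothetical POI records; ids, descriptions, and coordinates are made up.
pois = [
    {"id": "poi_001", "description": "Coffee shop with outdoor seating",
     "lon": -0.1276, "lat": 51.5072},
    {"id": "poi_002", "description": "Public library and community centre",
     "lon": -0.1180, "lat": 51.5090},
]


def to_feature(poi):
    """Convert one POI record into a GeoJSON Feature with a Point geometry."""
    return {
        "type": "Feature",
        # Assumption: the loader reads these fields from `properties`.
        "properties": {"id": poi["id"], "description": poi["description"]},
        # GeoJSON stores coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [poi["lon"], poi["lat"]]},
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [to_feature(p) for p in pois],
}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```

Writing `geojson_text` to a `.geojson` file would give a file with the three fields from the table; the actual example file in the repository may carry additional properties.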
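Two text encoding options are supported: OpenAI embedding models (via the API) and local Sentence-Transformers models.
For OpenAI embeddings, set the API key:
export OPENAI_API_KEY=your_key_here
Example models:
text-embedding-3-large
text-embedding-3-small
For Sentence-Transformers, an example model is:
sentence-transformers/all-MiniLM-L6-v2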
Training is controlled by a configuration file.
Run:
python train.py --cfg config.yaml
The training pipeline performs the following steps:
- Load POI data with textual descriptions and spatial coordinates
- Extract AlphaEarth embeddings around POI locations using spatial window pooling
- Encode POI text using a pretrained language model
- Project AE and POI embeddings into a shared latent space
- Train a multimodal contrastive alignment model
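The projection step in the list above can be pictured with a minimal NumPy sketch. It assumes one linear head per modality followed by L2 normalization. The actual projection architecture is not described here, so the head design, the shared dimension of 128, and the random inputs are all illustrative assumptions. The 64-dimensional AE input comes from this README, and 384 is the output size of `all-MiniLM-L6-v2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# 64 is stated in this README; 384 matches all-MiniLM-L6-v2; 128 is assumed.
AE_DIM, TEXT_DIM, SHARED_DIM = 64, 384, 128


def make_linear(in_dim, out_dim):
    """Randomly initialised weight and zero bias for one linear head."""
    weight = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return weight, bias


def project(x, params):
    """Apply a linear head, then L2-normalise each row."""
    weight, bias = params
    z = x @ weight + bias
    return z / np.linalg.norm(z, axis=1, keepdims=True)


ae_head = make_linear(AE_DIM, SHARED_DIM)     # hypothetical AE projection head
poi_head = make_linear(TEXT_DIM, SHARED_DIM)  # hypothetical POI projection head

ae_batch = rng.normal(size=(4, AE_DIM))       # synthetic pooled AE embeddings
poi_batch = rng.normal(size=(4, TEXT_DIM))    # synthetic text embeddings

z_ae = project(ae_batch, ae_head)
z_poi = project(poi_batch, poi_head)

# Cosine similarity between every AE row and every POI row.
similarity = z_ae @ z_poi.T
print(similarity.shape)
```

Because both outputs are unit-normalized, the matrix product is a cosine similarity matrix, which is the quantity a contrastive objective would typically operate on.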
For each POI location, two spatial views are constructed:
- Base view: AE embeddings pooled from a smaller spatial window
- Augmented view: AE embeddings pooled from a larger spatial window
These two views capture spatial context at different scales.
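To make the two views concrete, here is a small NumPy sketch that pools a raster over two window sizes around one pixel. Mean pooling, the 3x3 and 7x7 window sizes, the `pool_window` helper, and the synthetic raster are all assumptions for illustration; the repository's own pooling operator and window sizes may differ.

```python
import numpy as np


def pool_window(raster, row, col, half_size):
    """Mean-pool a (bands, H, W) raster over a square window centred on (row, col).

    The window is clipped at the raster edges, so border pixels pool
    over fewer cells.
    """
    _, height, width = raster.shape
    r0, r1 = max(row - half_size, 0), min(row + half_size + 1, height)
    c0, c1 = max(col - half_size, 0), min(col + half_size + 1, width)
    return raster[:, r0:r1, c0:c1].mean(axis=(1, 2))


# Synthetic stand-in for a 64-band AlphaEarth raster (random values, not real data).
rng = np.random.default_rng(0)
raster = rng.normal(size=(64, 32, 32))

row, col = 10, 12  # pixel containing a hypothetical POI

base_view = pool_window(raster, row, col, half_size=1)  # 3x3 window (assumed size)
aug_view = pool_window(raster, row, col, half_size=3)   # 7x7 window (assumed size)

print(base_view.shape, aug_view.shape)
```

Both views are 64-dimensional vectors for the same location; they differ only in how much surrounding context is averaged in.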
The model jointly optimizes two contrastive objectives:
L = λ L_AE-AE + (1-λ) L_AE-POI
where:
- λ is a weighting coefficient that balances the two terms
- L_AE-AE enforces multi-scale consistency between the base and augmented AE views
- L_AE-POI aligns AE embeddings with POI semantic representations
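The weighted objective can be sketched in NumPy as follows. The README only says the two terms are contrastive, so the symmetric InfoNCE form, the temperature of 0.07, λ = 0.5, and the choice to align the base AE view with the POI embeddings are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` is the positive for row i of `b`."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature

    def cross_entropy_diag(z):
        # Numerically stable log-softmax; the targets are the diagonal entries.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


def total_loss(ae_base, ae_aug, poi, lam=0.5, temperature=0.07):
    """L = lam * L_AE-AE + (1 - lam) * L_AE-POI."""
    loss_ae_ae = info_nce(ae_base, ae_aug, temperature)
    # Assumption: the base view is the one aligned with POI embeddings.
    loss_ae_poi = info_nce(ae_base, poi, temperature)
    return lam * loss_ae_ae + (1.0 - lam) * loss_ae_poi


# Synthetic projected embeddings: batch of 8, shared dimension 16.
rng = np.random.default_rng(0)
ae_base = rng.normal(size=(8, 16))
ae_aug = ae_base + 0.1 * rng.normal(size=(8, 16))
poi = rng.normal(size=(8, 16))

loss = total_loss(ae_base, ae_aug, poi, lam=0.5)
print(float(loss))
```

Setting `lam=1.0` reduces the objective to the multi-scale consistency term alone, and `lam=0.0` to the AE-POI alignment term alone, which is a quick way to check each component in isolation.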
To transform AlphaEarth embeddings into AETHER embeddings:
python infer_aether_raster.py \
--ae_tif ./data/AE/Alphaearth_embedding_2024.tif \
--ckpt ./ckpts/epoch_0100.pth \
--out ./outputs/AETHER_embedding.tif
If you use AETHER in your research, please cite:
@article{liu2025beyond,
title = {Beyond AlphaEarth: Toward Human-Centered Geospatial Foundation Models via POI-Guided Contrastive Learning},
author = {Liu, Junyuan and Qin, Quan and Dong, Guangsheng and Wang, Xinglei and Feng, Jiazhuang and Zeng, Zichao and Cheng, Tao},
journal = {arXiv preprint arXiv:2510.09894},
year = {2025},
url = {https://arxiv.org/abs/2510.09894}
}
