This is the repository for CLEAR-IT: Contrastive Learning to Capture the Immune Composition of Tumor Microenvironments.
For pre-trained models, embeddings, and model predictions, see our supplementary data repository DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f.
If you use CLEAR-IT, please cite:
- Archived software snapshot for manuscript reproducibility (4TU DOI): 10.4121/365ab556-b03f-49d9-b8b8-58f48aae85ec
- TNBC1-MxIF8 dataset images: 10.4121/126d8103-6de5-4493-a48e-5d529fef471e
- CLEAR-IT supplementary data (models/embeddings/outputs): 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f
CLEAR-IT is maintained and tested in the provided Docker environment (clearit.Dockerfile).
Local (non-Docker) installation is available for advanced users, but Docker is the primary tested path.
A single, ready-to-build Dockerfile is provided as clearit.Dockerfile. The image installs CLEAR-IT and, on first run, auto-creates a config.yaml if you mount your CLEAR-IT-Data folder at /data.
Build the image:
docker build -f clearit.Dockerfile -t clearit:latest .
Run (starts JupyterLab per the image entrypoint). Replace the path with your local CLEAR-IT-Data folder. Add --gpus all if you have NVIDIA GPUs set up:
docker run --rm -p 8888:8888 \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest
What happens on first run?
- The container detects the /data mount and writes /workspace/config.yaml with all paths pointing to /data/... and experiments_dir pointing to the experiments bundled in the container install.
- You can override the file later if you want custom locations.
Need a shell instead of Jupyter? Use this command:
docker run --rm -it --entrypoint bash \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest
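The two docker run variants above differ only in a few flags, so they can be assembled programmatically. The following is a minimal sketch, not part of CLEAR-IT: the helper name docker_run_args and its parameters are hypothetical, and only flags that appear in the commands above are used.

```python
import shlex
from pathlib import Path


def docker_run_args(data_dir, gpus=False, shell=False, image="clearit:latest"):
    """Assemble the `docker run` argument list used in this section."""
    args = ["docker", "run", "--rm"]
    if shell:
        # Interactive bash session instead of the JupyterLab entrypoint.
        args += ["-it", "--entrypoint", "bash"]
    else:
        # Publish the JupyterLab port to the host.
        args += ["-p", "8888:8888"]
    if gpus:
        # Only useful when NVIDIA GPUs are set up for Docker.
        args += ["--gpus", "all"]
    # Use an absolute host path for the mount, as in the commands above.
    host_dir = Path(data_dir).expanduser().resolve()
    args += ["-v", f"{host_dir}:/data", image]
    return args


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(docker_run_args("~/CLEAR-IT-Data", gpus=True)))
```

To actually start the container, the returned list could be passed to subprocess.run(args, check=True).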
- Clone the repository and (optionally) create a fresh Python environment.
  git clone https://github.com/qnano/CLEAR-IT.git
  cd CLEAR-IT
  # (optional) create & activate a virtual environment
  python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
- Install dependencies and the package.
  pip install -r requirements.txt
  pip install -e .
The CLEAR-IT library exposes three driver scripts to (1) pre-train encoders, (2) train classification heads, and (3) perform linear evaluation. Each script is pointed to a YAML recipe describing one or more experiments. Recipe files live under the experiments/ folder in this repository.
Using Docker? You can skip this step on first run: the container writes /workspace/config.yaml automatically when /data is mounted (see Installation → Docker). For local installs, copy and edit the template:
cp config_template.yaml config.yaml
# then open config.yaml and update the paths under `paths:`
The scripts and notebooks load config.yaml from the repository root. In most cases, setting paths.data_root is enough, because the remaining paths default to subdirectories under data_root.
If your experiments live in a different location than data_root/experiments (for example, using the GitHub repository's experiments/ directory), set paths.experiments_dir explicitly.
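The defaulting rule described above can be illustrated with a short sketch. This is not the loader used by CLEAR-IT; the function name resolve_paths is hypothetical, and the subdirectory names are assumed from the keys listed in config_template.yaml.

```python
from pathlib import Path

# Subdirectory assumed for each optional key, mirroring config_template.yaml.
DEFAULT_SUBDIRS = {
    "datasets_dir": "datasets",
    "raw_datasets_dir": "raw_datasets",
    "models_dir": "models",
    "outputs_dir": "outputs",
    "experiments_dir": "experiments",
}


def resolve_paths(paths_cfg):
    """Return every directory as a Path, filling unset keys from data_root."""
    data_root = Path(paths_cfg["data_root"])
    resolved = {"data_root": data_root}
    for key, subdir in DEFAULT_SUBDIRS.items():
        override = paths_cfg.get(key)
        # An explicit entry wins; otherwise fall back to data_root/<subdir>.
        resolved[key] = Path(override) if override else data_root / subdir
    return resolved


# Example: only data_root and experiments_dir are set explicitly.
paths = resolve_paths({
    "data_root": "/data",
    "experiments_dir": "/repo/CLEAR-IT/experiments",
})
print(paths["models_dir"])
print(paths["experiments_dir"])
```

Here models_dir falls back to a folder under data_root, while experiments_dir keeps the explicitly configured location.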
Each recipe can contain multiple encoders/heads to be trained or parameters to use for linear evaluation. Below are three examples.
Pre-training all encoders to investigate the effect of the pre-training batch size and NT-Xent temperature on the TNBC1-MxIF8 dataset:
python -m clearit.scripts.run_pretrain --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/01_pretrain/01_batch-tau.yaml
Training linear classification heads on top of those pre-trained encoders:
python -m clearit.scripts.run_train_heads --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/02_classifier/01_batch-tau.yaml
Using the classification heads to perform linear evaluation of those pre-trained encoders:
python -m clearit.scripts.run_inference_pipeline --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/03_linear-eval/01_batch-tau.yaml
Where do results go?
- Trained encoders and heads are saved under models_dir.
- Predictions are written under outputs_dir.
- All of these locations are defined in your config.yaml (see below).
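Because each stage consumes the output of the previous one, the three example commands can be chained from a small wrapper. The sketch below is an illustration only: build_stage_commands and run_stages are hypothetical helpers, and the 01_pretrain / 02_classifier / 03_linear-eval layout is assumed from the three example recipe paths, so other experiment folders may be organized differently.

```python
import subprocess
import sys
from pathlib import Path

# Driver modules in dependency order, paired with the recipe sub-folder
# used in the example paths above.
STAGES = [
    ("clearit.scripts.run_pretrain", "01_pretrain"),
    ("clearit.scripts.run_train_heads", "02_classifier"),
    ("clearit.scripts.run_inference_pipeline", "03_linear-eval"),
]


def build_stage_commands(round_dir, recipe_name):
    """Build one `python -m <module> --recipe <file>` command per stage."""
    round_dir = Path(round_dir)
    return [
        [sys.executable, "-m", module, "--recipe", str(round_dir / subdir / recipe_name)]
        for module, subdir in STAGES
    ]


def run_stages(round_dir, recipe_name):
    """Run the stages sequentially; check=True stops at the first failure."""
    for cmd in build_stage_commands(round_dir, recipe_name):
        subprocess.run(cmd, check=True)


commands = build_stage_commands(
    "experiments/01_hyperopt/tnbc1-mxif8/round01", "01_batch-tau.yaml"
)
for cmd in commands:
    # Show the module and recipe without the interpreter path.
    print(" ".join(cmd[1:]))
```

Calling run_stages with the same arguments would execute the three stages in order from the repository root, provided clearit is installed and config.yaml is in place.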
If you want to train the models yourself and are starting from raw sources, use the scripts in scripts/ to convert external datasets into the unified format expected by CLEAR-IT. These scripts read from raw_datasets and write to datasets as configured in config.yaml.
We recommend downloading the prepared data from our supplementary data repository DOI 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f, which contains the folder structure and instructions on how to obtain the raw datasets for conversion.
This repository's structure is as follows:
.
├── clearit # CLEAR-IT Python library
├── clearit.Dockerfile # Dockerfile for running CLEAR-IT in a Docker container
├── config_template.yaml # Template config file. Modify and rename this to config.yaml
├── experiments # YAML recipe files for training all models and performing linear evaluation
├── notebooks # Jupyter Notebooks for plotting
├── requirements.txt # requirements.txt for custom environments
├── scripts # Scripts for converting external datasets used in the study to a unified format
└── setup.py # setup.py for local installs of clearit
We recommend placing the contents of the supplementary data repository (DOI 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f) in this directory (or somewhere else on fast storage), extending the structure as follows:
├── datasets # Location of the converted datasets, ready to be used by CLEAR-IT
├── embeddings # Pre-computed embeddings for benchmarking purposes
├── models # Pre-trained CLEAR-IT encoders and linear classifiers
├── outputs # Predictions made via linear evaluation or benchmarking, survival classifiers
├── raw_datasets # Location of the unconverted datasets - the conversion scripts in the scripts directory will move these to the datasets directory
The config_template.yaml file contains a template for a config.yaml file, which scripts and notebooks will look for:
# config_template.yaml
# Create a copy of this file and name it `config.yaml` to point to custom paths
paths:
  # Absolute or relative path to the unpacked CLEAR-IT-Data directory
  data_root: /path/to/data/repository/CLEAR-IT
  # datasets_dir: /path/to/data/repository/CLEAR-IT/datasets
  # raw_datasets_dir: /path/to/data/repository/CLEAR-IT/raw_datasets
  # models_dir: /path/to/data/repository/CLEAR-IT/models
  # outputs_dir: /path/to/data/repository/CLEAR-IT/outputs
  # experiments_dir: /path/to/repo/CLEAR-IT/experiments  # Set explicitly when experiments are not under data_root
By modifying the config.yaml, you are free to choose where you place individual directories (if space is a concern). If you want to train models, we recommend putting the datasets directory on fast storage (for example an SSD).
The MAPS benchmark stage in clearit/maps_benchmark contains a small
MAPS-derived runtime adapted for the public CLEAR-IT reproduction pipeline.
In case you use this, please also cite the original MAPS paper:
- Shaban, M., Bai, Y., Qiu, H. et al. MAPS: pathologist-level cell type annotation from tissue images through machine learning. Nature Communications 15, 28 (2024). https://doi.org/10.1038/s41467-023-44188-w
Upstream repository:
License and attribution details for the vendored MAPS-derived files are listed in clearit/maps_benchmark/THIRD_PARTY_NOTICES.md.
- FileNotFoundError: config.yaml. Ensure you copied config_template.yaml to config.yaml at the repository root.
- Docker can’t see your data. Double-check your -v /host/path:/container/path volume mounts and that config.yaml uses the container paths when running inside Docker.
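Both problems above can be detected before launching a long run. The following pre-flight check is a minimal sketch under stated assumptions: check_setup is a hypothetical helper, not part of CLEAR-IT, and the directories to verify are passed in by the caller rather than read from config.yaml.

```python
from pathlib import Path


def check_setup(repo_root, paths_to_check=()):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    repo_root = Path(repo_root)
    # Scripts and notebooks expect config.yaml at the repository root.
    if not (repo_root / "config.yaml").is_file():
        problems.append(
            f"config.yaml not found in {repo_root}; "
            "copy config_template.yaml to config.yaml"
        )
    # Inside Docker these should be container paths such as /data.
    for p in paths_to_check:
        if not Path(p).is_dir():
            problems.append(f"directory not visible: {p} (check the -v mount)")
    return problems


for problem in check_setup(".", paths_to_check=["/data"]):
    print(problem)
```

When run inside the container, passing the container-side paths (for example /data) tests whether the volume mount is visible from where the scripts execute.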