CLEAR-IT

This is the repository for CLEAR-IT: Contrastive Learning to Capture the Immune Composition of Tumor Microenvironments.

For pre-trained models, embeddings, and model predictions, see our supplementary data repository (DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f).

Citation

If you use CLEAR-IT, please cite:

Archived software snapshot for manuscript reproducibility (4TU DOI): 10.4121/365ab556-b03f-49d9-b8b8-58f48aae85ec
TNBC1-MxIF8 dataset images: 10.4121/126d8103-6de5-4493-a48e-5d529fef471e
CLEAR-IT supplementary data (models/embeddings/outputs): 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f

Runtime environment

CLEAR-IT is maintained and tested in the provided Docker environment (clearit.Dockerfile). Local (non-Docker) installation is available for advanced users, but Docker is the primary tested path.

Installation

Option A — Docker (primary tested path)

A single, ready-to-build Dockerfile is provided as clearit.Dockerfile. The image installs CLEAR-IT and, on first run, auto-creates a config.yaml if you mount your CLEAR-IT-Data folder at /data.

Build the image:

docker build -f clearit.Dockerfile -t clearit:latest .

Run (starts JupyterLab per the image entrypoint). Replace the path with your local CLEAR-IT-Data folder. Add --gpus all if you have NVIDIA GPUs set up:

docker run --rm -p 8888:8888 \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest

What happens on first run?

The container detects the /data mount and writes /workspace/config.yaml with all paths pointing to /data/... and experiments_dir pointing to the experiments bundled in the container install.
You can edit or replace this file later if you want custom locations.
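The first-run behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the container's actual entrypoint code: the function names, the "write only if missing" rule, and the example experiments location are hypothetical. Only the `paths:` keys and the `/data` and `/workspace/config.yaml` locations come from this README.

```python
from pathlib import Path, PurePosixPath


def build_first_run_config(data_mount: str, experiments_dir: str) -> str:
    """Return config.yaml text with every data path placed under the mount."""
    root = PurePosixPath(data_mount)  # container paths are POSIX-style
    lines = [
        "paths:",
        f"  data_root: {root}",
        f"  datasets_dir: {root / 'datasets'}",
        f"  raw_datasets_dir: {root / 'raw_datasets'}",
        f"  models_dir: {root / 'models'}",
        f"  outputs_dir: {root / 'outputs'}",
        # experiments live with the installed package, not under the mount
        f"  experiments_dir: {experiments_dir}",
    ]
    return "\n".join(lines) + "\n"


def write_config_if_missing(config_path: str, data_mount: str,
                            experiments_dir: str) -> bool:
    """Write the config only when the mount exists and no config is present.

    Assumption: an existing config.yaml is never overwritten.
    """
    path = Path(config_path)
    if path.exists() or not Path(data_mount).is_dir():
        return False
    path.write_text(build_first_run_config(data_mount, experiments_dir))
    return True
```

Inside the container, a call might look like `write_config_if_missing("/workspace/config.yaml", "/data", "/path/to/bundled/experiments")`, where the last argument is a placeholder for wherever the image installs the recipes.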

Need a shell instead of Jupyter? Use this command:

docker run --rm -it --entrypoint bash \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest

Option B — Local install (advanced users)

Clone the repository and (optionally) create a fresh Python environment.

git clone https://github.com/qnano/CLEAR-IT.git
cd CLEAR-IT
# (optional) create & activate a virtual environment
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate

Install dependencies and the package.

pip install -r requirements.txt
pip install -e .

Usage

The CLEAR-IT library exposes three driver scripts to (1) pre-train encoders, (2) train classification heads, and (3) perform linear evaluation. Each script is pointed to a YAML recipe describing one or more experiments. Recipe files live under the experiments/ folder in this repository.
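To see which recipes are available, you can walk the experiments/ folder. The helper below is a minimal sketch and not part of the CLEAR-IT library: `list_recipes` is a hypothetical name, and it assumes that every recipe uses the `.yaml` extension, as the examples in this README do.

```python
from pathlib import Path


def list_recipes(experiments_dir: str) -> list[Path]:
    """Return all .yaml files below experiments_dir, sorted by path."""
    root = Path(experiments_dir)
    return sorted(p for p in root.rglob("*.yaml") if p.is_file())


def print_recipes(experiments_dir: str = "./experiments") -> None:
    """Print each recipe path relative to the experiments folder."""
    for recipe in list_recipes(experiments_dir):
        print(recipe.relative_to(experiments_dir))
```

Calling `print_recipes()` from the repository root prints one relative path per recipe, which you can then pass to a driver script via `--recipe`.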

1) Configure paths

Using Docker? You can skip this step on first run — the container writes /workspace/config.yaml automatically when /data is mounted (see Installation → Docker). For local installs, copy and edit the template:

cp config_template.yaml config.yaml
# then open config.yaml and update the paths under `paths:`

The scripts and notebooks load config.yaml from the repository root. In most cases, setting paths.data_root is enough, because the remaining paths default to subdirectories under data_root. If your experiments live in a different location than data_root/experiments (for example, using the GitHub repository's experiments/ directory), set paths.experiments_dir explicitly.
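The defaulting rule described above can be sketched as follows. This is an illustration of the documented behaviour, not the library's actual loader: `resolve_paths` and `DEFAULT_SUBDIRS` are hypothetical names, and the sketch assumes each unset key falls back to the subdirectory of `data_root` with the matching name.

```python
from pathlib import Path

# Key in the `paths:` section -> default subdirectory under data_root (assumed)
DEFAULT_SUBDIRS = {
    "datasets_dir": "datasets",
    "raw_datasets_dir": "raw_datasets",
    "models_dir": "models",
    "outputs_dir": "outputs",
    "experiments_dir": "experiments",
}


def resolve_paths(paths: dict) -> dict:
    """Fill in every unset path with its default under data_root."""
    data_root = Path(paths["data_root"])
    resolved = {"data_root": data_root}
    for key, subdir in DEFAULT_SUBDIRS.items():
        value = paths.get(key)
        # An explicit entry wins; otherwise fall back to data_root/<subdir>
        resolved[key] = Path(value) if value else data_root / subdir
    return resolved
```

For example, `resolve_paths({"data_root": "/data", "experiments_dir": "/repo/CLEAR-IT/experiments"})` keeps the explicit experiments location and places the four remaining directories under `/data`.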

2) Pick a recipe and run

Each recipe can contain multiple encoders/heads to be trained or parameters to use for linear evaluation. Below are three examples.

Pre-training all encoders to study the effect of the pre-training batch size and NT-Xent temperature on the TNBC1-MxIF8 dataset:

python -m clearit.scripts.run_pretrain --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/01_pretrain/01_batch-tau.yaml

Training linear classification heads on top of those pre-trained encoders:

python -m clearit.scripts.run_train_heads --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/02_classifier/01_batch-tau.yaml

Using the classification heads to perform linear evaluation of those pre-trained encoders:

python -m clearit.scripts.run_inference_pipeline --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/03_linear-eval/01_batch-tau.yaml
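Because each stage depends on the output of the previous one, the three commands above are typically run in order. A small wrapper such as the following can chain them. It is a sketch only: `build_commands` and `run_all` are hypothetical helpers, while the module names and recipe paths are copied from the commands above.

```python
import subprocess
import sys

RECIPE_ROOT = "./experiments/01_hyperopt/tnbc1-mxif8/round01"

# (driver module, recipe) pairs in the order they must run
STAGES = [
    ("clearit.scripts.run_pretrain",
     f"{RECIPE_ROOT}/01_pretrain/01_batch-tau.yaml"),
    ("clearit.scripts.run_train_heads",
     f"{RECIPE_ROOT}/02_classifier/01_batch-tau.yaml"),
    ("clearit.scripts.run_inference_pipeline",
     f"{RECIPE_ROOT}/03_linear-eval/01_batch-tau.yaml"),
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one argument list per stage, ready for subprocess.run."""
    return [[python, "-m", module, "--recipe", recipe]
            for module, recipe in STAGES]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it unless dry_run is set."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain as soon as one stage fails
            subprocess.run(cmd, check=True)
```

Run `run_all(dry_run=False)` from the repository root so that the relative recipe paths resolve. The default dry run only prints the commands.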

Where do results go?

Trained encoders and heads are saved under models_dir.
Predictions are written under outputs_dir.
All of these locations are defined in your config.yaml (see below).

3) (Optional) Convert raw datasets

If you want to train the models yourself and are starting from raw sources, use the scripts in scripts/ to convert external datasets into the unified format expected by CLEAR-IT. These scripts read from raw_datasets and write to datasets as configured in config.yaml.

We recommend downloading the prepared data from our supplementary data repository (DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f), which contains the folder structure and instructions on how to obtain the raw datasets for conversion.

Repository structure and `config.yaml`

This repository's structure is as follows:

.
├── clearit                # CLEAR-IT Python library
├── clearit.Dockerfile     # Dockerfile for running CLEAR-IT in a Docker container
├── config_template.yaml   # Template config file. Modify and rename this to config.yaml
├── experiments            # YAML recipe files for training all models and performing linear evaluation
├── notebooks              # Jupyter Notebooks for plotting
├── requirements.txt       # requirements.txt for custom environments
├── scripts                # Scripts for converting external datasets used in the study to a unified format
└── setup.py               # setup.py for local installs of clearit

We recommend placing the contents of the supplementary data repository (DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f) in this directory (or somewhere else on fast storage), extending the structure as follows:

├── datasets               # Location of the converted datasets, ready to be used by CLEAR-IT
├── embeddings             # Pre-computed embeddings for benchmarking purposes
├── models                 # Pre-trained CLEAR-IT encoders and linear classifiers
├── outputs                # Predictions made via linear evaluation or benchmarking, survival classifiers
└── raw_datasets           # Location of the unconverted datasets; the conversion scripts in the scripts directory read these and write the converted data to the datasets directory
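After unpacking the supplementary data, a quick check like the one below can confirm that the layout matches the listing above. This is a hypothetical helper, not part of CLEAR-IT. It assumes all five folders sit directly under one data root, which need not hold if you relocate individual directories via config.yaml.

```python
from pathlib import Path

# Folder names taken from the listing above
EXPECTED_SUBDIRS = ("datasets", "embeddings", "models", "outputs", "raw_datasets")


def missing_subdirs(data_root: str) -> list[str]:
    """Return the expected folder names that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list from `missing_subdirs("/abs/path/to/CLEAR-IT-Data")` means every expected folder is present.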

The config_template.yaml file contains a template for a config.yaml file, which scripts and notebooks will look for:

# config_template.yaml
# Create a copy of this file and name it `config.yaml` to point to custom paths
paths:
  # Absolute or relative path to the unpacked CLEAR-IT-Data directory
  data_root: /path/to/data/repository/CLEAR-IT
#  datasets_dir: /path/to/data/repository/CLEAR-IT/datasets
#  raw_datasets_dir: /path/to/data/repository/CLEAR-IT/raw_datasets
#  models_dir: /path/to/data/repository/CLEAR-IT/models
#  outputs_dir: /path/to/data/repository/CLEAR-IT/outputs
#  experiments_dir: /path/to/repo/CLEAR-IT/experiments             # Set explicitly when experiments are not under data_root

By editing config.yaml, you can place individual directories wherever you like, for example on a different drive if disk space is a concern. If you want to train models, we recommend putting the datasets directory on fast storage (for example an SSD).

MAPS benchmark attribution

The MAPS benchmark stage in clearit/maps_benchmark contains a small MAPS-derived runtime adapted for the public CLEAR-IT reproduction pipeline. If you use it, please also cite the original MAPS paper:

Shaban, M., Bai, Y., Qiu, H. et al. MAPS: pathologist-level cell type annotation from tissue images through machine learning. Nature Communications 15, 28 (2024). https://doi.org/10.1038/s41467-023-44188-w

Upstream repository:

https://github.com/mahmoodlab/MAPS

License and attribution details for the vendored MAPS-derived files are listed in clearit/maps_benchmark/THIRD_PARTY_NOTICES.md.

Troubleshooting

FileNotFoundError: config.yaml — ensure you copied config_template.yaml to config.yaml at the repository root.
Docker can’t see your data — double-check your -v /host/path:/container/path volume mounts and that config.yaml uses the container paths when running inside Docker.
