CLEAR-IT

This is the repository for CLEAR-IT: Contrastive Learning to Capture the Immune Composition of Tumor Microenvironments.

For pre-trained models, embeddings, and model predictions, see our supplementary data repository DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f.

Citation

If you use CLEAR-IT, please cite:

Runtime environment

CLEAR-IT is maintained and tested in the provided Docker environment (clearit.Dockerfile). Local (non-Docker) installation is available for advanced users, but Docker is the primary tested path.

Installation

Option A — Docker (primary tested path)

A single, ready-to-build Dockerfile is provided as clearit.Dockerfile. The image installs CLEAR-IT and, on first run, auto-creates a config.yaml if you mount your CLEAR-IT-Data folder at /data.

Build the image:

docker build -f clearit.Dockerfile -t clearit:latest .

Run (starts JupyterLab per the image entrypoint). Replace the path with your local CLEAR-IT-Data folder. Add --gpus all if you have NVIDIA GPUs set up:

docker run --rm -p 8888:8888 \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest

What happens on first run?

  • The container detects the /data mount and writes /workspace/config.yaml with all paths pointing to /data/... and experiments_dir pointing to the experiments bundled in the container install.
  • You can edit or replace this file later if you want custom locations.
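
The first-run behaviour described above can be pictured with a small Python sketch. This is an illustration only, not the actual entrypoint code of the image: the function name `write_default_config` is hypothetical, the sub-directory names are taken from config_template.yaml, and skipping the write when a config already exists is an assumption.

```python
from pathlib import Path


def write_default_config(data_mount: Path, config_path: Path, experiments_dir: Path) -> bool:
    """Write a config.yaml pointing at the data mount; return True if a file was written."""
    # Assumption: do nothing when the data folder is not mounted or a config
    # already exists, so a hand-edited file is left alone.
    if not data_mount.is_dir() or config_path.exists():
        return False
    lines = [
        "paths:",
        f"  data_root: {data_mount}",
        f"  datasets_dir: {data_mount / 'datasets'}",
        f"  raw_datasets_dir: {data_mount / 'raw_datasets'}",
        f"  models_dir: {data_mount / 'models'}",
        f"  outputs_dir: {data_mount / 'outputs'}",
        f"  experiments_dir: {experiments_dir}",
    ]
    config_path.write_text("\n".join(lines) + "\n")
    return True
```

Inside the container the call would correspond to something like `write_default_config(Path("/data"), Path("/workspace/config.yaml"), experiments_dir)`, where `experiments_dir` is wherever the image keeps its bundled recipes.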

Need a shell instead of Jupyter? Use this command:

docker run --rm -it --entrypoint bash \
  -v /abs/path/to/CLEAR-IT-Data:/data \
  clearit:latest

Option B — Local install (advanced users)

  1. Clone the repository and (optionally) create a fresh Python environment.

    git clone https://github.com/qnano/CLEAR-IT.git
    cd CLEAR-IT
    # (optional) create & activate a virtual environment
    python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate

  2. Install dependencies and the package.

    pip install -r requirements.txt
    pip install -e .

Usage

The CLEAR-IT library exposes three driver scripts to (1) pre-train encoders, (2) train classification heads, and (3) perform linear evaluation. Each script takes a YAML recipe that describes one or more experiments. Recipe files live under the experiments/ folder in this repository.

1) Configure paths

Using Docker? You can skip this step on first run — the container writes /workspace/config.yaml automatically when /data is mounted (see Installation → Docker). For local installs, copy and edit the template:

cp config_template.yaml config.yaml
# then open config.yaml and update the paths under `paths:`

The scripts and notebooks load config.yaml from the repository root. In most cases, setting paths.data_root is enough, because the remaining paths default to subdirectories under data_root. If your experiments live in a different location than data_root/experiments (for example, using the GitHub repository's experiments/ directory), set paths.experiments_dir explicitly.
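
To make the defaulting rule concrete, here is a minimal Python sketch of how the values under `paths:` could be resolved. It is not the library's own loader: `resolve_paths` and `DEFAULT_SUBDIRS` are hypothetical names, and the sub-directory names are assumed from the commented-out keys in config_template.yaml.

```python
from pathlib import Path

# Sub-directory names follow the commented-out keys in config_template.yaml.
DEFAULT_SUBDIRS = {
    "datasets_dir": "datasets",
    "raw_datasets_dir": "raw_datasets",
    "models_dir": "models",
    "outputs_dir": "outputs",
    "experiments_dir": "experiments",
}


def resolve_paths(paths_cfg: dict) -> dict:
    """Derive every path that is not set explicitly from data_root."""
    data_root = Path(paths_cfg["data_root"])
    resolved = {"data_root": data_root}
    for key, subdir in DEFAULT_SUBDIRS.items():
        explicit = paths_cfg.get(key)
        # An explicit entry wins; otherwise fall back to data_root/<subdir>.
        resolved[key] = Path(explicit) if explicit else data_root / subdir
    return resolved


# Example: only data_root and experiments_dir are set explicitly.
paths = resolve_paths({
    "data_root": "/data",
    "experiments_dir": "/path/to/repo/CLEAR-IT/experiments",
})
print(paths["models_dir"])
print(paths["experiments_dir"])
```

Here `models_dir` is derived from `data_root`, while `experiments_dir` keeps the explicit value that points at the repository checkout.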

2) Pick a recipe and run

A recipe can define multiple encoders or heads to train, or multiple parameter sets to use for linear evaluation. Below are three examples.

Pre-train all encoders used to study the effect of the pre-training batch size and the NT-Xent temperature on the TNBC1-MxIF8 dataset:

python -m clearit.scripts.run_pretrain --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/01_pretrain/01_batch-tau.yaml

Training linear classification heads on top of those pre-trained encoders:

python -m clearit.scripts.run_train_heads --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/02_classifier/01_batch-tau.yaml

Using the classification heads to perform linear evaluation of those pre-trained encoders:

python -m clearit.scripts.run_inference_pipeline --recipe ./experiments/01_hyperopt/tnbc1-mxif8/round01/03_linear-eval/01_batch-tau.yaml
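
The three commands above follow one pattern, so they can be assembled programmatically. The Python sketch below builds the command lines for all three stages of a round; `build_commands` and `STAGES` are hypothetical helpers, not part of clearit, and the layout (01_pretrain, 02_classifier, 03_linear-eval with a shared recipe file name) is only confirmed for the example shown here.

```python
import sys
from pathlib import Path

# Stage sub-folder and driver module, in the order the stages are run.
STAGES = [
    ("01_pretrain", "clearit.scripts.run_pretrain"),
    ("02_classifier", "clearit.scripts.run_train_heads"),
    ("03_linear-eval", "clearit.scripts.run_inference_pipeline"),
]


def build_commands(round_dir: str, recipe_name: str) -> list[list[str]]:
    """Return one `python -m <module> --recipe <file>` command per stage."""
    commands = []
    for subdir, module in STAGES:
        recipe = Path(round_dir) / subdir / recipe_name
        commands.append([sys.executable, "-m", module, "--recipe", str(recipe)])
    return commands


commands = build_commands(
    "./experiments/01_hyperopt/tnbc1-mxif8/round01", "01_batch-tau.yaml"
)
for cmd in commands:
    # Print only; pass each list to subprocess.run(cmd, check=True) to execute.
    print(" ".join(cmd))
```

Running the stages in this order matters because the heads are trained on the pre-trained encoders, and linear evaluation uses those heads.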

Where do results go?

  • Trained encoders and heads are saved under models_dir.
  • Predictions are written under outputs_dir.
  • All of these locations are defined in your config.yaml (see below).

3) (Optional) Convert raw datasets

If you want to train the models yourself and are starting from raw sources, use the scripts in scripts/ to convert external datasets into the unified format expected by CLEAR-IT. These scripts read from raw_datasets and write to datasets as configured in config.yaml.

We recommend downloading the prepared data from our supplementary data repository DOI 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f, which contains the folder structure and instructions on how to obtain the raw datasets for conversion.

Repository structure and config.yaml

This repository's structure is as follows:

.
├── clearit                # CLEAR-IT Python library
├── clearit.Dockerfile     # Dockerfile for running CLEAR-IT in a Docker container
├── config_template.yaml   # Template config file. Modify and rename this to config.yaml
├── experiments            # YAML recipe files for training all models and performing linear evaluation
├── notebooks              # Jupyter Notebooks for plotting
├── requirements.txt       # requirements.txt for custom environments
├── scripts                # Scripts for converting external datasets used in the study to a unified format
└── setup.py               # setup.py for local installs of clearit

We recommend placing the contents of the supplementary data repository (DOI: 10.4121/ebc792ad-4767-4aef-b8ff-ae653e901e3f) in this directory (or somewhere else on fast storage), extending the structure as follows:

├── datasets               # Location of the converted datasets, ready to be used by CLEAR-IT
├── embeddings             # Pre-computed embeddings for benchmarking purposes
├── models                 # Pre-trained CLEAR-IT encoders and linear classifiers
├── outputs                # Predictions made via linear evaluation or benchmarking, survival classifiers
├── raw_datasets           # Location of the unconverted datasets; the conversion scripts in scripts/ read from here and write to datasets

The config_template.yaml file contains a template for a config.yaml file, which scripts and notebooks will look for:

# config_template.yaml
# Create a copy of this file and name it `config.yaml` to point to custom paths
paths:
  # Absolute or relative path to the unpacked CLEAR-IT-Data directory
  data_root: /path/to/data/repository/CLEAR-IT
#  datasets_dir: /path/to/data/repository/CLEAR-IT/datasets
#  raw_datasets_dir: /path/to/data/repository/CLEAR-IT/raw_datasets
#  models_dir: /path/to/data/repository/CLEAR-IT/models
#  outputs_dir: /path/to/data/repository/CLEAR-IT/outputs
#  experiments_dir: /path/to/repo/CLEAR-IT/experiments             # Set explicitly when experiments are not under data_root

By editing config.yaml, you can place individual directories wherever you like, for example if space is a concern. If you want to train models, we recommend putting the datasets directory on fast storage (for example an SSD).

MAPS benchmark attribution

The MAPS benchmark stage in clearit/maps_benchmark contains a small MAPS-derived runtime adapted for the public CLEAR-IT reproduction pipeline. In case you use this, please also cite the original MAPS paper:

Upstream repository:

License and attribution details for the vendored MAPS-derived files are listed in clearit/maps_benchmark/THIRD_PARTY_NOTICES.md.

Troubleshooting

  • FileNotFoundError: config.yaml — ensure you copied config_template.yaml to config.yaml at the repository root.
  • Docker can’t see your data — double-check your -v /host/path:/container/path volume mounts and that config.yaml uses the container paths when running inside Docker.
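
Both problems above can be checked before starting a long run. The following Python sketch is a hypothetical pre-flight helper, not something shipped with CLEAR-IT: `check_setup` takes the repository root and a dictionary of the paths from your config.yaml, and returns a list of human-readable problems.

```python
from pathlib import Path


def check_setup(repo_root: Path, paths: dict) -> list[str]:
    """Return a list of setup problems; an empty list means nothing was found."""
    problems = []
    # Scripts and notebooks load config.yaml from the repository root.
    if not (repo_root / "config.yaml").is_file():
        problems.append(
            "config.yaml not found at the repository root; "
            "copy config_template.yaml to config.yaml"
        )
    # Inside Docker these must be container paths (for example /data/...).
    for key, value in paths.items():
        if not Path(value).exists():
            problems.append(f"{key} does not exist: {value}")
    return problems
```

For example, `check_setup(Path("."), {"data_root": "/data"})` run inside the container reports a missing `/data` mount as `data_root does not exist: /data`.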
