Drug Repurposing Through Disease Similarity Analysis Using BiomarkerKB

Computational pipeline and interactive web dashboard for identifying drug repurposing candidates across multiple cancers, driven by biomarker evidence from BiomarkerKB and cross-referenced against GTEx, Pharos/IDG, GDC/TCGA, LINCS L1000, and STRING.

Presentation: Google Slides

Full project description on Google Docs

Overview

The pipeline queries six public databases to build a ranked list of drug repurposing candidates for a given cancer:

Step	Source	What it provides
1	BiomarkerKB	Disease-associated biomarker genes
2	GTEx	Tissue expression (TPM) per gene
3	Pharos / IDG	Target development level (Tclin → Tdark)
4	GDC / TCGA	Somatic mutation frequency + differential expression vs. normal
5	STRING	Protein–protein interaction network expansion
6	LINCS L1000	Drug perturbagen signatures matching query genes

Candidates are scored across five weighted components and ranked. Weights can be adjusted interactively in the web dashboard.

Diseases supported (configurable): Hepatocellular Carcinoma, Pancreatic Cancer, Colorectal Cancer, Lung Adenocarcinoma, and Breast Cancer.

Setup

Python environment

Requires Python 3.9+.

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

LINCS data files (required for LINCS step)

The LINCS L1000 knowledge graph files are too large for git. Download them from:

https://dd-kg-ui.cfde.cloud/downloads

Place the following files in the data/ directory:

LINCS.Compound.nodes.csv
LINCS.Gene.nodes.csv
LINCS.edges.csv

The LINCS step can be skipped with --no-lincs if these files are unavailable.
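A quick pre-flight check can tell you whether to pass --no-lincs. The sketch below only uses the three file names listed above; the helper name `missing_lincs_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names are taken from the list above; the helper itself is illustrative.
LINCS_FILES = (
    "LINCS.Compound.nodes.csv",
    "LINCS.Gene.nodes.csv",
    "LINCS.edges.csv",
)


def missing_lincs_files(data_dir="data"):
    """Return the required LINCS files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in LINCS_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_lincs_files()
    if missing:
        print("Missing LINCS files:", ", ".join(missing))
        print("Download them, or run the pipeline with --no-lincs.")
    else:
        print("All LINCS files found in data/.")
```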

Running the pipeline

python pipeline.py \
  --output-dir output \
  --search-terms "hepatocellular carcinoma" "HCC" \
  --gtex-tissue Liver \
  --gdc-project TCGA-LIHC

Key flags:

Flag	Default	Description
`--output-dir`	`output`	Directory for all result CSVs and JSON
`--search-terms`	HCC terms	BiomarkerKB condition search terms
`--gtex-tissue`	`Liver`	GTEx tissue (e.g. `Pancreas`, `Lung`, `Colon_Transverse`)
`--gdc-project`	`TCGA-LIHC`	TCGA project (e.g. `TCGA-PAAD`, `TCGA-COAD`)
`--no-lincs`	—	Skip LINCS L1000 step
`--no-pharos`	—	Skip Pharos/IDG step
`--no-gdc`	—	Skip GDC step
`--no-string`	—	Skip STRING PPI expansion
`--cache-dir`	`cache`	Local cache for API responses
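To run several diseases in a row, the flags above can be assembled programmatically. This is a minimal sketch, not a script shipped with the repository. Only the HCC row mirrors the example command. The remaining search terms, the `Breast_Mammary_Tissue` tissue name, the `TCGA-LUAD` and `TCGA-BRCA` project IDs, and the `output_<key>` directory names are assumptions you should check against your own configuration.

```python
import shlex
import sys

# key -> (search terms, GTEx tissue, GDC project).
# Only the "hcc" row comes from the example above; the rest are assumptions.
DISEASES = {
    "hcc": (["hepatocellular carcinoma", "HCC"], "Liver", "TCGA-LIHC"),
    "pancreatic": (["pancreatic cancer"], "Pancreas", "TCGA-PAAD"),
    "colorectal": (["colorectal cancer"], "Colon_Transverse", "TCGA-COAD"),
    "lung": (["lung adenocarcinoma"], "Lung", "TCGA-LUAD"),
    "breast": (["breast cancer"], "Breast_Mammary_Tissue", "TCGA-BRCA"),
}


def build_command(key, skip_lincs=False):
    """Build the argv list for one pipeline run using the documented flags."""
    terms, tissue, project = DISEASES[key]
    cmd = [
        sys.executable, "pipeline.py",
        "--output-dir", f"output_{key}",  # hypothetical naming scheme
        "--search-terms", *terms,
        "--gtex-tissue", tissue,
        "--gdc-project", project,
    ]
    if skip_lincs:
        cmd.append("--no-lincs")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for key in DISEASES:
        print(shlex.join(build_command(key)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the values have been reviewed.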

Output files

Each run writes to --output-dir:

File	Contents
`biomarkers.csv`	BiomarkerKB hits
`gtex_liver_expression.csv`	Per-gene tissue TPM
`pharos_targets.csv`	IDG target annotations
`gdc_stats.csv`	Mutation freq + DE log2FC per gene
`string_interactions.csv`	PPI edges from STRING
`lincs_perturbagen_hits.csv`	Ranked perturbagens with annotations
`final_scored_candidates_v2.csv`	Final scored + ranked candidates
`summary.json`	Summary counts for each step
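After a run, it can help to confirm which of these files were actually written, for example when a step was skipped. The sketch below assumes nothing about the structure of summary.json beyond it being valid JSON; `check_outputs` is a hypothetical helper, and `gtex_liver_expression.csv` is the name listed for the Liver example, so it may differ for other tissues.

```python
import json
from pathlib import Path

# File names are copied from the table above.
EXPECTED_OUTPUTS = (
    "biomarkers.csv",
    "gtex_liver_expression.csv",
    "pharos_targets.csv",
    "gdc_stats.csv",
    "string_interactions.csv",
    "lincs_perturbagen_hits.csv",
    "final_scored_candidates_v2.csv",
    "summary.json",
)


def check_outputs(output_dir="output"):
    """Return (presence map, parsed summary.json or None) for one output dir."""
    root = Path(output_dir)
    present = {name: (root / name).is_file() for name in EXPECTED_OUTPUTS}
    summary = None
    if present["summary.json"]:
        summary = json.loads((root / "summary.json").read_text(encoding="utf-8"))
    return present, summary


if __name__ == "__main__":
    present, summary = check_outputs("output")
    for name, found in present.items():
        print(f"{'ok     ' if found else 'missing'}  {name}")
    if summary is not None:
        print("summary.json:", summary)
```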

Scoring

After running the pipeline, score and rank candidates with:

python score.py --output-dir output --top 20

This reads lincs_perturbagen_hits.csv and writes final_scored_candidates_v2.csv, which contains every component score and the weighted total. Each component score is capped as follows:

Component	Cap	Signal
BiomarkerKB	5	Number of biomarker genes hit
GTEx expression	15	Tissue TPM (log-normalized)
IDG/Pharos	8	Target development level
LINCS	10	Perturbagen signature strength
GDC/TCGA	8	Mutation freq + differential expression
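As an illustration of how capped components could combine into a weighted total, here is a minimal sketch. Only the cap values come from the table above. The `log2(TPM + 1)` normalization, the clipping at zero, the default weight of 1.0, and all function names are assumptions, and the actual logic in src/scoring.py may differ.

```python
import math

# Caps are taken from the table above; the keys are illustrative names.
CAPS = {"biomarkerkb": 5, "gtex": 15, "pharos": 8, "lincs": 10, "gdc": 8}


def gtex_score(tpm, cap=CAPS["gtex"]):
    """Assumed log-normalization: log2(TPM + 1), clipped to the GTEx cap."""
    return min(cap, math.log2(max(tpm, 0.0) + 1.0))


def capped(component, raw):
    """Clip a raw component score to the range [0, cap]."""
    return max(0.0, min(CAPS[component], raw))


def weighted_total(raw_scores, weights=None):
    """Sum the capped component scores, each multiplied by its weight."""
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * capped(name, raw)
        for name, raw in raw_scores.items()
    )


if __name__ == "__main__":
    raw = {
        "biomarkerkb": 7,          # above the cap, so it counts as 5
        "gtex": gtex_score(1023),  # log2(1024) = 10
        "pharos": 8,
        "lincs": 4.5,
        "gdc": 2,
    }
    print(weighted_total(raw))
    print(weighted_total(raw, {"lincs": 2.0}))
```

Capping before weighting keeps any single data source from dominating the total unless its weight is raised deliberately.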

Web dashboard

An interactive Django + React dashboard lets you explore results across all five cancers and adjust scoring weights in real time.

Start the backend

cd web
pip install -r requirements-web.txt
python manage.py migrate
python manage.py runserver

Start the frontend

cd web/frontend
npm install
npm run dev

Open http://localhost:5173 (or the Vite port shown in the terminal).

Features

Disease selector — switch between HCC, Pancreatic, Colorectal, Lung, and Breast cancer results
Summary cards — step-level counts (biomarkers, GTEx genes, targets, candidates)
Scoring panel — ranked drug candidates with per-component score bars; five weight sliders (0–3×) re-rank in real time without re-running the pipeline
BiomarkerKB, GTEx, Pharos, GDC, LINCS, STRING sections — paginated tables for each data source

The dashboard reads pre-computed CSV outputs. Run the pipeline at least once per disease before using the dashboard.
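Because the component scores are pre-computed, changing a slider only requires recomputing the weighted sum and re-sorting. The sketch below shows that idea in Python for illustration; it is not the dashboard's own code. The candidate structure, the drug names, and the `rerank` function are hypothetical, and the 0 to 3 range mirrors the slider range described above.

```python
def rerank(candidates, weights):
    """Sort candidates by weighted total, highest first.

    candidates: list of dicts with a 'drug' name and a 'scores' mapping of
                component name -> pre-computed (already capped) score.
    weights:    mapping of component name -> multiplier in [0, 3];
                components without an entry keep a weight of 1.0.
    """
    for name, weight in weights.items():
        if not 0.0 <= weight <= 3.0:
            raise ValueError(f"weight for {name!r} must be between 0 and 3")

    def total(candidate):
        return sum(
            weights.get(name, 1.0) * score
            for name, score in candidate["scores"].items()
        )

    return sorted(candidates, key=total, reverse=True)


if __name__ == "__main__":
    # Placeholder candidates, not real pipeline output.
    candidates = [
        {"drug": "drug_a", "scores": {"lincs": 9.0, "gdc": 1.0}},
        {"drug": "drug_b", "scores": {"lincs": 3.0, "gdc": 6.0}},
    ]
    print([c["drug"] for c in rerank(candidates, {})])
    print([c["drug"] for c in rerank(candidates, {"gdc": 3.0})])
```

With default weights drug_a totals 10 against 9 for drug_b; tripling the GDC weight gives 12 against 21, so the order flips without touching the underlying CSVs.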

Repository layout

pipeline.py          # entry point: python pipeline.py [options]
score.py             # standalone scorer: python score.py [--output-dir ...]
src/
  pipeline.py        # pipeline orchestration
  scoring.py         # multi-component scoring module
  clients/           # API clients (BiomarkerKB, GTEx, Pharos, GDC, LINCS, STRING)
  models.py          # dataclasses for intermediate results
web/
  api/               # Django REST views + URL routing
  config/            # Django settings
  frontend/src/      # React components
data/                # LINCS files (not in git — see setup above)
output*/             # Pipeline results per disease (not in git)
