I turn complex biological datasets into clear, reproducible insights. With an MSc in Bioinformatics and hands-on experience in RNA-Seq analysis, statistical modelling, and machine learning, I'm actively seeking data analyst roles at the intersection of biology and computation.
- 🎓 MSc Bioinformatics — Pondicherry University (2022–2024)
- 🎓 BSc Biotechnology (Honours) — Amity University, Kolkata (2019–2022)
- 📍 Kolkata, India
- 🔬 Specialisation: Transcriptomics · Cancer Genomics · NGS Data Analysis · Biological Data Mining
- 💡 Looking for: Bioinformatics Data Analyst roles in industry
- 📫 banerjee.shruti1306@gmail.com
Bioinformatics Analysis
RNA-Seq · DESeq2 · edgeR · HISAT2 · SAMtools · Nextflow
TCGA · NCBI · Ensembl · Heatmaps · Volcano Plots
Programming & Data Science
Python · R · SQL
Pandas · NumPy · Scikit-learn · Matplotlib
Tools
Docker · Git · GitHub
MSc Dissertation · Pondicherry University · 2024
Transcriptomic analysis identifying ferroptosis–apoptosis crosstalk in ovarian cancer from 7,862 differentially expressed genes.
- Full NGS pipeline: FastQC → HISAT2 → StringTie → DESeq2
- Built PPI networks using STRING and Cytoscape
- Identified CDKN1A and GDF15 as key hub genes linked to platinum drug resistance
- Produced publication-quality figures in both Python and R
R · Python · DESeq2 · HISAT2 · TCGA · Ensembl · Cytoscape · KEGG · ggplot2
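
To illustrate the last step of a pipeline like this, the sketch below filters a DESeq2-style results table for differentially expressed genes. The column names `log2FoldChange` and `padj` follow DESeq2's standard output; the cut-offs (adjusted p < 0.05, |log2FC| ≥ 1), the function name `filter_de_genes`, and the toy rows are assumptions for illustration, not the settings or results of the dissertation.

```python
import csv
import io


def filter_de_genes(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, direction) pairs that pass both cut-offs.

    rows: iterable of dicts with 'gene', 'log2FoldChange' and 'padj' keys.
    Genes whose padj is missing (DESeq2 writes 'NA') are skipped.
    """
    hits = []
    for row in rows:
        if row["padj"] in ("NA", ""):
            continue
        padj = float(row["padj"])
        lfc = float(row["log2FoldChange"])
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            hits.append((row["gene"], "up" if lfc > 0 else "down"))
    return hits


# Toy table with placeholder gene names (hypothetical values, not real results).
toy_csv = """gene,log2FoldChange,padj
GENE_A,2.4,0.001
GENE_B,-1.7,0.02
GENE_C,0.3,0.0004
GENE_D,3.1,NA
GENE_E,-2.2,0.2
"""

de_genes = filter_de_genes(csv.DictReader(io.StringIO(toy_csv)))
print(de_genes)
```

Only `GENE_A` and `GENE_B` pass here: `GENE_C` has too small a fold change, `GENE_D` has no adjusted p-value, and `GENE_E` is not significant.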
Bioinformatics Project · 2025
Computational pipeline screening 15+ endophyte-derived natural compounds as potential drug candidates using real biological databases.
- Fetched molecular data via PubChem REST API
- Applied Lipinski's Rule of Five using RDKit for drug-likeness evaluation
- Mined NCBI PubMed with Biopython to track 25-year research trends
- Built an interactive biological network graph (Host Plant → Endophyte → Compound → Disease)
- Trained a Random Forest classifier to predict drug-likeness from molecular descriptors
Python · RDKit · Biopython · PubChem API · NCBI · NetworkX · Plotly · scikit-learn
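
The drug-likeness step can be sketched as a plain Rule of Five check. To keep the example dependency-free, it takes precomputed descriptors as numbers; in an RDKit workflow these would typically come from `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`. The class and function names, the "at most one violation" tolerance, and the two example compounds are assumptions for illustration, not the project's actual code or data.

```python
from dataclasses import dataclass


@dataclass
class CompoundDescriptors:
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d):
    """Count how many of the four Rule of Five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d, max_violations=1):
    """Treat a compound as drug-like if it breaks at most max_violations rules."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values, chosen only to exercise both outcomes.
small = CompoundDescriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5)
bulky = CompoundDescriptors(mol_weight=812.9, logp=6.3, h_donors=7, h_acceptors=14)

print(is_drug_like(small), lipinski_violations(small))
print(is_drug_like(bulky), lipinski_violations(bulky))
```

Counting violations rather than returning a bare pass/fail keeps the result usable both as a filter and as a feature for a downstream classifier.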
NTCC Literature Review · 2022
A data-driven literature review on nanoparticle applications as eco-friendly alternatives to chemical pesticides and fertilisers. Includes Python scripts to visualise key quantitative findings from the paper.
- Analysed and visualised fungal inhibition data (Ni NP concentrations at 50 ppm & 100 ppm)
- Charted conventional fertiliser nutrient use efficiency across 6 key nutrients
- All charts reproducible via included Python script
Python · Matplotlib · NumPy · Data Visualisation
| Skill Area | Tools & Experience |
|---|---|
| Biological data mining | TCGA, NCBI, Ensembl — extracting and interpreting large-scale genomic datasets |
| Statistical analysis | DESeq2, edgeR, Scikit-learn — differential expression, ML modelling |
| Pipeline development | Nextflow, Docker — reproducible, automated bioinformatics workflows |
| Data wrangling | Python (Pandas, NumPy), R — cleaning, processing, analysing high-throughput data |
| Visualisation | Matplotlib, R plots — heatmaps, volcano plots, publication-ready figures |
| Cheminformatics | RDKit, PubChem API, NetworkX, Plotly — molecular descriptor analysis, network graphs, ML classification |
- 📄 IBM Data Analyst Professional Certificate
Currently open to bioinformatics data analyst opportunities — feel free to reach out!