Shruti Banerjee

Bioinformatics Data Analyst · Python · R · NGS · Transcriptomics

I turn complex biological datasets into clear, reproducible insights. With an MSc in Bioinformatics and hands-on experience in RNA-Seq analysis, statistical modelling, and machine learning, I'm actively seeking data analyst roles at the intersection of biology and computation.

About Me

🎓 MSc Bioinformatics — Pondicherry University (2022–2024)
🎓 BSc Biotechnology (Honours) — Amity University, Kolkata (2019–2022)
📍 Kolkata, India
🔬 Specialisation: Transcriptomics · Cancer Genomics · NGS Data Analysis · Biological Data Mining
💡 Looking for: Bioinformatics Data Analyst roles in industry
📫 banerjee.shruti1306@gmail.com

Core Skills

Bioinformatics Analysis

RNA-Seq    DESeq2    edgeR    HISAT2    SAMtools    Nextflow
TCGA       NCBI      Ensembl  Heatmaps  Volcano Plots

Programming & Data Science

Python     R          SQL
Pandas     NumPy      Scikit-learn     Matplotlib

Tools

Docker     Git        GitHub

Featured Projects

🧬 Ferroptosis Hub Genes & Apoptosis in Ovarian Cancer

MSc Dissertation · Pondicherry University · 2024

Transcriptomic analysis identifying ferroptosis–apoptosis crosstalk in ovarian cancer from 7,862 differentially expressed genes.

Full NGS pipeline: FastQC → HISAT2 → StringTie → DESeq2
Built PPI networks using STRING and Cytoscape
Identified CDKN1A and GDF15 as key hub genes linked to platinum drug resistance
Publication-quality figures in both Python and R

R Python DESeq2 HISAT2 TCGA Ensembl Cytoscape KEGG ggplot2
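Example: the step from a DESeq2 results table to a list of differentially expressed genes can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation code. The column name `gene`, the placeholder gene IDs, and the cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are assumptions chosen for the example; `log2FoldChange` and `padj` are the standard DESeq2 column names.

```python
import csv
import io


def filter_degs(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, direction) pairs for rows passing both cutoffs.

    The cutoffs are illustrative defaults, not values from the project.
    """
    hits = []
    for row in rows:
        # DESeq2 can report NA for padj (for example after independent
        # filtering), so such rows are skipped rather than parsed.
        if row["padj"] in ("NA", "") or row["log2FoldChange"] in ("NA", ""):
            continue
        padj = float(row["padj"])
        lfc = float(row["log2FoldChange"])
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            hits.append((row["gene"], "up" if lfc > 0 else "down"))
    return hits


# Hypothetical results table with placeholder gene IDs (not real data).
example_csv = """gene,log2FoldChange,padj
GENE_A,2.4,0.001
GENE_B,-1.7,0.02
GENE_C,0.3,0.0004
GENE_D,3.1,NA
"""

degs = filter_degs(csv.DictReader(io.StringIO(example_csv)))
print(degs)
```

GENE_C is dropped because its fold change is too small despite a low adjusted p-value, and GENE_D is dropped because its adjusted p-value is missing.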

💊 Endophyte Drug Discovery — Cheminformatics Pipeline

Bioinformatics Project · 2025

Computational pipeline screening 15+ endophyte-derived natural compounds as potential drug candidates using public databases (PubChem, NCBI PubMed).

Fetched molecular data via PubChem REST API
Applied Lipinski's Rule of Five using RDKit for drug-likeness evaluation
Mined NCBI PubMed with Biopython to track 25-year research trends
Built interactive biological network graph (Host Plant → Endophyte → Compound → Disease)
Trained Random Forest classifier to predict drug-likeness from molecular descriptors

Python RDKit Biopython PubChem API NCBI NetworkX Plotly scikit-learn
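Example: Lipinski's Rule of Five flags a compound when its molecular weight exceeds 500 Da, its logP exceeds 5, it has more than 5 hydrogen-bond donors, or it has more than 10 hydrogen-bond acceptors. The sketch below applies those four checks in plain Python, assuming the descriptors have already been computed elsewhere (in the project, with RDKit). The `Descriptors` class, the function names, the descriptor values, and the common "at most one violation" convention are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def lipinski_violations(d: Descriptors) -> list[str]:
    """List which of the four Rule of Five limits the compound exceeds."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight > 500")
    if d.logp > 5:
        violations.append("logp > 5")
    if d.h_donors > 5:
        violations.append("h_donors > 5")
    if d.h_acceptors > 10:
        violations.append("h_acceptors > 10")
    return violations


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as drug-like if it breaks at most max_violations rules."""
    return len(lipinski_violations(d)) <= max_violations


# Made-up descriptor values, not measurements of any real compound.
small = Descriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5)
bulky = Descriptors(mol_weight=650.0, logp=6.2, h_donors=2, h_acceptors=5)

print(is_drug_like(small), lipinski_violations(small))
print(is_drug_like(bulky), lipinski_violations(bulky))
```

Returning the list of violations rather than a bare boolean keeps the screening step explainable, and the same descriptor fields can double as input features for a classifier.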

🌱 Nanoparticles in Green Pesticides & Sustainable Agriculture

NTCC Literature Review · 2022

A data-driven literature review on nanoparticle applications as eco-friendly alternatives to chemical pesticides and fertilisers. Includes Python scripts to visualise key quantitative findings from the paper.

Analysed and visualised fungal inhibition data (nickel nanoparticles at 50 ppm and 100 ppm)
Charted conventional fertiliser nutrient use efficiency across 6 key nutrients
All charts reproducible via included Python script

Python Matplotlib NumPy Data Visualisation

What I Bring to a Data Analyst Role

Skill Area	Tools & Experience
Biological data mining	TCGA, NCBI, Ensembl — extracting and interpreting large-scale genomic datasets
Statistical analysis	DESeq2, edgeR, Scikit-learn — differential expression, ML modelling
Pipeline development	Nextflow, Docker — reproducible, automated bioinformatics workflows
Data wrangling	Python (Pandas, NumPy), R — cleaning, processing, analysing high-throughput data
Visualisation	Matplotlib, R plots — heatmaps, volcano plots, publication-ready figures
Cheminformatics	RDKit, PubChem API, NetworkX, Plotly — molecular descriptor analysis, network graphs, ML classification

Certifications

📄 IBM Data Analyst Professional Certificate

Let's Connect

Currently open to bioinformatics data analyst opportunities — feel free to reach out!