shruti-banerjee/README.md

Shruti Banerjee

Bioinformatics Data Analyst · Python · R · NGS · Transcriptomics

I turn complex biological datasets into clear, reproducible insights. With an MSc in Bioinformatics and hands-on experience in RNA-Seq analysis, statistical modelling, and machine learning, I'm actively seeking data analyst roles at the intersection of biology and computation.


About Me

  • 🎓 MSc Bioinformatics — Pondicherry University (2022–2024)
  • 🎓 BSc Biotechnology (Honours) — Amity University, Kolkata (2019–2022)
  • 📍 Kolkata, India
  • 🔬 Specialisation: Transcriptomics · Cancer Genomics · NGS Data Analysis · Biological Data Mining
  • 💡 Looking for: Bioinformatics Data Analyst roles in industry
  • 📫 banerjee.shruti1306@gmail.com

Core Skills

Bioinformatics Analysis

RNA-Seq    DESeq2    edgeR    HISAT2    SAMtools    Nextflow
TCGA       NCBI      Ensembl  Heatmaps  Volcano Plots

Programming & Data Science

Python     R          SQL
Pandas     NumPy      Scikit-learn     Matplotlib

Tools

Docker     Git        GitHub

Featured Projects


ferroptosis-ovarian-cancer-analysis

MSc Dissertation · Pondicherry University · 2024

Transcriptomic analysis identifying ferroptosis–apoptosis crosstalk in ovarian cancer from 7,862 differentially expressed genes.

  • Full NGS pipeline: FastQC → HISAT2 → StringTie → DESeq2
  • Built PPI networks using STRING and Cytoscape
  • Identified CDKN1A and GDF15 as key hub genes linked to platinum drug resistance
  • Publication-quality figures in both Python and R

R Python DESeq2 HISAT2 TCGA Ensembl Cytoscape KEGG ggplot2
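As a rough illustration of what follows the DESeq2 step, the sketch below labels genes as up- or down-regulated from DESeq2-style results, the same split a volcano plot colours. The cutoffs (|log2FC| ≥ 1, adjusted p < 0.05), the names `DEResult` and `classify_gene`, and the example rows are assumptions for illustration only, not values or code from the dissertation.

```python
from typing import NamedTuple, Optional


class DEResult(NamedTuple):
    """One row of a DESeq2-style results table (hypothetical structure)."""
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # DESeq2 can report NA for padj; modelled here as None


def classify_gene(result: DEResult,
                  lfc_cutoff: float = 1.0,
                  padj_cutoff: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'not significant'.

    Both cutoffs are assumed defaults, not the dissertation's settings.
    """
    # Genes without an adjusted p-value, or above the cutoff, are not called.
    if result.padj is None or result.padj >= padj_cutoff:
        return "not significant"
    # Small effect sizes are ignored even when statistically significant.
    if abs(result.log2_fold_change) < lfc_cutoff:
        return "not significant"
    return "up" if result.log2_fold_change > 0 else "down"


# Placeholder rows; gene names and numbers are invented for the example.
rows = [
    DEResult("GENE_A", 2.3, 0.001),
    DEResult("GENE_B", -1.8, 0.01),
    DEResult("GENE_C", 0.4, 0.0001),
    DEResult("GENE_D", 3.0, 0.2),
    DEResult("GENE_E", 1.5, None),
]

labels = {row.gene: classify_gene(row) for row in rows}
```

Keeping the cutoffs as parameters makes it easy to check how sensitive the number of differentially expressed genes is to the chosen thresholds.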


endophyte-drug-discovery-analysis

Bioinformatics Project · 2025

Computational pipeline screening 15+ endophyte-derived natural compounds as potential drug candidates using public databases (PubChem and NCBI PubMed).

  • Fetched molecular data via PubChem REST API
  • Applied Lipinski's Rule of Five using RDKit for drug-likeness evaluation
  • Mined NCBI PubMed with Biopython to track 25-year research trends
  • Built an interactive biological network graph (Host Plant → Endophyte → Compound → Disease)
  • Trained a Random Forest classifier to predict drug-likeness from molecular descriptors

Python RDKit Biopython PubChem API NCBI NetworkX Plotly scikit-learn
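The drug-likeness filter above can be sketched in plain Python. This is a minimal version of Lipinski's Rule of Five that assumes the four descriptors have already been computed (in RDKit these would typically come from `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors`, and `Lipinski.NumHAcceptors`). The `MolDescriptors` class, the function names, and the example values are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolDescriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def lipinski_violations(d: MolDescriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d: MolDescriptors, max_violations: int = 1) -> bool:
    """Apply the common reading of the rule: at most one violation allowed."""
    return lipinski_violations(d) <= max_violations


# Invented descriptor values, for illustration only.
small = MolDescriptors(mol_weight=300.0, logp=2.5, h_donors=2, h_acceptors=5)
borderline = MolDescriptors(mol_weight=520.0, logp=3.0, h_donors=1, h_acceptors=4)
bulky = MolDescriptors(mol_weight=750.0, logp=6.2, h_donors=3, h_acceptors=8)

results = {
    "small": is_drug_like(small),
    "borderline": is_drug_like(borderline),
    "bulky": is_drug_like(bulky),
}
```

Returning the violation count separately from the pass/fail decision keeps the count available as a feature or label when training a classifier on the same descriptors.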


nanotechnology-green-agriculture-review

NTCC Literature Review · 2022

A data-driven literature review of nanoparticle applications as eco-friendly alternatives to chemical pesticides and fertilisers. Includes a Python script that visualises the review's key quantitative findings.

  • Analysed and visualised fungal inhibition data for nickel nanoparticles (Ni NPs) at 50 ppm and 100 ppm
  • Charted conventional fertiliser nutrient use efficiency across 6 key nutrients
  • All charts reproducible via included Python script

Python Matplotlib NumPy Data Visualisation


What I Bring to a Data Analyst Role

| Skill Area | Tools & Experience |
|---|---|
| Biological data mining | TCGA, NCBI, Ensembl: extracting and interpreting large-scale genomic datasets |
| Statistical analysis | DESeq2, edgeR, Scikit-learn: differential expression, ML modelling |
| Pipeline development | Nextflow, Docker: reproducible, automated bioinformatics workflows |
| Data wrangling | Python (Pandas, NumPy), R: cleaning, processing, analysing high-throughput data |
| Visualisation | Matplotlib, R plots: heatmaps, volcano plots, publication-ready figures |
| Cheminformatics | RDKit, PubChem API, NetworkX, Plotly: molecular descriptor analysis, network graphs, ML classification |

Certifications

  • 📄 IBM Data Analyst Professional Certificate



Let's Connect

Email: banerjee.shruti1306@gmail.com · GitHub: shruti-banerjee


Currently open to bioinformatics data analyst opportunities — feel free to reach out!

Popular repositories

  1. endophyte-drug-discovery-analysis

    Bioinformatics data analysis pipeline mining endophyte-derived bioactive compounds via the PubChem API, RDKit molecular descriptors, and NCBI PubMed literature trends | Python · Biopython · RDKit

    HTML

  2. IBM_Data-Analysis_Final-Project

    End-to-end data analysis capstone covering technology trends, skills demand, an IBM Cognos dashboard, and an executive presentation. IBM Data Analyst Certificate (Coursera).

  3. nanotechnology-green-agriculture-review

    Python data visualisation of key findings from an NTCC review on nanoparticle applications as eco-friendly alternatives to chemical pesticides and fertilisers

    Python

  4. shruti-banerjee

    Bioinformatics Data Analyst open to industry roles, specialising in Python, R, RNA-Seq, transcriptomics, and NGS data analysis.

  5. ferroptosis-ovarian-cancer-analysis

    MSc Bioinformatics dissertation: transcriptomic analysis of ferroptosis and apoptosis crosstalk in ovarian cancer using DESeq2, HISAT2, STRING, and Cytoscape. Includes Python and R visualisations.

    Jupyter Notebook