adyasha95/README.md

Adyasha Khuntia

PhD · Data Scientist · ML Engineer · Biomedical AI

Building ML systems that work on real clinical data — reproducible, containerized, validated at scale.

LinkedIn · Email · Website · Google Scholar · ORCID


PhD from LMU Munich in precision psychiatry and neuroimaging. Postdoc coordinating a federated ML network across 11 European clinical sites. Now focused on translating research-grade methods into production-ready data science and ML engineering — across biomedical AI, NLP, and agentic systems.

Most of my research code lives in institutional HPC environments and private clinical repos — what's here represents a growing public portfolio.


Selected Projects

| Project | What it does | Stack |
|---|---|---|
| clinicalrag-pipeline | RAG pipeline for querying clinical trial records, deployed on Google Cloud Run | LangChain · Anthropic API · Pinecone · FastAPI |
| ai-channel-playbook | Fully agentic YouTube pipeline: trend research → script → narration → 1080p video → auto-upload, run weekly via GitHub Actions | Python · Claude · Apify · FFmpeg |
| Biomedical-ML-Pipeline | End-to-end classification for structured biomedical data: nested CV, SHAP, TabTransformer, XGBoost, imbalance handling | scikit-learn · PyTorch · MLflow |
| nlp-clinical-bert-pipeline | Clinical text classification with fine-tuned BERT/RoBERTa on synthetic, GDPR-safe medical notes | HuggingFace · Transformers |
| BMIgapCodeRepo | SVM model predicting BMI deviation in psychiatric populations. Trained on N=1,504, validated on N=559. Published research code | MATLAB · SVM · multi-site |
| ECNP-NNADRrepo | GDPR-compliant federated ETL and QC across 11 EU clinical sites. FAIR metadata, multi-site neuroimaging harmonization. Published | R · Docker · SPM · CAT12 |

Tech

Languages       Python · R · SQL · MATLAB · Bash
ML / DL         scikit-learn · PyTorch · TensorFlow · XGBoost · LightGBM
NLP / LLM       HuggingFace · LangChain · Anthropic API · BERT · RAG · vector DBs
MLOps           MLflow · Docker · Singularity · SLURM · Git · GitHub Actions · CI/CD
Data            pandas · Airflow · FAIR pipelines · REDCap · DICOM · NIfTI
Imaging         FSL · SPM · CAT12 · FreeSurfer · OpenCV
Cloud           Google Cloud Run · FastAPI · Pinecone

A Few Numbers

  • N=1,504 training / N=559 validation — BMIgap SVM (MAE 2.75 kg/m², R²=0.28)
  • 11 EU clinical sites coordinated in a GDPR-compliant federated neuroimaging network

Engineering Principles

Reproducible by default — versioned configs, containerized execution, deterministic pipelines
Explainable — SHAP, uncertainty quantification, normative deviation scores
Healthcare-aware — GDPR compliance, FAIR metadata, audit trails
Production-minded — CI/CD, cloud deployment, HPC-ready at scale


Open to DS/ML engineering roles in healthcare AI, biotech, and data-intensive industries.

Pinned

  1. BMIgapCodeRepo (Public)

    Support vector machine model predicting BMI deviation in psychiatric populations. Trained on N=1,504, validated on N=559 (SCZ/CHR/ROD). Multi-site pipeline with clinical measure integration. Publis…

    MATLAB 1

  2. ECNP-NNADRrepo (Public)

    GDPR-compliant federated data harmonization and QC pipeline across 11 European clinical sites. FAIR metadata, multi-site neuroimaging ETL. Published work

    R

  3. feature-tracking-opencv-demos (Public)

    Beginner demos for feature tracking with OpenCV: Shi-Tomasi corner detection and Lucas-Kanade optical flow on synthetic and real images.

    Jupyter Notebook

  4. GAM (Public)

    R scripts for biostatistics: Kaplan-Meier survival analysis, Cox proportional hazards models, and Generalized Additive Models for biomedical research.

    R

  5. Biomedical-ML-Pipeline (Public)

    Research-grade ML pipeline for structured biomedical data: nested cross-validation, SHAP explainability, TabTransformer, XGBoost, LightGBM, and SMOTE imbalance handling.

    Jupyter Notebook

  6. health-nlp-ner-transformers (Public)

    Named Entity Recognition pipeline for medical text using BioBERT, ClinicalBERT, and RoBERTa. Hugging Face Trainer API with SHAP explainability.

    Python