PhD · Data Scientist · ML Engineer · Biomedical AI
Building ML systems that work on real clinical data — reproducible, containerized, validated at scale.
PhD from LMU Munich in precision psychiatry and neuroimaging. Postdoc coordinating a federated ML network across 11 European clinical sites. Now focused on translating research-grade methods into production-ready data science and ML engineering — across biomedical AI, NLP, and agentic systems.
Most of my research code lives in institutional HPC environments and private clinical repos — what's here represents a growing public portfolio.
| Project | What it does | Stack |
|---|---|---|
| clinicalrag-pipeline | RAG pipeline for querying clinical trial records — deployed on Google Cloud Run | LangChain · Anthropic API · Pinecone · FastAPI |
| ai-channel-playbook | Fully agentic YouTube pipeline: trend research → script → narration → 1080p video → auto-upload, weekly via GitHub Actions | Python · Claude · Apify · FFmpeg |
| Biomedical-ML-Pipeline | End-to-end classification for structured biomedical data: nested CV, SHAP, TabTransformer, XGBoost, imbalance handling | scikit-learn · PyTorch · MLflow |
| nlp-clinical-bert-pipeline | Clinical text classification with fine-tuned BERT/RoBERTa on synthetic GDPR-safe medical notes | HuggingFace · Transformers |
| BMIgapCodeRepo | SVM model predicting BMI deviation in psychiatric populations. Trained on N=1,504, validated on N=559. Published research code | MATLAB · SVM · multi-site |
| ECNP-NNADRrepo | GDPR-compliant federated ETL and QC across 11 EU clinical sites. FAIR metadata, multi-site neuroimaging harmonization. Published | R · Docker · SPM · CAT12 |

| Area | Tools |
|---|---|
| Languages | Python · R · SQL · MATLAB · Bash |
| ML / DL | scikit-learn · PyTorch · TensorFlow · XGBoost · LightGBM |
| NLP / LLM | HuggingFace · LangChain · Anthropic API · BERT · RAG · vector DBs |
| MLOps | MLflow · Docker · Singularity · SLURM · Git · GitHub Actions · CI/CD |
| Data | pandas · Airflow · FAIR pipelines · REDCap · DICOM · NIfTI |
| Imaging | FSL · SPM · CAT12 · FreeSurfer · OpenCV |
| Cloud | Google Cloud Run · FastAPI · Pinecone |

- N=1,504 training / N=559 validation — BMIgap SVM (MAE 2.75 kg/m², R²=0.28)
- 11 EU clinical sites coordinated in a GDPR-compliant federated neuroimaging network

- Reproducible by default: versioned configs, containerized execution, deterministic pipelines
- Explainable: SHAP, uncertainty quantification, normative deviation scores
- Healthcare-aware: GDPR compliance, FAIR metadata, audit trails
- Production-minded: CI/CD, cloud deployment, HPC-ready at scale

Open to DS/ML engineering roles in healthcare AI, biotech, and data-intensive industries.