I am a statistician and data scientist with expertise in statistical modeling, machine learning, Bayesian methods, and health data analytics. I recently completed my PhD in Mathematical Sciences with a concentration in Statistics at Northern Illinois University, where my research focused on multivariate longitudinal data and Bayesian functional factor models.
My work combines rigorous statistical methods, reproducible data science workflows, and clear communication to solve real-world problems across healthcare, public health, business analytics, and applied research.
- Bayesian modeling and inference
- Functional data analysis
- Factor models and dimension reduction
- Longitudinal and multilevel models
- Biostatistics and health disparities
- Machine learning methods
- Python
- R
- SQL
- SAS
- SPSS
- Git and GitHub
- Jupyter Notebook
- R Markdown
- LaTeX
- Data cleaning
- Data wrangling
- Exploratory data analysis
- Feature engineering
- Reproducible reporting
- Statistical visualization
- Tableau
- Power BI
- matplotlib
- ggplot2
- Linear regression
- Logistic regression
- Poisson regression
- Negative binomial regression
- Mixed models
- Survival analysis
- Longitudinal data analysis
- Multivariate analysis
- Bayesian modeling
- Principal component analysis
- Factor analysis
- Train/test split
- Cross-validation
- Classification model evaluation
- Logistic regression
- Random forest
- XGBoost
- AUC
- Accuracy
- Precision
- Recall
- Calibration
- Feature importance
- SHAP interpretation
- Health data analytics
- Public health research
- Social determinants of health
- Biostatistics
- Statistical consulting
- Business analytics
- Predictive modeling
A reproducible supervised machine learning project for diabetes prediction using Python. This project demonstrates the full data science workflow, including data cleaning, exploratory data analysis, feature engineering, train/test split, preprocessing, logistic regression, random forest, XGBoost, cross-validation, model evaluation, calibration, feature importance, SHAP interpretation, and visualization.
Repository: end-to-end-machine-learning-pipeline
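
As a rough illustration of the model evaluation step named above, the sketch below computes accuracy, precision, recall, and AUC for a binary classifier using only the Python standard library. It is not code from the repository: the function names (`confusion_counts`, `roc_auc`, `classification_metrics`), the 0.5 threshold, and the toy labels and scores are all assumptions made for this example. In practice, a library such as scikit-learn provides equivalent metric functions.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold the scores (hypothetical default 0.5) and summarize performance."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "auc": roc_auc(y_true, y_score),  # threshold-free ranking metric
    }


# Toy held-out labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.70]

for name, value in classification_metrics(y_true, y_score).items():
    print(f"{name}: {value:.3f}")
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason to report them together.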
Bayesian modeling for extracting latent signals from high-dimensional time series data. This project connects Bayesian factor models, functional data analysis, smoothing, and the interpretation of temporal latent structure.
Repository: bayesian-functional-time-series
Statistical modeling using large-scale health data to study pain outcomes, perceived healthcare discrimination, prescription drug use patterns, and health disparities. Methods include data cleaning, regression modeling, negative binomial models, subgroup analysis, reproducible reporting, and communication of findings to both technical and nontechnical audiences.
Applied statistical consulting experience across medical, educational, transportation, and public health projects. Responsibilities include client communication, data management, statistical modeling, visualization, report writing, and translating technical results for nontechnical audiences.
- Google Scholar: https://scholar.google.com/citations?user=otxnmaQAAAAJ&hl=zh-CN
- LinkedIn: https://www.linkedin.com/in/boshi-zhao-7267b1222
- GitHub Pages website: https://boshi19920920-tech.github.io/Boshi-Zhao.github.io/
Email: boshi19920920@gmail.com
Feel free to reach out if you would like to collaborate or connect.
Thank you for visiting my GitHub profile. More updates coming soon.