I am a data scientist and engineer with a Master of Data Science from the University of Virginia (UVA MSDS ’25).
I build practical, reliable, and scalable data systems that combine machine learning, cloud tools, automation patterns, and modern AI techniques.
My recent work includes a year-long collaboration with the U.S. Census Bureau, in which I developed a large-scale metadata extraction and document-classification pipeline to improve public data usability.
I bring a unique combination of:
- Strong technical execution
- Real project delivery experience
- Professional discipline from 20+ years in financial services
- High energy, curiosity, and continuous learning
I am open to roles in Data Science, Machine Learning, AI Engineering, Data Engineering, Cloud/DataOps, and GovTech.
Tech: Python · Web Crawling · Metadata Extraction · Regex · NLP · LLMs · Automation
- Designed a recursive web crawler (10-level depth) to map and analyze 3,532 Census Bureau pages.
- Extracted metadata from PDF, CSV, and XLS/XLSX files using parsing and classification pipelines.
- Built a hybrid rule-based + LLM classification workflow.
- Delivered structured datasets and documentation to Census research leads.
Repository: (capstone repo link to be added)
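
The crawl and classification steps above can be illustrated with a minimal sketch. This is not the capstone code: `crawl`, `classify`, `RULES`, `get_links`, and `fallback` are hypothetical names chosen for illustration, the rules shown are assumed, and `fallback` only stands in for wherever an LLM call would go. Links come from an injected function so the sketch runs without network access.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Hypothetical file-type rules, for illustration only.
RULES = [
    (re.compile(r"\.pdf$", re.IGNORECASE), "pdf"),
    (re.compile(r"\.csv$", re.IGNORECASE), "csv"),
    (re.compile(r"\.xlsx?$", re.IGNORECASE), "excel"),
]


def crawl(
    url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 10,
    depth: int = 0,
    best: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Depth-limited recursive crawl.

    Returns a mapping of each reachable URL to the smallest depth at which
    it was found. A URL is re-expanded only when reached by a shorter path,
    so cycles terminate and the depth limit is applied to the shortest route.
    """
    if best is None:
        best = {}
    if depth > max_depth or best.get(url, max_depth + 1) <= depth:
        return best
    best[url] = depth
    for link in get_links(url):
        crawl(link, get_links, max_depth, depth + 1, best)
    return best


def classify(url: str, fallback: Optional[Callable[[str], str]] = None) -> str:
    """Rule-first classification: try the regex rules, then the fallback.

    `fallback` is a placeholder for a model-based classifier (assumed here).
    """
    for pattern, label in RULES:
        if pattern.search(url):
            return label
    return fallback(url) if fallback else "unknown"


if __name__ == "__main__":
    # Tiny in-memory "site" standing in for real pages.
    site = {"/": ["/a", "/b"], "/a": ["/b", "/data.csv"], "/b": ["/c"]}
    pages = crawl("/", lambda u: site.get(u, []), max_depth=2)
    for page, level in sorted(pages.items()):
        print(level, page, classify(page))
```

In a real crawl, `get_links` would fetch each page and parse its anchors. Running the cheap rules first means the slower fallback is only consulted for items the rules cannot label.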
Tech: Python · TensorFlow · Keras · Time-Series Modeling · Explainability
- Modeled patient sequences with an LSTM to predict mortality risk after cardiac surgery (MIMIC-IV).
- Reached ~98.6% accuracy with the LSTM, outperforming baseline RNN and GRU models.
- Improving the project with SHAP explainability.
- Rebuilding the pipeline for cleaner engineering and GitHub readiness.
Repository: (link when ready)
Tech: Stan · CmdStanPy · MCMC · Variational Inference
- Implemented Bayesian models of stochastic volatility and flight prices.
- Ran posterior diagnostics, inspected trace plots, and performed correlation analysis.
- Compared the behavior of MCMC and variational inference.
Repository: (link to be added)
Tech: CNN · Text Classification · Generative Models (DCGAN)
- Built image classification models (CNN).
- Implemented sentiment analysis on IMDB reviews (Bidirectional LSTM, ~86% accuracy).
- Developed generative models using DCGAN.
- Practiced modular deep learning workflows spanning data preparation, modeling, and evaluation.
Repository: (codeathon repo links to be added)
Python · R · SQL · Bash · Git/GitHub · Linux · Jupyter · VS Code · Makefile · GitHub Actions
Neural Networks · CNN · LSTM · GRU · GAN · scikit-learn · TensorFlow · Keras
NLP pipelines · spaCy · Regex · Prompt Engineering · LLM-assisted classification
Generative AI · Explainability (SHAP)
Bayesian inference (Stan, CmdStanPy)
MCMC · Variational Inference
Regression · Probability · Hypothesis Testing · Time Series
Web scraping · BeautifulSoup · Requests
Large-scale crawling (DFS/BFS hybrid)
Metadata extraction (PDF/CSV/Excel)
Cron jobs · Automation patterns · Prototype AWS S3 workflows
- Strengthening MLOps and cloud deployment fundamentals.
- Enhancing the ICU LSTM model with SHAP explainability.
- Expanding the Census metadata system into a more automated pipeline.
- Building clean, recruiter-ready repositories for my portfolio.
- Email: 📧 chiqvdo@gmail.com
- GitHub: 🔗 https://github.com/ChiQuynhDo