I'm a Machine Learning Engineer and Data Professional based in Sydney. I build practical ML and data systems that turn messy data into workflows people can inspect, rerun, and use.

Right now, my public work focuses on three areas:
- Applied AI products: text-to-SQL agents, retrieval workflows, Gemini/Ollama prototypes, and product-shaped AI apps.
- Deployable ML systems: computer vision, audio ML, forecasting, calibrated classifiers, FastAPI services, and Streamlit/React interfaces.
- Analytics and data engineering: Airflow, dbt, Snowflake, Databricks, PySpark, SQL modelling, and portfolio-ready lakehouse projects.

| Project | Why it matters |
|---|---|
| Enterprise Text-to-SQL Agent | Turns natural-language questions into validated SQL that runs locally, using hybrid schema RAG, Gemini/Ollama generation, SQLGlot validation, SQLite execution, and Streamlit delivery. |
| AI Meal Planner | Combines FastAPI, Streamlit, calorie prediction, local meal retrieval, nutrition checks, feedback capture, and CI-tested backend contracts into a practical planning workflow. |
| FoodLens | Calibrated Food-101 recognition with ResNet50, confidence routing, multi-food crop detection, and a FastAPI + React prototype. |
| Airbnb ELT Warehouse | End-to-end Sydney Airbnb and Census analytics warehouse using Airflow, dbt, PostgreSQL, medallion modelling, and SCD Type 2 snapshots. |
| Bioacoustic Species Classification | BirdCLEF+ audio ML workspace with EfficientNet-B0, Perch v2 probes, reusable artifacts, and CPU-safe inference packaging. |
| NFL Player Contact Detection | Sports ML workflow using tracking features, helmet-derived video probes, temporal smoothing, type-specific models, and LightGBM blending. |
| NYC Taxi Databricks | Databricks lakehouse workflow with PySpark, Spark SQL, Delta Lake curation, trip feature engineering, ridge regression, and segment diagnostics. |
| Solana Price Forecasting | Live forecasting app with Kraken OHLCV ingestion, technical indicators, residual modelling, FastAPI option, tests, and Streamlit delivery. |

I like the part of ML and data work where the model becomes a usable system:
- clear inputs and outputs
- reproducible pipelines
- visible assumptions and failure modes
- diagnostics that explain what changed
- interfaces another person can actually use

That usually means taking a project beyond a notebook: tightening data contracts, adding validation, packaging artifacts, writing the README, and making the result easy to review.
- Portfolio: tuannm3812.github.io
- LinkedIn: linkedin.com/in/tuan-m-nguyen
- GitHub: github.com/tuannm3812
- Kaggle: kaggle.com/tuannm3812

I'm open to conversations about machine learning, data engineering, MLOps, applied AI, Kaggle workflows, and turning rough technical work into something clear enough to trust.

