MSc Artificial Intelligence @ Vrije Universiteit Amsterdam
Machine Learning Engineer | Evolutionary Robotics | LLM Systems
I build reliable, real-world AI systems at the intersection of machine learning, NLP, and generative AI. My work focuses on bridging research and practical deployment, with experience in LLM pipelines, neuroevolution, computer vision, and structured classification systems.
-
AI Research & Engineering @ VU CI Group
Contributing to the ARIEL evolutionary robotics platform, improving reproducibility, modularity, and usability for research and education. Two conference papers currently under review.
-
Machine Learning Engineer @ Olivabot
Built computer vision + LLM pipelines for agricultural analytics:
- YOLO-based object detection and tree-density estimation from drone imagery
- RAG pipelines for grounded insights and reporting
- FastAPI backend + analytics dashboard
-
Teaching Assistant @ VU Amsterdam
Mentoring students across reinforcement learning, multi-agent systems, and LLM fine-tuning.
Languages & Tools
- Python, NumPy, Pandas, Scikit-learn, Git, Docker, SQL, FastAPI
Machine Learning & AI
- PyTorch, HuggingFace Transformers, QLoRA / PEFT (Unsloth), RAG, LLM systems
LLM Engineering
- Prompt engineering, structured outputs (JSON schema, tool_use), multi-model evaluation, Cohen's κ
Computer Vision & NLP
- YOLO, OpenCV, BERT, RoBERTa, Transformer fine-tuning, cross-domain evaluation
Cloud & MLOps
- AWS, Azure, Google Cloud, GitHub Actions, containerised workflows
An LLM-powered multi-model classification pipeline that classifies ~800 AI tools into a self-designed 3-level hierarchical taxonomy (63 nodes, 8 domains).
- Four models in parallel: Claude Sonnet 4, GPT-4o-mini, Gemini 2.0 Flash, Mistral Small
- Structured JSON output enforced via each model's native API (tool_use, response_schema, JSON mode)
- Inter-model agreement and Cohen's κ across all model pairs as consistency metrics
- Ambiguity modelled explicitly as a first-class signal rather than hidden
- Golden dataset feedback loop for per-model Precision@L3 evaluation
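
As a rough illustration of the pairwise consistency metric mentioned above, here is a minimal sketch of Cohen's κ computed over every pair of models. The function names, model keys, and labels are hypothetical and are not taken from the project code; the actual pipeline may well use a library implementation instead.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(predictions):
    """Map each unordered model pair to its kappa score."""
    return {
        (m1, m2): cohen_kappa(predictions[m1], predictions[m2])
        for m1, m2 in combinations(sorted(predictions), 2)
    }


# Hypothetical model names and labels, for illustration only.
example = {
    "model_a": ["nlp", "vision", "nlp", "audio"],
    "model_b": ["nlp", "vision", "audio", "audio"],
    "model_c": ["nlp", "nlp", "nlp", "audio"],
}
for pair, kappa in pairwise_kappa(example).items():
    print(pair, round(kappa, 3))
```

Low κ for a given pair can then be read as a signal of ambiguity in the affected items rather than simply as model error.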
Benchmarks transformer and classical ML approaches to hate speech detection under distribution shift: training on one platform's data and testing on another.
- Models compared: BERT, RoBERTa, Logistic Regression, SVM, Naive Bayes, Lexicon-Enhanced hybrid
- Two settings: in-domain (OLID → OLID) and cross-domain (HASOC → OLID)
- Key finding: transformers generalise significantly better; classical models degrade by 20–30 pp cross-domain
- Full evaluation: macro-F1, per-class F1, confusion matrices, performance drop analysis
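
To make the evaluation concrete, the sketch below shows one way macro-F1 and the cross-domain performance drop (in percentage points) could be computed. It is a plain-Python illustration under assumed names (`macro_f1`, `performance_drop_pp`) and made-up labels, not the benchmark's actual code or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def performance_drop_pp(in_domain_score, cross_domain_score):
    """Drop from in-domain to cross-domain, in percentage points."""
    return (in_domain_score - cross_domain_score) * 100


# Made-up labels for illustration (1 = offensive, 0 = not offensive).
in_domain = macro_f1([1, 1, 0, 0], [1, 1, 0, 0])
cross_domain = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(f"drop: {performance_drop_pp(in_domain, cross_domain):.1f} pp")
```

Macro-averaging weights both classes equally, which matters when the offensive class is the minority in the test set.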
A modular robotics framework for evolutionary robotics and reinforcement learning (36+ stars, 55+ forks). My contributions focus on:
- Developing, maintaining, testing, and documenting the platform
- Ensuring reproducible experiments for research and education
- Extending usability through two web interfaces and student-facing tooling
Research on robust generalisation in neuroevolution:
- Developed generalist agents for continuous control across unseen terrains
- Improved performance vs specialised controllers with reduced compute
- Contributes to an ongoing PPSN conference paper on individual-centric evolutionary workflows
Minesweeper
Classic Windows Minesweeper rebuilt in Python from scratch, using only the documentation. Completed in a few hours as a challenge.
- LinkedIn
