Software Craftsperson · 🤖 AI/ML Engineer · 🏆 4× Kaggle Expert · 🧠 Polyglot Developer/Engineer · Open Source Advocate · ☕ Java Champion · 💬 Speaker
25+ years in the game, working shoulder-to-shoulder with founders, CTOs, and brilliant research teams to craft software that stands the test of time.
Data & AI consultant, mentor, and community builder. Known for making AI accessible — from client delivery to internal upskilling, workshops, and ML career talks.
- 🛡️ Security Champion — internal Threat Modelling upskilling; recognised as "Inspiring Coach" by peers
- 🤖 AI/ML Ambassador — peer-recognised as "Growth Minded" and "Helping Hand" SME; externally as Kaggle Expert, KaggleX BIPOC Mentor, Certified AI Engineer
About Me: Full Story
As a polyglot software developer with 25+ years in the game, I'm all about strengthening teams and helping them accelerate using whatever tech magic we have at our disposal. You'll find me working shoulder-to-shoulder with founders, CTOs, and brilliant research teams, crafting software that not only works beautifully but stands the test of time.
I'm a firm believer in doing things right the first time. Whether it's test-driven development (TDD), behaviour-driven development (BDD), or championing software craftsmanship principles, I bring methodology to the madness. My playground spans the entire Software Development Lifecycle — from reviewing code and breathing new life into legacy systems to test-driving fresh features and diving deep into DevOps.
These days, you'll catch me building specialised tools for AI/ML research teams, wrestling with data engineering challenges, running experiments, and turning raw numbers into meaningful insights. It's like being part scientist, part engineer, and part digital archaeologist!
I've had the privilege of running workshops and hands-on labs (both internal and community-wide), creating courses, and even helping with hiring decisions. There's something incredibly rewarding about watching someone's "aha!" moment when a complex concept finally clicks.
Right now, I'm completely fascinated by AI, Machine Learning, LLMs, and RAG: building smarter pipelines, benchmarking retrieval strategies, and measuring which choices genuinely improve the results.
There was a time when I was equally captivated by Data Analytics with R, stunning Data Visualisations (R & D3), Java Concurrency, and the incredible world of Graal/GraalVM/Truffle. That work shaped a lot of how I think about performance, correctness, and elegant systems — and it lives on in projects like awesome-graal and the Kaggle kernels.
When I'm not coding for work, you'll find me at conferences, workshops, and events, getting my hands dirty with the latest tech. I'm like a kid in a candy store when it comes to hardware accelerators (Movidius chips, FPGAs, Google's Cloud TPUs, GPUs) and exotic Python packages. My GitHub repos and blogs are basically my digital laboratory, where I document these adventures.
The ultimate goal? Finding better ways to work with higher-quality data, run more elegant experiments, and squeeze every drop of performance and quality from our results. It's not just about writing code — it's about crafting solutions that make a real difference.
| Recognition | Detail |
|---|---|
| 🏺 Software Craftsperson | 2016 |
| ☕ Java Champion | 2018 |
| 🏅 Oracle Groundbreaker Award | 2019 |
| 🤖 Certified AI Engineer | AI Makerspace · 10-week intensive |
| 🎯 4× Kaggle Expert | Competitions · Notebooks · Datasets · Discussions |
12+ years of consistent F/OSS contributions
| Project | What it does | Impact |
|---|---|---|
| rag-params-finder | Systematic RAG parameter sweep tool — evaluates embedding models, chunking strategies, and retrieval methods using MongoDB Atlas Vector Search and Voyage AI | 6 ⭐ · Active |
| pre-rag-explorer-dashboard | Pre-RAG prototype dashboard for document parsing, multi-method chunking, vector embedding generation, and hybrid search exploration — powered by in-browser ML | 5 ⭐ · Active |
| AIE7-Demo-Day-Project (RagCheck) | Proactive RAG corpus quality assessment — analyses document collections before deployment, identifies content gaps, and delivers specific improvement recommendations | 12 ⭐ · Python · TypeScript |
| playgroup_202602_docextract | Multi-LLM benchmark: extracts structured fields from UK charity PDFs (Kleister dataset) across 52 models (OpenRouter + Doubleword Batch API), scored by F1, precision, recall, cost, and time. Key finding: the cheapest Doubleword model (dw-qwen3.5-9b, $0.04/M tokens) hits F1=0.927, ranking 3rd overall and beating premium models at ~3.6× lower cost. Ships with an interactive 8-tab HTML playground (rankings, field heatmaps, error breakdown, provider analysis) | Python · HTML · 77 commits |
| microgpt-experiments | Minimal, dependency-free (stdlib-only) character-level GPT in pure Python: scalar autograd, multi-head causal attention, and Adam with bias correction. Built for learning transformer internals, with a benchmarking framework covering head-count ablation studies (N_HEAD: 1 vs 4), training-step sweeps, run reports that encode the full config in the filename, compare_run_reports.py for loss/config diffs between runs, and an HTML multi-run comparison with ASCII loss graphs and 3-tier semantic quality scoring (real / plausible / nonsense) | Python · stdlib only |
| awesome-ai-ml-dl | Comprehensive AI/ML/DL study notes & curated resources — dedicated to engineers, data scientists, and researchers worldwide | 1,664 ⭐ · 373 forks |
| nlp_profiler | Drop a pandas DataFrame in and get back sentiment, grammar quality, readability, spelling scores, and 30+ text features; think DataFrame.describe(), but for text | 228+ ⭐ · 35 forks · Presented at NLP Zurich 2020 |
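The docextract row above scores models by F1, precision, and recall. As an illustration of how field-level extraction scores are commonly computed, here is a minimal Python sketch. It assumes exact matching after light normalisation (lowercasing and collapsing whitespace) and treats empty values as missing; the project's real scorer may use different matching rules, and the function, class, and field names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldScores:
    precision: float
    recall: float
    f1: float


def normalise(value: str) -> str:
    """Assumed normalisation: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def score_fields(predicted: dict[str, str], gold: dict[str, str]) -> FieldScores:
    """Field-level exact-match precision, recall and F1 for one document.

    Empty values are treated as "not extracted" (predicted) or "not present" (gold).
    """
    pred = {k: normalise(v) for k, v in predicted.items() if v.strip()}
    ref = {k: normalise(v) for k, v in gold.items() if v.strip()}
    # True positive: field present in both and the normalised values are identical.
    tp = sum(1 for k, v in pred.items() if ref.get(k) == v)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return FieldScores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    gold = {"charity_name": "Example Trust", "charity_number": "123456", "income": "10000"}
    predicted = {"charity_name": "example  trust", "charity_number": "123465", "report_date": "2019-03-31"}
    print(score_fields(predicted, gold))
```

Summing true positives and field counts across all documents before dividing (micro-averaging) is one common way to turn these per-document numbers into a single benchmark score; whether the project aggregates this way is not stated here.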
More projects
| Project | What it does | Impact |
|---|---|---|
| awesome-graal | The definitive curated resource hub for GraalVM, Truffle, and polyglot JVM — covering Java, Python, R, Ruby, JS, LLVM runtimes | 344+ ⭐ · 30 forks |
| refactoring-developer-habits | Collaborative TDD manifesto — a community-shaped guide to developer habits and the TDD lifecycle, born at SoCraTes UK 2013 | 118+ ⭐ · 32 forks · Presented at LSCC 2016 |
| learning-path-index | Data, assets, and code powering a structured Learning Path Index project | 18 ⭐ · 17 forks |
| RESTAPIUnifier | Brings together APIs of various REST formats under one unified interface | 8 ⭐ · 7 forks · Java |
| better-nlp | NLP library making advanced natural language processing accessible to all | Open source |
| Kaggle: Normalising a distribution | Published research kernel bridging statistical theory and practice | Peer reviewed |
| Kaggle: Limitations of stats measurements | Deep-dive research on the boundaries of statistical measurement | Peer reviewed |
Contributed actively to Adopt OpenJDK and GraalVM up until around 2020/21 — that body of work lives on in the awesome-graal resource hub above.
| Result | Competition | Proof |
|---|---|---|
| 🥇 Top 12% | Liverpool Ion Switching: predicting open ion channels from electrophysiological signals | tweet ↗ |
| 🏆 Team Champion | London "Kaggle Machine Learning Challenger Day" | tweet ↗ |
| 🥇 Top 6 of 50+ | 2019 Kaggle Utility Script Coding Competition | tweet ↗ |
| 🥈 5th of 2,255 | SoftBank Forex Algorithm Data Science Competition 2019/20 | tweet ↗ |
| 🏅 Consolation Prize | Pivigo Data Science Hackathon | tweet ↗ |
Mentored across KaggleX BIPOC cohorts (2022–2025) and the AI Makerspace Engineering bootcamp: a true two-way learning experience.
Full mentoring story
I've mentored participants across multiple KaggleX BIPOC Mentorship cohorts (2022–2025), starting with the December 2022 – March 2023 programme organised by Kaggle. It's been incredibly rewarding, and genuinely two-way: I've grown as much as I've helped others. The outcomes and feedback are visible on my LinkedIn and Twitter feeds, showcasing the amazing work these talented individuals have accomplished.
I have also mentored, and been a peer supporter to, students of the AI Makerspace Engineering bootcamp and the onramp courses.
AI/ML · RAG/LLMs · Java · Python · TDD · Open Source
Full tech stack
| Area | Stack |
|---|---|
| Languages | Java · Python · R · Scala |
| LLMs & RAG | OpenAI · LLM application development · RAG pipelines · vector embeddings · chunking strategies · hybrid search · semantic search · multi-agent systems · Voyage AI · MongoDB Atlas Vector Search · MCP (Model Context Protocol) · microgpt / nanoGPT experimentation |
| AI / ML | TensorFlow · PyTorch · Jupyter · in-browser ML models · data visualisation (R & D3.js) · Kaggle |
| Backend | FastAPI · REST API design & unification · Java services |
| JVM | GraalVM · Truffle · Adopt OpenJDK · JVM Performance Tuning · Java Concurrency · HotSpot JIT analysis |
| Data | R (analytics) · D3.js (visualisation) · data engineering pipelines · document parsing · fact extraction · causal inference |
| DevOps | Docker · CI/CD · Linux · SDLC tooling |
| Hardware | Movidius · FPGAs · Google Cloud TPUs · NVIDIA GPUs |
| Practices | TDD · BDD · Software Craftsmanship · Code Review · Refactoring |
"Don't chase success, rather aim for Excellence, and success will come chasing after you!"
Find me talking, writing, or building at one of these:
Twitter · Mastodon · LinkedIn · Medium · Kaggle · GitHub · SlideShare · Lanyrd · Blog · YouTube — Channel · YouTube — Playlists



