mani neomatrix369

Mani Sarkar

Software Craftsperson · 🤖 AI/ML Engineer · 🏆 4× Kaggle Expert · 🧠 Polyglot Developer/Engineer · Open Source Advocate · ☕ Java Champion · 💬 Speaker

Supercharging Teams Through Code & Innovation 🚀

25+ years in the game, working shoulder-to-shoulder with founders, CTOs, and brilliant research teams to craft software that stands the test of time.

Data & AI consultant, mentor, and community builder. Known for making AI accessible — from client delivery to internal upskilling, workshops, and ML career talks.

🛡️ Security Champion — internal Threat Modelling upskilling; recognised as "Inspiring Coach" by peers
🤖 AI/ML Ambassador — peer-recognised as "Growth Minded" and "Helping Hand" SME; externally as Kaggle Expert, KaggleX BIPOC Mentor, Certified AI Engineer

About Me — Full Story ↓

As a polyglot software developer with 25+ years in the game, I'm all about strengthening teams and helping them accelerate using whatever tech magic we have at our disposal. You'll find me working shoulder-to-shoulder with founders, CTOs, and brilliant research teams, crafting software that not only works beautifully but stands the test of time.

The Craft Behind the Code

I'm a firm believer in doing things right the first time. Whether it's test-driven development (TDD), behaviour-driven development (BDD), or championing software craftsmanship principles, I bring methodology to the madness. My playground spans the entire Software Development Lifecycle — from reviewing code and breathing new life into legacy systems to test-driving fresh features and diving deep into DevOps.

Where Art Meets Science

These days, you'll catch me building specialised tools for AI/ML research teams, wrestling with data engineering challenges, running experiments, and turning raw numbers into meaningful insights. It's like being part scientist, part engineer, and part digital archaeologist!

Sharing the Knowledge

I've had the privilege of running workshops and hands-on labs (both internal and community-wide), creating courses, and even helping with hiring decisions. There's something incredibly rewarding about watching someone's "aha!" moment when a complex concept finally clicks.

Current Obsessions & Future Frontiers

Right now, I'm completely fascinated by AI, Machine Learning, LLMs, and RAG — building smarter pipelines, benchmarking retrieval strategies, and squeezing every drop of performance and quality out of the results.

Past Obsessions

There was a time when I was equally captivated by Data Analytics with R, stunning Data Visualisations (R & D3), Java Concurrency, and the incredible world of Graal/GraalVM/Truffle. That work shaped a lot of how I think about performance, correctness, and elegant systems — and it lives on in projects like awesome-graal and the Kaggle kernels.

The Eternal Student

When I'm not coding for work, you'll find me at conferences, workshops, and events, getting my hands dirty with the latest tech. I'm like a kid in a candy store when it comes to hardware accelerators (Movidius chips, FPGAs, Google's Cloud TPUs, GPUs) and exotic Python packages. My GitHub repos and blogs are basically my digital laboratory where I document these adventures.

The ultimate goal? Finding better ways to work with higher-quality data, run more elegant experiments, and squeeze every drop of performance and quality from our results. It's not just about writing code — it's about crafting solutions that make a real difference.

Recognition & Credentials

🏺 Software Craftsperson (2016)
☕ Java Champion (2018)
🏅 Oracle Groundbreaker Award (2019)
🤖 Certified AI Engineer (AI Makerspace · 10-week intensive)
🎯 4× Kaggle Expert (Competitions · Notebooks · Datasets · Discussions)

Flagship Open Source Projects 🌟

12+ years of consistent F/OSS contributions

Project	What it does	Impact
rag-params-finder	Systematic RAG parameter sweep tool — evaluates embedding models, chunking strategies, and retrieval methods using MongoDB Atlas Vector Search and Voyage AI	6 ⭐ · Active
pre-rag-explorer-dashboard	Pre-RAG prototype dashboard for document parsing, multi-method chunking, vector embedding generation, and hybrid search exploration — powered by in-browser ML	5 ⭐ · Active
AIE7-Demo-Day-Project (RagCheck)	Proactive RAG corpus quality assessment — analyses document collections before deployment, identifies content gaps, and delivers specific improvement recommendations	12 ⭐ · Python · TypeScript
playgroup_202602_docextract	Multi-LLM benchmark: extracts structured fields from UK charity PDFs (Kleister dataset) across 52 models (OpenRouter + Doubleword Batch API), scored by F1, precision, recall, cost, and time. Key finding: the cheapest Doubleword model (`dw-qwen3.5-9b`, $0.04/M tokens) hits F1=0.927 — 3rd overall, beating premium models at ~3.6× lower cost. Ships with an interactive 8-tab HTML playground (rankings, field heatmaps, error breakdown, provider analysis)	Python · HTML · 77 commits
microgpt-experiments	Minimal, stdlib-only, dependency-free character-level GPT in pure Python — scalar autograd, multi-head causal attention, Adam with bias correction. Built for learning transformer internals, with a rigorous benchmarking framework: head-count ablation studies (N_HEAD: 1 vs 4), training step sweeps, run reports encoding the full config in the filename, `compare_run_reports.py` for loss/config diffs between runs, HTML multi-run comparison with ASCII loss graphs and 3-tier semantic quality scoring (real / plausible / nonsense)	Python · stdlib only
awesome-ai-ml-dl	Comprehensive AI/ML/DL study notes & curated resources — dedicated to engineers, data scientists, and researchers worldwide	1,664 ⭐ · 373 forks
nlp_profiler	Drop a Pandas dataframe in, get sentiment, grammar quality, readability, spelling scores and 30+ text features back — like `pandas.describe()` but for text	228+ ⭐ · 35 forks · Presented at NLP Zurich 2020
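
For readers curious how field-level scores such as the F1, precision, and recall reported for playgroup_202602_docextract can be computed, here is a minimal sketch. It is not the project's actual scoring code: the function name `score_fields`, the field names, the sample values, and the exact-match-after-normalisation rule are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def score_fields(predicted: Dict[str, str], expected: Dict[str, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for one document's extracted fields.

    Assumption: a field is correct only on exact match after lowercasing
    and collapsing whitespace. Real benchmarks may use looser matching.
    """
    def norm(value: str) -> str:
        return " ".join(value.lower().split())

    # Empty strings count as "no answer" on either side.
    emitted = {k: norm(v) for k, v in predicted.items() if v and v.strip()}
    gold = {k: norm(v) for k, v in expected.items() if v and v.strip()}

    true_positives = sum(1 for k, v in emitted.items() if gold.get(k) == v)
    precision = true_positives / len(emitted) if emitted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels and model output for a single PDF.
expected = {
    "charity_name": "Example Trust",
    "charity_number": "123456",
    "income": "10500",
    "report_date": "2019-03-31",
}
predicted = {
    "charity_name": "example  trust",  # matches after normalisation
    "charity_number": "123456",        # matches
    "income": "10000",                 # wrong value
    "report_date": "",                 # model gave no answer
}

precision, recall, f1 = score_fields(predicted, expected)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three emitted fields are correct and two of the four gold fields are recovered, so precision is 2/3 and recall is 1/2. A benchmark would typically aggregate these counts across all documents before comparing models on cost and time.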

More projects ↓

Project	What it does	Impact
awesome-graal	The definitive curated resource hub for GraalVM, Truffle, and polyglot JVM — covering Java, Python, R, Ruby, JS, LLVM runtimes	344+ ⭐ · 30 forks
refactoring-developer-habits	Collaborative TDD manifesto — a community-shaped guide to developer habits and the TDD lifecycle, born at SoCraTes UK 2013	118+ ⭐ · 32 forks · Presented at LSCC 2016
learning-path-index	Data, assets, and code powering a structured Learning Path Index project	18 ⭐ · 17 forks
RESTAPIUnifier	Brings together APIs of various REST formats under one unified interface	8 ⭐ · 7 forks · Java
better-nlp	NLP library making advanced natural language processing accessible to all	Open source
Kaggle: Normalising a distribution	Published research kernel bridging statistical theory and practice	Peer reviewed
Kaggle: Limitations of stats measurements	Deep-dive research on the boundaries of statistical measurement	Peer reviewed

Contributed actively to Adopt OpenJDK and GraalVM up until around 2020/21 — that body of work lives on in the awesome-graal resource hub above.

Competition Record 🏆

Result	Competition	Proof
🥇 Top 12%	Liverpool Ion Switching — ML on quantum tunneling data	tweet ↗
🏆 Team Champion	London "Kaggle Machine Learning Challenger Day"	tweet ↗
🥇 Top 6 of 50+	2019 Kaggle Utility Script Coding Competition	tweet ↗
🥈 5th of 2,255	SoftBank Forex Algorithm Data Science Competition 2019/20	tweet ↗
🏅 Consolation Prize	Pivigo Data Science Hackathon	tweet ↗

Mentoring & Community Impact

Mentored across KaggleX BIPOC cohorts (2022–2025) and AIMakerspace Engineering bootcamp — a true two-way learning experience.

Full mentoring story ↓

I've mentored participants across multiple KaggleX BIPOC Mentorship cohorts (2022–2025), starting with the December 2022 – March 2023 program organised by Kaggle. It's been incredibly rewarding — a true two-way learning experience where I've grown as much as I've helped others. The outcomes and feedback are visible on my LinkedIn and Twitter feeds, showcasing the amazing work these talented individuals have accomplished.

In addition, I have mentored and been a peer supporter to students of the AIMakerspace Engineering bootcamp and the onramp courses.

What I Work With

AI/ML · RAG/LLMs · Java · Python · TDD · Open Source

Full tech stack ↓

Languages       Java · Python · R · Scala

LLMs & RAG      OpenAI · LLM application development · RAG pipelines · vector embeddings
                chunking strategies · hybrid search · semantic search · multi-agent systems
                Voyage AI · MongoDB Atlas Vector Search · MCP (Model Context Protocol)
                microgpt / nanoGPT experimentation

AI / ML         TensorFlow · PyTorch · Jupyter · in-browser ML models
                data visualisation (R & D3.js) · Kaggle

Backend         FastAPI · REST API design & unification · Java services

JVM             GraalVM · Truffle · Adopt OpenJDK · JVM Performance Tuning
                Java Concurrency · HotSpot JIT analysis

Data            R (analytics) · D3.js (visualisation) · data engineering pipelines
                document parsing · fact extraction · causal inference

DevOps          Docker · CI/CD · Linux · SDLC tooling

Hardware        Movidius · FPGAs · Google Cloud TPUs · NVIDIA GPUs

Practices       TDD · BDD · Software Craftsmanship · Code Review · Refactoring

"Don't chase success, rather aim for Excellence, and success will come chasing after you!"

Find me talking, writing, or building at one of these:

Twitter · Mastodon · LinkedIn · Medium · Kaggle · GitHub · SlideShare · Lanyrd · Blog · YouTube — Channel · YouTube — Playlists
