Organizations

@AdoptOpenJDK @MutabilityDetector @Adopt-a-JSR

neomatrix369/README.md

Mani Sarkar

Software Craftsperson · 🤖 AI/ML Engineer · 🏆 4× Kaggle Expert · 🧠 Polyglot Developer/Engineer · Open Source Advocate · ☕ Java Champion · 💬 Speaker

Twitter · Mastodon · LinkedIn · Kaggle · Medium · Blog · YouTube · SlideShare


Supercharging Teams Through Code & Innovation 🚀

25+ years in the game, working shoulder-to-shoulder with founders, CTOs, and brilliant research teams to craft software that stands the test of time.

Data & AI consultant, mentor, and community builder. Known for making AI accessible — from client delivery to internal upskilling, workshops, and ML career talks.

  • 🛡️ Security Champion — internal Threat Modelling upskilling; recognised as "Inspiring Coach" by peers
  • 🤖 AI/ML Ambassador — peer-recognised as "Growth Minded" and "Helping Hand" SME; externally as Kaggle Expert, KaggleX BIPOC Mentor, Certified AI Engineer
About Me — Full Story ↓

As a polyglot software developer with 25+ years in the game, I'm all about strengthening teams and helping them accelerate using whatever tech magic we have at our disposal. You'll find me working shoulder-to-shoulder with founders, CTOs, and brilliant research teams, crafting software that not only works beautifully but stands the test of time.

The Craft Behind the Code

I'm a firm believer in doing things right the first time. Whether it's test-driven development (TDD), behaviour-driven development (BDD), or championing software craftsmanship principles, I bring methodology to the madness. My playground spans the entire Software Development Lifecycle — from reviewing code and breathing new life into legacy systems to test-driving fresh features and diving deep into DevOps.

Where Art Meets Science

These days, you'll catch me building specialised tools for AI/ML research teams, wrestling with data engineering challenges, running experiments, and turning raw numbers into meaningful insights. It's like being part scientist, part engineer, and part digital archaeologist!

Sharing the Knowledge

I've had the privilege of running workshops and hands-on labs (both internal and community-wide), creating courses, and even helping with hiring decisions. There's something incredibly rewarding about watching someone's "aha!" moment when a complex concept finally clicks.

Current Obsessions & Future Frontiers

Right now, I'm completely fascinated by AI, Machine Learning, LLMs, and RAG — building smarter pipelines, benchmarking retrieval strategies, and squeezing every drop of performance and quality out of the results.

Past Obsessions

There was a time when I was equally captivated by Data Analytics with R, stunning Data Visualisations (R & D3), Java Concurrency, and the incredible world of Graal/GraalVM/Truffle. That work shaped a lot of how I think about performance, correctness, and elegant systems — and it lives on in projects like awesome-graal and the Kaggle kernels.

The Eternal Student

When I'm not coding for work, you'll find me at conferences, workshops, and events, getting my hands dirty with the latest tech. I'm like a kid in a candy store when it comes to hardware accelerators (Movidius chips, FPGAs, Google's Cloud TPUs, GPUs) and exotic Python packages. My GitHub repos and blogs are basically my digital laboratory where I document these adventures.

The ultimate goal? Finding better ways to work with higher-quality data, run more elegant experiments, and squeeze every drop of performance and quality from our results. It's not just about writing code — it's about crafting solutions that make a real difference.


Recognition & Credentials

  • 🏺 Software Craftsperson (2016)
  • ☕ Java Champion (2018)
  • 🏅 Oracle Groundbreaker Award (2019)
  • 🤖 Certified AI Engineer (AI Makerspace · 10-week intensive)
  • 🎯 4× Kaggle Expert (Competitions · Notebooks · Datasets · Discussions)

Flagship Open Source Projects 🌟

12+ years of consistent F/OSS contributions

| Project | What it does | Impact |
| --- | --- | --- |
| rag-params-finder | Systematic RAG parameter sweep tool: evaluates embedding models, chunking strategies, and retrieval methods using MongoDB Atlas Vector Search and Voyage AI | 6 ⭐ · Active |
| pre-rag-explorer-dashboard | Pre-RAG prototype dashboard for document parsing, multi-method chunking, vector embedding generation, and hybrid search exploration, powered by in-browser ML | 5 ⭐ · Active |
| AIE7-Demo-Day-Project (RagCheck) | Proactive RAG corpus quality assessment: analyses document collections before deployment, identifies content gaps, and delivers specific improvement recommendations | 12 ⭐ · Python · TypeScript |
| playgroup_202602_docextract | Multi-LLM benchmark: extracts structured fields from UK charity PDFs (Kleister dataset) across 52 models (OpenRouter + Doubleword Batch API), scored by F1, precision, recall, cost, and time. Key finding: the cheapest Doubleword model (dw-qwen3.5-9b, $0.04/M tokens) hits F1=0.927, 3rd overall, beating premium models at ~3.6× lower cost. Ships with an interactive 8-tab HTML playground (rankings, field heatmaps, error breakdown, provider analysis) | Python · HTML · 77 commits |
| microgpt-experiments | Minimal, stdlib-only, dependency-free character-level GPT in pure Python: scalar autograd, multi-head causal attention, Adam with bias correction. Built for learning transformer internals, with a rigorous benchmarking framework: head-count ablation studies (N_HEAD: 1 vs 4), training step sweeps, run reports encoding the full config in the filename, compare_run_reports.py for loss/config diffs between runs, HTML multi-run comparison with ASCII loss graphs and 3-tier semantic quality scoring (real / plausible / nonsense) | Python · stdlib only |
| awesome-ai-ml-dl | Comprehensive AI/ML/DL study notes & curated resources, dedicated to engineers, data scientists, and researchers worldwide | 1,664 ⭐ · 373 forks |
| nlp_profiler | Drop a Pandas dataframe in, get sentiment, grammar quality, readability, spelling scores and 30+ text features back (like pandas.describe(), but for text) | 228+ ⭐ · 35 forks · Presented at NLP Zurich 2020 |
More projects ↓
| Project | What it does | Impact |
| --- | --- | --- |
| awesome-graal | The definitive curated resource hub for GraalVM, Truffle, and polyglot JVM, covering Java, Python, R, Ruby, JS, LLVM runtimes | 344+ ⭐ · 30 forks |
| refactoring-developer-habits | Collaborative TDD manifesto: a community-shaped guide to developer habits and the TDD lifecycle, born at SoCraTes UK 2013 | 118+ ⭐ · 32 forks · Presented at LSCC 2016 |
| learning-path-index | Data, assets, and code powering a structured Learning Path Index project | 18 ⭐ · 17 forks |
| RESTAPIUnifier | Brings together APIs of various REST formats under one unified interface | 8 ⭐ · 7 forks · Java |
| better-nlp | NLP library making advanced natural language processing accessible to all | Open source |
| Kaggle: Normalising a distribution | Published research kernel bridging statistical theory and practice | Peer reviewed |
| Kaggle: Limitations of stats measurements | Deep-dive research on the boundaries of statistical measurement | Peer reviewed |

Contributed actively to Adopt OpenJDK and GraalVM up until around 2020/21 — that body of work lives on in the awesome-graal resource hub above.


Competition Record 🏆

| Result | Competition | Proof |
| --- | --- | --- |
| 🥇 Top 12% | Liverpool Ion Switching (ML on quantum tunneling data) | tweet ↗ |
| 🏆 Team Champion | London "Kaggle Machine Learning Challenger Day" | tweet ↗ |
| 🥇 Top 6 of 50+ | 2019 Kaggle Utility Script Coding Competition | tweet ↗ |
| 🥈 5th of 2,255 | SoftBank Forex Algorithm Data Science Competition 2019/20 | tweet ↗ |
| 🏅 Consolation Prize | Pivigo Data Science Hackathon | tweet ↗ |

Mentoring & Community Impact

Mentored across KaggleX BIPOC cohorts (2022–2025) and AIMakerspace Engineering bootcamp — a true two-way learning experience.

Full mentoring story ↓

I've mentored participants across multiple KaggleX BIPOC Mentorship cohorts (2022–2025), starting with the December 2022 – March 2023 program organised by Kaggle. It's been incredibly rewarding — a true two-way learning experience where I've grown as much as I've helped others. The outcomes and feedback are visible on my LinkedIn and Twitter feeds, showcasing the amazing work these talented individuals have accomplished.

In addition, I have mentored and been a peer supporter to students of the AIMakerspace Engineering bootcamp and the onramp courses.



What I Work With

AI/ML · RAG/LLMs · Java · Python · TDD · Open Source

Full tech stack ↓
Languages       Java · Python · R · Scala

LLMs & RAG      OpenAI · LLM application development · RAG pipelines · vector embeddings
                chunking strategies · hybrid search · semantic search · multi-agent systems
                Voyage AI · MongoDB Atlas Vector Search · MCP (Model Context Protocol)
                microgpt / nanoGPT experimentation

AI / ML         TensorFlow · PyTorch · Jupyter · in-browser ML models
                data visualisation (R & D3.js) · Kaggle

Backend         FastAPI · REST API design & unification · Java services

JVM             GraalVM · Truffle · Adopt OpenJDK · JVM Performance Tuning
                Java Concurrency · HotSpot JIT analysis

Data            R (analytics) · D3.js (visualisation) · data engineering pipelines
                document parsing · fact extraction · causal inference

DevOps          Docker · CI/CD · Linux · SDLC tooling

Hardware        Movidius · FPGAs · Google Cloud TPUs · NVIDIA GPUs

Practices       TDD · BDD · Software Craftsmanship · Code Review · Refactoring

"Don't chase success, rather aim for Excellence, and success will come chasing after you!"


Find me talking, writing, or building at one of these:

Twitter · Mastodon · LinkedIn · Medium · Kaggle · GitHub · SlideShare · Lanyrd · Blog · YouTube — Channel · YouTube — Playlists

Pinned

  1. rag-params-finder Public

    RAG parameter sweep experimentation tool — systematically evaluate embedding models, chunking strategies, and retrieval methods using MongoDB Atlas Vector Search. Supports both Voyage AI (hosted) a…

    Python · 10 ⭐ · 3 forks

  2. awesome-ai-ml-dl Public

    Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

    Jupyter Notebook · 1.7k ⭐ · 373 forks

  3. Interesting links in the areas of HPC, low latency, mechanical harmony/sympathy, garbage collection
    ### Everything I Ever Learned About JVM Performance Tuning @Twitter - by Attila Szegedi
    http://www.infoq.com/presentations/JVM-Performance-Tuning-twitter (video & slides)

    ### 9 Fallacies of Java Performance - by Ben Evans
    http://www.infoq.com/articles/9_Fallacies_Java_Performance (video & slides)
  4. Know your GPUs
    ### Commands on Linux to gather information related to GPUs

    Below is a list of commands and resources that work on Linux (some need packages installed). Contributions covering the same for macOS and Windows are welcome; please share and contribute back.

    Please run the commands below in `vagrant`, `docker`, and other container environments and share your experiences with us!
  5. nlp_profiler Public

    A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low…

    Python · 244 ⭐ · 37 forks

  6. refactoring-developer-habits Public

    Refactor developer habits: among many such habits when writing or maintaining code

    141 ⭐ · 32 forks