I like to train machine learning models and deep neural networks! :D


Nilesh Sarkar

AI Researcher — Knowledge Distillation · Mechanistic Interpretability · World Models

Portfolio · Erdős AI Lab · LinkedIn · Hugging Face · Scholar · Email

AI Researcher · Erdős AI Lab · 2026–Present
B.Tech Artificial Intelligence & Robotics · Dayananda Sagar University · 2023–2027


About

AI Researcher at Erdős AI Lab, working on the boundary between how language models compress information and how that information is mechanistically structured inside the network.

I treat knowledge distillation, sparse autoencoders, pruning, and quantization as analytical probes rather than just engineering optimizations — using compression to surface what models actually represent, and using interpretability to figure out what we can or cannot afford to compress.

Current threads: Knowledge distillation · Sparse autoencoders (SAEs) · World models
                 LLM architecture & compression · Protein structure prediction
                 Medical imaging · Agentic RAG · Indic NLP

Research & Projects

Knowledge Distillation: A Minimum-Width Theorem

First-author · Under review at COLM 2026 · Erdős AI Lab

A theoretical and empirical study on the dictionary width at which a sparse autoencoder's reconstruction loss bottoms out, given a target sparsity. Toy-model validation followed by three real-LM trials on Pythia-410M (24 layers × 6 token checkpoints each).
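To make the quantity under study concrete, here is a minimal toy sketch of how reconstruction loss can be measured as a function of dictionary width at a fixed sparsity. It is not the paper's method: the top-k sparsity rule, the truncated-identity dictionary, and every function name below are assumptions chosen for illustration only.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def top_k(activations, k):
    """Keep the k largest activations and zero the rest (assumed sparsity rule)."""
    order = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


def sae_reconstruction_mse(x, w_enc, w_dec, k):
    """Encode with ReLU + top-k, decode linearly, return mean squared error."""
    pre_activations = [max(0.0, a) for a in matvec(w_enc, x)]
    codes = top_k(pre_activations, k)
    x_hat = matvec(w_dec, codes)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def truncated_identity(width, d_model):
    """Hypothetical toy dictionary: the first `width` standard basis vectors."""
    w_enc = [[1.0 if j == i else 0.0 for j in range(d_model)] for i in range(width)]
    w_dec = [[1.0 if j == i else 0.0 for j in range(width)] for i in range(d_model)]
    return w_enc, w_dec


x = [3.0, 1.0, 0.5]          # toy activation vector
target_sparsity = 2          # at most two active dictionary features
losses = {}
for width in (1, 2, 3):
    w_enc, w_dec = truncated_identity(width, d_model=len(x))
    losses[width] = sae_reconstruction_mse(x, w_enc, w_dec, target_sparsity)
print(losses)
```

In this toy setting the loss drops when the width grows from 1 to 2 and stays flat from 2 to 3, because the sparsity budget, not the dictionary size, becomes the limiting factor. Real SAEs are trained rather than hand-built, so this only illustrates the shape of the measurement.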

Checkpoints on Hugging Face:

PyTorch · Pythia · SAE · Mechanistic Interpretability


Mechanistic Interpretability Experiments

Active · Erdős AI Lab

Probing transformer internals with sparse autoencoders, attention-circuit analysis, and feature-attribution studies. Current extensions cover IOI and induction-head circuits, feature-attribution probes on reasoning benchmarks, and cross-model generalisation (Pythia-410M → 1.4B → 2.8B).

Adjacent thread: a framework for better generalisation on low-sample medical imaging without generative deepfake augmentation.

PyTorch · SAEs · Probing · Activation Patching


Medical AI: PCOS Detection

Published in journal · Erdős AI Lab

"A Systematic Deep Learning Framework for PCOS Detection Using Deduplicated Ultrasound Images: Comparative Analysis of CNN and Vision Transformer Models." A novel three-stage deduplication pipeline (MD5 + perceptual hashing + cross-class removal) cleaned the PCOS-XAI dataset from 11,784 to 3,490 images (70.4% removed). Systematic benchmark of 18 architectures (13 CNNs + 5 ViTs) under identical conditions for 200 epochs.

Top result: EfficientFormer-L1 and MobileViT-Small (hybrid CNN-Transformer) tied at 99.81% test accuracy with AUC up to 1.0. Pure ViT-Base and Swin Transformer Base failed to converge on this dataset size.

Compute: NVIDIA A100 (80 GB VRAM) · 64 GB system RAM · Intel Xeon 42-core CPU.
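The three-stage deduplication described in this section (MD5, perceptual hashing, cross-class removal) can be sketched roughly as follows. The README does not specify the perceptual hash, the distance threshold, or the exact cross-class rule, so the average hash, the `max_distance` parameter, and the "drop both sides of a cross-class match" rule are assumptions, and all names are hypothetical. Images are stood in for by tiny grayscale grids so the example needs only the standard library.

```python
import hashlib


def md5_of(pixels):
    """Exact-content fingerprint of a grayscale grid (values 0-255)."""
    return hashlib.md5(bytes(p for row in pixels for p in row)).hexdigest()


def average_hash(pixels):
    """Assumed perceptual hash: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(int(p > mean) for p in flat)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def deduplicate(samples, max_distance=0):
    """samples: list of (name, label, pixels). Returns the surviving samples."""
    # Stage 1: drop byte-identical copies.
    seen, stage1 = set(), []
    for sample in samples:
        digest = md5_of(sample[2])
        if digest not in seen:
            seen.add(digest)
            stage1.append(sample)

    # Stage 2: drop near-duplicates within the same class.
    stage2 = []  # list of (sample, perceptual hash)
    for sample in stage1:
        h = average_hash(sample[2])
        if not any(o[1] == sample[1] and hamming(h, oh) <= max_distance
                   for o, oh in stage2):
            stage2.append((sample, h))

    # Stage 3 (assumed rule): drop every image that matches one in another class.
    return [s for s, h in stage2
            if not any(o[1] != s[1] and hamming(h, oh) <= max_distance
                       for o, oh in stage2)]


samples = [
    ("a", "pcos",   [[10, 200], [10, 200]]),
    ("b", "pcos",   [[10, 200], [10, 200]]),   # exact copy of "a"
    ("c", "pcos",   [[12, 198], [11, 201]]),   # near-duplicate of "a"
    ("d", "normal", [[200, 10], [200, 10]]),
    ("e", "normal", [[90, 90], [5, 5]]),
    ("f", "pcos",   [[91, 89], [6, 4]]),       # looks like "e" but other class
]
kept = deduplicate(samples)
print([name for name, _, _ in kept])

# Sanity check on the reported reduction: 11,784 -> 3,490 images.
removed_pct = round(100 * (1 - 3490 / 11784), 1)
print(removed_pct)
```

The last two lines only re-derive the removal percentage from the image counts stated above; they do not reproduce the actual pipeline.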

PyTorch · Vision Transformers · CNNs · Medical Imaging


Protein Folding Experiments

Active · Erdős AI Lab

Structure-prediction studies on small proteins — pLDDT-style confidence calibration, folding-trajectory dynamics (Q, Cα RMSD, R_g, Q–RMSD landscape), and head-to-head comparisons between transformer folding stacks and classical MD baselines.
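Two of the trajectory observables named above, the radius of gyration R_g and the fraction of native contacts Q, can be computed from Cα coordinates alone. The sketch below is a rough illustration, not the project's analysis code: the hard-cutoff contact definition, the 8 Å cutoff, the 1.2 tolerance factor, and the minimum sequence separation of 3 are all assumptions, and the function names are hypothetical. Cα RMSD is left out because it needs an optimal superposition step first.

```python
import math


def radius_of_gyration(coords):
    """Unweighted R_g: root-mean-square distance from the centroid."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


def native_contacts(native, cutoff=8.0, min_separation=3):
    """Pairs (i, j) at least `min_separation` apart in sequence and within cutoff."""
    return [(i, j)
            for i in range(len(native))
            for j in range(i + min_separation, len(native))
            if math.dist(native[i], native[j]) < cutoff]


def fraction_native_contacts(frame, native, tolerance=1.2, **kwargs):
    """Q: share of native contacts still formed in `frame` (hard-cutoff variant)."""
    contacts = native_contacts(native, **kwargs)
    if not contacts:
        return 0.0
    formed = sum(
        math.dist(frame[i], frame[j]) <= tolerance * math.dist(native[i], native[j])
        for i, j in contacts
    )
    return formed / len(contacts)


# Toy 4-residue "protein": a compact square versus a fully extended chain.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

q_native = fraction_native_contacts(native, native)
q_extended = fraction_native_contacts(extended, native)
rg_native = radius_of_gyration(native)
rg_extended = radius_of_gyration(extended)
print(q_native, q_extended, rg_native, rg_extended)
```

Evaluating these per frame gives the time series from which landscapes such as Q versus RMSD are built.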

PyTorch · ESMFold · Computational Biology


LLM Architecture & Compression

Active · 2025–Present

Compression and deployment experiments across 0.5B–7B parameter models. Teacher–student distillation for Hindi and Kannada low-resource instruction datasets. Deployed a Gemma 3 1B model on NVIDIA Jetson Nano for real-time on-device inference. Now exploring diffusion-based language models for Indic text generation.
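For readers unfamiliar with teacher–student distillation, one common objective is the temperature-scaled KL divergence between teacher and student output distributions. The README does not say which loss these experiments use, so the sketch below is an assumption that shows the standard soft-target formulation on plain Python lists, with hypothetical function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * kl


# A student that matches the teacher incurs zero loss.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# A student that disagrees is penalised; here the value has a closed form.
different = distillation_loss([0.0, 0.0], [0.0, math.log(3)], temperature=1.0)
print(same, different)
```

In practice this term is usually mixed with the ordinary cross-entropy on ground-truth labels, with the mixing weight treated as a hyperparameter.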

QLoRA · LoRA · Quantization · NVIDIA Jetson · Indic NLP


Agentic RAG for Safety-Critical Engineering Docs

Industry research · Moog India Technology Centre

Agentic retrieval systems for multi-step reasoning over large structured aerospace engineering corpora. Query-aware routing, citation-grounded retrieval, structured reasoning pipelines. Improved document retrieval accuracy from ~70% to 90%+. Includes an MCP (Model Context Protocol) server for tool integration.

LangChain · LangGraph · n8n · RAG · MCP · Vector Databases


Autonomous Drone Perception

Active · Sep 2025–Present

Vision-based navigation pipelines for all-terrain UAVs — real-time obstacle detection, monocular depth estimation, and sensor fusion for autonomous flight in unstructured environments.

OpenCV · ROS2 · Depth Estimation · Sensor Fusion


Humanoid Robotic Prosthetic Arm

Active · Jun 2025–Present

Perception-driven servo control and actuation systems for humanoid prosthetic arm prototypes, integrating real-time visual feedback for adaptive grasping.

ROS2 · Servo Control · Computer Vision · Hardware-in-the-Loop


Tech Stack

Research & Modeling

PyTorch · Python · CUDA · TensorFlow · Hugging Face

LLM & Agent Systems

LangChain · LangGraph · n8n · MCP · RAG

Compression & Interpretability

SAE · QLoRA · Quantization · Distillation

Robotics & Edge

ROS · Raspberry Pi · NVIDIA

Tools & Infra

Linux · Docker · Git · GCP · Azure · AWS

Languages

C++ · C


Recognition

  • First-author paper under review at COLM 2026 — Knowledge Distillation: A Minimum-Width Theorem (Erdős AI Lab)
  • Published journal paper — A Systematic Deep Learning Framework for PCOS Detection Using Deduplicated Ultrasound Images
  • India AI Impact Summit 2026 — Represented Dayananda Sagar University; presented on LLM architectures, medical AI, and autonomous drones
  • Exceptional Volunteering & Community Service Award — IEEE RAS & CIS (2025)
  • Kaggle Machine Learning Certification (2025)
  • RapidMiner Certified Data Science Professional (2024)

Community

AI Researcher, Erdős AI Lab — Founding research lab focused on knowledge distillation, mechanistic interpretability, and world models. Student-founded, incubated at IIT Bombay.

Co-Founder, RoboVerse Club — Built a 100+ member robotics & AI community at DSU; organized 30+ technical workshops on LLMs, robotics, and edge AI.

Tech Lead, E-Cell DSU — Leading technology initiatives for the university startup ecosystem.

Executive Committee Member, IEEE RAS & IEEE CIS — Organized 5+ technical events and student research programs.


Always open to research collaborations, interesting problems, and good conversations about AI.

Portfolio · Hugging Face · Erdős AI Lab

Pinned

  1. autoresearcher (Public)

    Forked from karpathy/autoresearch

    autoresearch enhanced :D

    Python

  2. Image-Generation-with-Stable-Diffusion-v1.5 (Public)

    This project uses the Stable Diffusion v1.5 model from RunwayML to generate high-quality images from descriptive text prompts. Built with the diffusers library, it supports GPU acceleration, negati…

    Jupyter Notebook

  3. AgenticRAG-Chatbot-with-LangGraph (Public)

    Agentic RAG chatbot using Streamlit and LangGraph to answer questions from uploaded documents, leveraging confidence scores and web search for comprehensive responses.

    Python

  4. Language-Modelling-and-Agentic-AI-Systems-Curriculum- (Public)

    Language Modelling and Agentic AI Systems Curriculum

    Jupyter Notebook

  5. nileshsarkar-ai (Public)

    About Me!