Atharva Devne (atharvadevne123)

💫 About Me

👋 Hi, I'm Atharva Devne, a Data and ML Engineer with 4+ years of experience building data systems: automated ETL pipelines, real-time analytics platforms, and production-grade ML APIs with drift detection, explainability, and CI/CD.

📊 I specialize in:

Designing and optimizing complex SQL queries across multi-source databases to consolidate and accelerate reporting
Building automated ETL pipelines that eliminate manual reporting tasks and improve data consistency
Developing executive-level Power BI & Tableau dashboards tracking operational metrics, cost utilization, and service throughput
Applying Python (Pandas, NumPy, scikit-learn) for deep exploratory analysis, ML modeling, and drift detection
Building production ML APIs (FastAPI + Docker) with ensemble models (XGBoost, LightGBM, Random Forest) and SHAP explainability
Implementing RAG pipelines with FAISS vector search for intelligent document retrieval and anomaly explanation
Implementing data validation and quality standards to strengthen downstream reporting reliability
Translating ambiguous business questions into measurable KPIs and data strategies

🎓 MS in Management Information Systems, University of Illinois Chicago | BTech in Computer Science, Maharashtra Institute of Technology, Pune

🔭 What I'm Working On

Enterprise Analytics Platform — Unified E-Commerce, Supply Chain, and Financial Intelligence on Apache Spark, Airflow, Kafka, and FastAPI
SENTINELLA: Real-Time Fraud Detection — XGBoost + LightGBM + RF ensemble with 0.947 AUC-ROC, RAG explainability via FAISS, deployed on AWS
Clinical Trial Cohort Matching System — AI-powered patient-trial matching across 300 patients, 25 trials, and 7,500 match pairs using XGBoost and spaCy

👯 Looking to Collaborate On

Data engineering and analytics projects using Python, Spark, and Airflow
ML solutions for fraud detection, demand forecasting, and customer segmentation
Cloud-based data platforms on AWS, Azure, and GCP

🌱 Currently Learning

Advanced AI and deep learning techniques
Cloud-native data engineering with Databricks and Spark
Real-time data processing and scalable architecture design
MLflow for experiment tracking and model registry

⚡ Fun Fact

I've reduced manual reporting effort by up to 40% by automating dashboards and analytics workflows, and I built a Python script that automatically organized 14,000+ job emails in Gmail. 🤖

🚀 Featured Projects

Project	Stack	Highlights
Enterprise Analytics Platform	Spark · Airflow · Kafka · FastAPI · PostgreSQL	Unified analytics across E-Commerce, Supply Chain, and Finance
SENTINELLA: Fraud Detection	XGBoost · LightGBM · FAISS · Flask · Docker · AWS	0.947 AUC-ROC ensemble fraud detection with RAG explainability
Clinical Trial Cohort Matching	XGBoost · spaCy · FHIR · FastAPI · PostgreSQL	AI patient-trial matching: 300 patients, 25 trials, 7,500 match pairs
Churn-Shield	LightGBM · SHAP · Airflow · FastAPI	Production churn prediction with drift detection and auto-retraining
Rag-Sentinel	FAISS · Isolation Forest · FastAPI · Docker	Anomaly detection via RAG pipeline with real-time drift monitoring
Price-Prophet	Ensemble ML · FastAPI · Docker	Dynamic price optimization with model monitoring
GenAI-Medical-Policy-Analysis	GenAI · PDF extraction · NLP	Automated medical policy document analysis with natural language Q&A
Gmail-job-Automation	Python · Google Apps Script	Automated inbox organization for 14,000+ job emails

🏆 Certifications

🎖️ Google Advanced Data Analytics
🎖️ Google Business Intelligence
☁️ AWS Cloud Solutions Architect
📋 Microsoft Project Management
