👋 Hi, I'm Atharva Devne, a Data and ML Engineer with 4+ years of experience building data systems: automated ETL pipelines, real-time analytics platforms, and production-grade ML APIs with drift detection, explainability, and CI/CD.
📊 I specialize in:
- Designing and optimizing complex SQL queries across multi-source databases to consolidate and accelerate reporting
- Building automated ETL pipelines that eliminate manual reporting tasks and improve data consistency
- Developing executive-level Power BI & Tableau dashboards tracking operational metrics, cost utilization, and service throughput
- Applying Python (Pandas, NumPy, scikit-learn) for deep exploratory analysis, ML modeling, and drift detection
- Building production ML APIs (FastAPI + Docker) with ensemble models (XGBoost, LightGBM, Random Forest) and SHAP explainability
- Implementing RAG pipelines with FAISS vector search for intelligent document retrieval and anomaly explanation
- Implementing data validation and quality standards to strengthen downstream reporting reliability
- Translating ambiguous business questions into measurable KPIs and data strategies
🎓 MS in Management Information Systems, University of Illinois Chicago | BTech in Computer Science, Maharashtra Institute of Technology, Pune
- Enterprise Analytics Platform: unified E-Commerce, Supply Chain, and Financial Intelligence on Apache Spark, Airflow, Kafka, and FastAPI
- SENTINELLA, Real-Time Fraud Detection: XGBoost + LightGBM + Random Forest ensemble with 0.947 AUC-ROC, RAG explainability via FAISS, deployed on AWS
- Clinical Trial Cohort Matching System: AI-powered patient-trial matching of 300 patients against 25 trials (7,500 match pairs) using XGBoost and spaCy
- Data engineering and analytics projects using Python, Spark, and Airflow
- ML solutions for fraud detection, demand forecasting, and customer segmentation
- Cloud-based data platforms on AWS, Azure, and GCP
- Advanced AI and deep learning techniques
- Cloud-native data engineering with Databricks and Spark
- Real-time data processing and scalable architecture design
- MLflow for experiment tracking and model registry
I've reduced manual reporting effort by up to 40% by automating dashboards and analytics workflows. I also built a Python script that automatically organized 14,000+ job emails in Gmail. 🤖
| Project | Stack | Highlights |
|---|---|---|
| Enterprise Analytics Platform | Spark · Airflow · Kafka · FastAPI · PostgreSQL | Unified analytics across E-Commerce, Supply Chain, and Finance |
| SENTINELLA: Fraud Detection | XGBoost · LightGBM · FAISS · Flask · Docker · AWS | 0.947 AUC-ROC ensemble fraud detection with RAG explainability |
| Clinical Trial Cohort Matching | XGBoost · spaCy · FHIR · FastAPI · PostgreSQL | AI patient-trial matching: 300 patients, 25 trials, 7,500 match pairs |
| Churn-Shield | LightGBM · SHAP · Airflow · FastAPI | Production churn prediction with drift detection and auto-retraining |
| Rag-Sentinel | FAISS · Isolation Forest · FastAPI · Docker | Anomaly detection via RAG pipeline with real-time drift monitoring |
| Price-Prophet | Ensemble ML · FastAPI · Docker | Dynamic price optimization with model monitoring |
| GenAI-Medical-Policy-Analysis | GenAI · PDF extraction · NLP | Automated medical policy document analysis with natural language Q&A |
| Gmail-job-Automation | Python · Google Apps Script | Automated inbox organization for 14,000+ job emails |
Skill areas: Languages · ML / AI · Frameworks & APIs · Data Engineering · BI & Visualization · Cloud & DevOps · Databases · Tools
- 🎖️ Google Advanced Data Analytics
- 🎖️ Google Business Intelligence
- ☁️ AWS Cloud Solutions Architect
- 📋 Microsoft Project Management
