I'm a Data Science professional with hands-on, end-to-end experience across Data Analytics, Machine Learning, Deep Learning, and Generative AI — from cleaning raw data to deploying production-ready, AI-powered systems.
My core strength lies in turning messy, real-world data into clear insights and reliable predictive models, while engineering the scalable ETL pipelines and APIs needed to put those models into production.
- 🔭 Currently building Machine Learning, Deep Learning & Generative AI applications
- 📊 Skilled in Data Analytics, Exploratory Data Analysis (EDA), and Feature Engineering
- 🌱 Deepening my expertise in Advanced NLP, LLMs, AI Agents, and MLOps
- 🧠 Comfortable across the full ML lifecycle: data → features → model → deployment → monitoring
- 🚀 Experienced in building scalable backend APIs and AI-powered microservices
- 💡 Particularly interested in Financial Analytics, Healthcare AI, and Intelligent Automation
- 👨‍💼 Open to Machine Learning Engineer, Data Scientist, Data Analyst, and AI Engineer roles
📫 Reach me at yashraj07thube.tech@gmail.com
Languages: Python SQL
Data Analytics & Visualization: Pandas NumPy Matplotlib Seaborn Exploratory Data Analysis (EDA)
Machine Learning: Scikit-learn XGBoost Feature Engineering Predictive Modeling Cross Validation TensorFlow
Deep Learning & Generative AI: Neural Networks LSTM NLP Generative AI Google Gemini API Explainable AI (SHAP)
Backend & Web Development: FastAPI Flask React Streamlit Microservices Async Processing
Databases: MySQL SQLite
Data Engineering: ETL Pipelines Data Cleaning Data Preprocessing Data Transformation
Tools & Platforms: Git GitHub MLflow Librosa
Thynk Technology India
- Optimized Python-based ETL pipelines, improving overall data processing efficiency by 20%
- Performed in-depth Exploratory Data Analysis (EDA) to surface actionable business insights
- Built and tuned machine learning models, improving model precision by 12%
Python FastAPI Google Gemini API MySQL NLP Async Processing
- Engineered an AI assistant for scheduling, note generation, and task automation
- Built an NLP pipeline for intent recognition and entity extraction
- Designed multi-intent processing powered by the Google Gemini API
- Developed a high-performance FastAPI backend using async processing
- Achieved sub-2-second latency under concurrent workloads
- Implemented MySQL integration with retry and fallback mechanisms
React FastAPI Python MySQL XGBoost LSTM Gemini API Docker
- Developed a SaaS platform for financial analytics and portfolio tracking
- Built XGBoost and LSTM forecasting models for stock prediction and time-series forecasting
- Integrated a Gemini-powered chatbot for conversational financial insights
- Designed scalable, async REST APIs with sub-3-second response latency
- Implemented monitoring, caching, and a scalable deployment architecture
Python Pandas MySQL Power BI Streamlit
- Developed an end-to-end retail sales analytics and customer intelligence platform using the Brazilian Olist E-Commerce Dataset
- Processed and analyzed 100,000+ e-commerce transactions across customers, orders, products, and payments
- Designed and optimized 25+ SQL business analytics queries for revenue, customer, and product performance analysis
- Implemented RFM Segmentation and Customer Lifetime Value (CLV) analysis to identify high-value and at-risk customers
- Developed interactive Streamlit and Power BI dashboards for real-time KPI monitoring, customer analytics, and business intelligence
- Automated data cleaning, feature engineering, and reporting workflows, reducing manual analysis effort by 40%
- Generated actionable insights to support customer retention, revenue optimization, and data-driven decision-making
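
For readers unfamiliar with RFM, the segmentation step can be illustrated with a minimal sketch. It is not the project's actual code: the `(customer_id, order_date, amount)` tuple layout, the helper names `rfm_table` and `rank_score`, and the three-level rank scoring are all assumptions made for illustration, and only the standard library is used.

```python
from datetime import date


def rfm_table(orders, as_of):
    """Aggregate (customer_id, order_date, amount) rows into R, F, M values.

    The tuple layout is a hypothetical input format, not the Olist schema.
    """
    table = {}
    for customer, day, amount in orders:
        rec = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)  # most recent purchase date
        rec["frequency"] += 1                # number of orders
        rec["monetary"] += amount            # total spend
    return {
        customer: {
            "recency": (as_of - rec["last"]).days,  # days since last order
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in table.items()
    }


def rank_score(values, levels=3, reverse=False):
    """Map each key to a 1..levels score by rank; a higher score is better.

    Pass reverse=True for recency, where a smaller value is better.
    Ties keep their input order because Python's sort is stable.
    """
    ordered = sorted(values, key=values.get, reverse=reverse)
    n = len(ordered)
    return {key: 1 + (i * levels) // n for i, key in enumerate(ordered)}


orders = [
    ("a", date(2024, 1, 5), 50.0),
    ("a", date(2024, 3, 1), 70.0),
    ("b", date(2023, 11, 20), 20.0),
    ("c", date(2024, 2, 10), 200.0),
]
rfm = rfm_table(orders, as_of=date(2024, 3, 31))
recency_scores = rank_score(
    {c: v["recency"] for c, v in rfm.items()}, reverse=True
)
monetary_scores = rank_score({c: v["monetary"] for c, v in rfm.items()})
```

Customers with high scores on all three dimensions would be candidates for the high-value segment, while a low recency score combined with a historically high monetary score is one common way to flag at-risk customers.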
Python XGBoost Scikit-learn FastAPI Streamlit SQLite MLflow
- Built a customer churn prediction platform achieving a ROC-AUC of 0.87
- Developed a scalable preprocessing pipeline using ColumnTransformer, OneHotEncoder, and feature engineering
- Implemented MLflow experiment tracking for reproducible model development
- Designed a complete workflow: CSV → Database → API → Dashboard
- Implemented PSI-based data drift detection with an automated retraining workflow to sustain performance
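
As a rough illustration of the PSI-based drift check mentioned above, here is a minimal standard-library sketch of the Population Stability Index for a single numeric feature. The function names, the equal-width binning over the baseline range, and the 0.25 retraining threshold (a commonly cited rule of thumb) are assumptions for this example, not details taken from the project.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample.

    Bins are equal-width over the baseline range (an assumption of this
    sketch); current values outside that range fall into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm below stays defined
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.25):
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return psi(expected, actual) > threshold


baseline = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in baseline]
drifted = needs_retraining(baseline, shifted)
```

In a retraining workflow, a check like this would typically run per feature on a schedule, with a flagged feature triggering the retraining job.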
Python Flask Librosa Pandas
- Developed a content-based music recommendation engine
- Extracted MFCC and Chroma audio features using Librosa
- Processed 10,000+ audio tracks
- Improved recommendation accuracy by 28%
- Developed Flask REST APIs for real-time playlist generation, achieving response latency under 120 ms
- Optimized feature extraction for scalable batch processing
📅 2022 – 2026 | 🎯 GPA: 8.5 / 10.0
Relevant Coursework: Data Structures & Algorithms · Machine Learning · Database Management Systems · Operating Systems · Big Data
- 🏅 Ranked among the Top 10 Academic Performers across the 1st to 3rd years
- 🚀 Selected for Smart India Hackathon (SIH) 2024 & 2025
- 💡 Built multiple end-to-end Data Science and Machine Learning applications
- 📊 Demonstrated strong practical expertise in predictive analytics and AI systems

