YashrajThube/README.md

Hi 👋, I'm Yashraj Thube

Data Science | Data Analytics | Machine Learning | Deep Learning | Generative AI

GitHub · LinkedIn · Email


👨‍💻 About Me

I'm a Data Science professional with hands-on, end-to-end experience across Data Analytics, Machine Learning, Deep Learning, and Generative AI — from cleaning raw data to deploying production-ready, AI-powered systems.

My core strength lies in turning messy, real-world data into clear insights and reliable predictive models, while engineering the scalable ETL pipelines and APIs needed to put those models into production.

  • 🔭 Currently building Machine Learning, Deep Learning & Generative AI applications
  • 📊 Skilled in Data Analytics, Exploratory Data Analysis (EDA), and Feature Engineering
  • 🌱 Deepening my expertise in Advanced NLP, LLMs, AI Agents, and MLOps
  • 🧠 Comfortable across the full ML lifecycle: data → features → model → deployment → monitoring
  • 🚀 Experienced in building scalable backend APIs and AI-powered microservices
  • 💡 Particularly interested in Financial Analytics, Healthcare AI, and Intelligent Automation
  • 👨‍💼 Open to Machine Learning Engineer, Data Scientist, Data Analyst, and AI Engineer roles

📫 Reach me at yashraj07thube.tech@gmail.com


🛠️ Tech Stack

Languages: Python · SQL

Data Analytics & Visualization: Pandas · NumPy · Matplotlib · Seaborn · Exploratory Data Analysis (EDA)

Machine Learning: Scikit-learn · XGBoost · Feature Engineering · Predictive Modeling · Cross Validation · TensorFlow

Deep Learning & Generative AI: Neural Networks · LSTM · NLP · Generative AI · Google Gemini API · Explainable AI (SHAP)

Backend Development: FastAPI · Flask · React · Streamlit · Microservices · Async Processing

Databases: MySQL · SQLite

Data Engineering: ETL Pipelines · Data Cleaning · Data Preprocessing · Data Transformation

Tools & Platforms: Git · GitHub · MLflow · Librosa


💼 Work Experience

Associate Software Engineer Intern — Data Science

Thynk Technology India

  • Optimized Python-based ETL pipelines, improving overall data processing efficiency by 20%
  • Performed in-depth Exploratory Data Analysis (EDA) to surface actionable business insights
  • Built and tuned machine learning models, improving model precision by 12%

🚀 Featured Projects

🤖 Smart Assistant with NLP & Automation

Python · FastAPI · Google Gemini API · MySQL · NLP · Async Processing

  • Engineered an AI assistant for scheduling, note generation, and task automation
  • Built an NLP pipeline for intent recognition and entity extraction
  • Designed multi-intent processing powered by the Google Gemini API
  • Developed a high-performance FastAPI backend using async processing
  • Achieved sub-2-second latency under concurrent workloads
  • Implemented MySQL integration with retry and fallback mechanisms
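
The retry-and-fallback idea in the last bullet can be sketched in a few lines. This is a minimal illustration, not the project's code: `with_retry`, `FlakyStore`, and `cached_notes` are hypothetical names, the database is replaced by a stand-in that fails a fixed number of times, and the backoff values are arbitrary.

```python
import asyncio
import random


async def with_retry(operation, fallback, attempts=3, base_delay=0.1):
    """Await operation(); retry on ConnectionError, then return fallback()."""
    for attempt in range(attempts):
        try:
            return await operation()
        except ConnectionError:
            if attempt < attempts - 1:
                # Exponential backoff with a little jitter between tries.
                delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
                await asyncio.sleep(delay)
    # Every attempt failed: serve the degraded result instead of raising.
    return fallback()


class FlakyStore:
    """Hypothetical stand-in for a database client that fails its first N calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def fetch_notes(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("database unavailable")
        return ["note from database"]


def cached_notes():
    """Hypothetical fallback, e.g. a local cache of the last good result."""
    return ["note from local cache"]
```

Calling `asyncio.run(with_retry(store.fetch_notes, cached_notes))` returns the database result once a call succeeds, and the cached result if all attempts fail, so the caller never sees the connection error.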

📈 Financial Analytics & Forecasting SaaS Platform

React · FastAPI · Python · MySQL · XGBoost · LSTM · Gemini API · Docker

  • Developed a SaaS platform for financial analytics and portfolio tracking
  • Built XGBoost and LSTM forecasting models for stock prediction and time-series forecasting
  • Integrated a Gemini-powered chatbot for conversational financial insights
  • Designed scalable, async REST APIs with sub-3-second response latency
  • Implemented monitoring, caching, and a scalable deployment architecture
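
Both XGBoost and LSTM forecasters need the price series reframed as supervised examples. One common way to do that is a sliding window, sketched below. How this project actually framed its inputs is not stated here, so `make_windows`, `lookback`, and `horizon` are illustrative assumptions.

```python
def make_windows(series, lookback, horizon=1):
    """Split a series into (input window, target) pairs for supervised forecasting.

    Each window holds `lookback` consecutive values; the target is the value
    `horizon` steps after the end of the window.
    """
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs
```

With a series of five values and `lookback=3`, this yields two pairs: the first three values predicting the fourth, and the middle three predicting the fifth.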

📊 Retail Sales Analytics & Customer Intelligence Platform

Python · Pandas · MySQL · Power BI · Streamlit

  • Developed an end-to-end retail sales analytics and customer intelligence platform using the Brazilian Olist E-Commerce Dataset
  • Processed and analyzed more than 100,000 e-commerce transactions across customers, orders, products, and payments
  • Designed and optimized 25+ SQL business analytics queries for revenue, customer, and product performance analysis
  • Implemented RFM Segmentation and Customer Lifetime Value (CLV) analysis to identify high-value and at-risk customers
  • Developed interactive Streamlit and Power BI dashboards for real-time KPI monitoring, customer analytics, and business intelligence
  • Automated data cleaning, feature engineering, and reporting workflows, reducing manual analysis effort by 40%
  • Generated actionable insights to support customer retention, revenue optimization, and data-driven decision-making
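
For readers unfamiliar with RFM, the sketch below computes recency, frequency, and monetary values per customer and applies a simple rule-based label. It is dependency-free for brevity, whereas the project itself lists Pandas. The sample rows, the function names `rfm_table` and `segment`, and all thresholds are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in orders:
        frequency[customer_id] += 1
        monetary[customer_id] += amount
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
    return {
        cid: ((as_of - last_order[cid]).days, frequency[cid], round(monetary[cid], 2))
        for cid in last_order
    }


def segment(recency_days, frequency, monetary, recent=90, loyal=2, big=200.0):
    """Illustrative rule-based labels; the thresholds are arbitrary assumptions."""
    valuable = frequency >= loyal or monetary >= big
    if valuable and recency_days <= recent:
        return "high-value"
    if valuable:
        return "at-risk"  # valuable in the past, but has not ordered recently
    return "regular"


# Made-up sample rows: (customer_id, order_date, amount).
orders = [
    ("c1", date(2018, 8, 1), 120.0),
    ("c1", date(2018, 8, 20), 150.5),
    ("c2", date(2018, 1, 15), 300.0),
    ("c3", date(2018, 7, 30), 40.0),
]
table = rfm_table(orders, as_of=date(2018, 9, 1))
labels = {cid: segment(*values) for cid, values in table.items()}
```

In practice the three values are often converted to quantile scores before segmenting; fixed thresholds are used here only to keep the example short.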

🎯 Customer Churn Prediction Platform

Python · XGBoost · Scikit-learn · FastAPI · Streamlit · SQLite · MLflow

  • Built a customer churn prediction platform achieving a ROC-AUC of 0.87
  • Developed a scalable preprocessing pipeline using ColumnTransformer, OneHotEncoder, and feature engineering
  • Implemented MLflow experiment tracking for reproducible model development
  • Designed a complete workflow: CSV → Database → API → Dashboard
  • Implemented PSI-based data drift detection with an automated retraining workflow to sustain performance
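
The Population Stability Index (PSI) behind the drift check compares the binned distribution of a feature in new data against a baseline: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the baseline and new proportions in bin i. The sketch below is one plain-Python way to compute it, assuming equal-width bins taken from the baseline. The function name, bin count, and `eps` floor are illustrative choices, not the project's implementation.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range values land in the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift. A retraining job could be triggered when any monitored feature crosses a threshold like that, though the threshold this project uses is not stated.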

🎵 Music Playlist Generator

Python · Flask · Librosa · Pandas

  • Developed a content-based music recommendation engine
  • Extracted MFCC and Chroma audio features using Librosa
  • Processed more than 10,000 audio tracks
  • Improved recommendation accuracy by 28%
  • Developed Flask REST APIs for real-time playlist generation, achieving response latency under 120 ms
  • Optimized feature extraction for scalable batch processing
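
Content-based recommendation boils down to ranking tracks by how close their feature vectors are to a seed track. The sketch below uses cosine similarity over tiny made-up vectors that stand in for averaged MFCC and chroma features. The choice of similarity metric, the track ids, and the `recommend` helper are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(seed, features, k=2):
    """Return the k track ids most similar to `seed`, best match first."""
    scores = [
        (cosine(features[seed], vec), track)
        for track, vec in features.items()
        if track != seed
    ]
    return [track for _, track in sorted(scores, reverse=True)[:k]]


# Made-up 3-dimensional vectors standing in for real audio features.
features = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
    "track_d": [0.7, 0.3, 0.0],
}
```

Calling `recommend("track_a", features)` ranks the remaining tracks by similarity to `track_a`. With real data the vectors would come from the feature-extraction step and would usually be standardized first.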

🎓 Education

Bachelor of Engineering (B.E.) in Computer Engineering

📅 2022 – 2026 | 🎯 GPA: 8.5 / 10.0

Relevant Coursework: Data Structures & Algorithms · Machine Learning · Database Management Systems · Operating Systems · Big Data


🏆 Achievements

  • 🏅 Ranked among the Top 10 Academic Performers across 1st–3rd year
  • 🚀 Selected for Smart India Hackathon (SIH) 2024 & 2025
  • 💡 Built multiple end-to-end Data Science and Machine Learning applications
  • 📊 Demonstrated strong practical expertise in predictive analytics and AI systems


🌐 Connect With Me

GitHub · LinkedIn · Email


⭐ Transforming Data into Intelligent Solutions with Data Science, Machine Learning & Generative AI ⭐

Popular repositories

  1. Yashraj-AI-Assistant-OS (Public)

    Enterprise Generative AI scheduling assistant that transforms natural language into actionable workflows using Gemini AI, AI Agents, Google Calendar APIs, and intelligent planning systems.

    Python 1

  2. Amazon_Clone (Public)

    Amazon-inspired web page built to practice HTML and CSS.

    HTML

  3. noise-project-and-pollution (Public)

    Jupyter Notebook

  4. Music-Playlist-Generation-System (Public template)

    AI-Powered Music Playlist Generator using Flask, Librosa, Logistic Regression, and KNN.

    HTML

  5. Fastapi-System (Public)

    A REST API built with FastAPI and MySQL to manage categories and products, supporting pagination and structured JSON responses.

    Jupyter Notebook

  6. customer-churn-prediction (Public)

    Customer Churn Prediction System using Machine Learning, FastAPI, React.js and Analytics Dashboard.

    Python