Vemula-Vinayak/README.md

Hi, I'm Vinayak Vemula 👋

MS Data Science @ Montclair State University '26 | AI/ML · NLP · Python · SQL · Power BI

Building AI-powered pipelines, machine learning models, and analytics dashboards that turn complex data into real decisions.

🧠 About Me

  • 🎓 MS Data Science, Montclair State University (GPA 3.83/4.0, May 2026)
  • 🤖 Passionate about AI/ML, NLP, and applied data science
  • 💼 Former Data Scientist Intern @ Main Flow Services & Technologies
  • 📍 New Jersey, open to NYC metro & remote roles
  • 🛂 Available on OPT from June 2026
  • 📫 Reach me: vinayakvemula09@gmail.com
  • 🔗 LinkedIn

🛠 Tech Stack

  • Languages & ML: Python, SQL, PyTorch, Scikit-Learn, XGBoost
  • AI & NLP: HuggingFace, LangChain, OpenAI
  • Data & BI: Pandas, NumPy, Power BI, Tableau, Plotly, GeoPandas
  • Tools: Flask, Git, MySQL, Jupyter

🚀 Featured Projects

🌱 Recycling Awareness Data Dashboard (Master's Capstone)

Python · Flask · Pandas · SciPy · Chart.js · EPA Data

Built a full-stack analytics dashboard that analyzes U.S. recycling rates (1960–2022) using EPA data.

  • Engineered a complete ETL pipeline: raw Excel → structured CSVs → 9 Flask REST API endpoints
  • Applied linear regression & Pearson correlation (Paper R²=0.97, Metal R²=0.91)
  • Key insight: deposit-law states recycle 2.4× more glass than non-deposit states
  • Interactive dashboard with choropleth maps, KPI cards, and trend visualizations
  • Live: recycling-analytics-api.onrender.com
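The regression and correlation step in the bullets above can be sketched in plain Python. This is a minimal illustration, not the capstone's code: the `years` and `rates` values are made-up placeholders rather than EPA figures, and `pearson_r` and `fit_line` are hypothetical helper names. For a one-variable least-squares fit, R² equals the square of Pearson's r, which is why the two statistics tend to be reported together.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Placeholder series (NOT real EPA data): year vs. recycling rate in percent.
years = [1960, 1970, 1980, 1990, 2000, 2010]
rates = [17.0, 15.5, 21.0, 28.0, 43.0, 62.0]

slope, intercept, r2 = fit_line(years, rates)
r = pearson_r(years, rates)
```

In practice `scipy.stats.linregress` returns the slope, intercept, and r-value in one call; the hand-written version only makes the arithmetic visible.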

🧪 Environmental Data QA/QC and Regulatory Compliance Pipeline

Python · Pandas · Flask · GeoPandas · Folium

End-to-end data validation pipeline replicating environmental consulting workflows for lab EDD review and regulatory compliance.

  • Built an automated QA/QC engine running 5 validation checks on 2,400 EQuIS-style lab records, including field completeness, duplicate detection, impossible-value flagging, and holding-time compliance by analyte category
  • Compared results against a 12-analyte EPA Maximum Contaminant Level lookup table to flag regulatory exceedances by site
  • Built an interactive GIS map of 38 monitoring locations (no ArcGIS license required) and a Power BI-style HTML dashboard
  • Exposed validation results via a Flask REST API including a live CSV-upload validation endpoint
  • Live: env-qaqc-api.onrender.com
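A rough sketch of how checks like those listed above might be expressed, assuming each lab record is a plain dictionary. The field names, the `validate` function, and the holding-time limits are illustrative assumptions, not the project's actual schema or rules. The two MCL values are the EPA limits for arsenic and benzene in mg/L; the real lookup table covers 12 analytes.

```python
from datetime import date

# Illustrative holding times in days per analyte category (assumed values).
HOLDING_DAYS = {"VOC": 14, "METAL": 180}
# EPA Maximum Contaminant Levels in mg/L for two example analytes.
MCL_MG_L = {"Arsenic": 0.010, "Benzene": 0.005}
# Hypothetical required fields for a record.
REQUIRED = ("sample_id", "site", "analyte", "category",
            "result_mg_l", "collected", "analyzed")

def validate(records):
    """Return a list of (sample_id, issue) tuples for every failed check."""
    issues, seen = [], set()
    for rec in records:
        sid = rec.get("sample_id")
        # Field completeness: skip further checks if anything is missing.
        missing = [f for f in REQUIRED if rec.get(f) is None]
        if missing:
            issues.append((sid, "missing: " + ", ".join(missing)))
            continue
        # Duplicate detection on (sample, analyte).
        key = (sid, rec["analyte"])
        if key in seen:
            issues.append((sid, "duplicate"))
        seen.add(key)
        # Impossible values: a concentration cannot be negative.
        if rec["result_mg_l"] < 0:
            issues.append((sid, "negative concentration"))
        # Holding time: days between collection and analysis.
        held = (rec["analyzed"] - rec["collected"]).days
        limit = HOLDING_DAYS.get(rec["category"])
        if limit is not None and held > limit:
            issues.append((sid, f"holding time {held}d > {limit}d"))
        # Regulatory exceedance against the MCL lookup.
        mcl = MCL_MG_L.get(rec["analyte"])
        if mcl is not None and rec["result_mg_l"] > mcl:
            issues.append((sid, "exceeds MCL"))
    return issues

# Toy records, not project data.
records = [
    {"sample_id": "S1", "site": "MW-01", "analyte": "Arsenic", "category": "METAL",
     "result_mg_l": 0.004, "collected": date(2024, 1, 2), "analyzed": date(2024, 1, 20)},
    {"sample_id": "S2", "site": "MW-02", "analyte": "Benzene", "category": "VOC",
     "result_mg_l": 0.012, "collected": date(2024, 1, 2), "analyzed": date(2024, 1, 25)},
    {"sample_id": "S3", "site": "MW-03", "analyte": "Arsenic", "category": "METAL",
     "result_mg_l": None, "collected": date(2024, 1, 2), "analyzed": date(2024, 1, 5)},
]
issues = validate(records)
```

Returning a flat list of issues keeps the checks independent of each other, so the same output could feed a per-site summary or an API response.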

🔐 Credit Card Fraud Detection Pipeline

Python · Scikit-Learn · SMOTE · PCA · XGBoost · Matplotlib

End-to-end ML pipeline on 284,000+ financial transactions for anomaly and fraud detection.

  • Handled severe class imbalance with SMOTE oversampling
  • Applied PCA for dimensionality reduction + StandardScaler normalization
  • Benchmarked Logistic Regression, Random Forest, and XGBoost optimizing for fraud recall
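The core idea behind SMOTE oversampling and the recall metric mentioned above can be sketched without any dependencies. This is a hand-rolled simplification for illustration, not the project's code and not the `SMOTE` class from imbalanced-learn, which a real pipeline would more likely use. The `smote_like` and `recall` names and the toy points are assumptions.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

# Toy minority-class points (not transaction data).
fraud = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_like(fraud, n_new=6)
example_recall = recall([1, 1, 0, 0], [1, 0, 0, 1])
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used to measure recall.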

🚗 U.S. Fatal Accidents Analytics Dashboard

Python · Pandas · SQL · Plotly · Seaborn · Mapbox

Analyzed 39,000+ FARS crash records to identify temporal, geographic, and environmental risk patterns.

  • Optimized SQL queries reduced report generation time by 40%
  • Interactive Plotly dashboard with bubble maps, choropleth maps, and drill-down filters
  • Key finding: evening hours (5–8 PM) + adverse lighting = top contributing factors
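A grouped SQL query of the kind that could surface a time-of-day and lighting pattern like the one above, sketched with Python's built-in `sqlite3`. The `crashes` table, its columns, and the rows are hypothetical stand-ins rather than the FARS schema or data. The index is one common way to speed up such reports, not necessarily the optimization the project applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crashes ("
    "id INTEGER PRIMARY KEY, hour INTEGER, light TEXT, fatalities INTEGER)"
)
# Toy rows, not FARS data.
conn.executemany(
    "INSERT INTO crashes (hour, light, fatalities) VALUES (?, ?, ?)",
    [(18, "dark", 2), (19, "dark", 1), (18, "dusk", 1),
     (9, "daylight", 1), (14, "daylight", 1)],
)
# Index on the columns used for bucketing and grouping.
conn.execute("CREATE INDEX idx_hour_light ON crashes (hour, light)")

# Hours 17-19 cover crashes from 5:00 PM up to 8:00 PM.
rows = conn.execute(
    """
    SELECT CASE WHEN hour BETWEEN 17 AND 19 THEN 'evening' ELSE 'other' END AS period,
           light,
           COUNT(*) AS crashes,
           SUM(fatalities) AS deaths
    FROM crashes
    GROUP BY period, light
    ORDER BY deaths DESC, light
    """
).fetchall()
conn.close()
```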

🧬 Graph Clustering with Graph Neural Networks (GNNs)

PyTorch · GCN · Cora Dataset · NMI · Modularity Metrics

Unsupervised graph clustering pipeline using Graph Convolutional Networks for community detection.

  • Preprocessed Cora citation dataset to generate graph embeddings
  • Evaluated with Normalized Mutual Information (NMI) and modularity metrics
  • Applied deep learning for representation learning on graph-structured data
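The two evaluation metrics named above can be written out in a few lines. This is a dependency-free sketch for illustration, not the project's evaluation code. The `nmi` function normalizes by the arithmetic mean of the two entropies, which is an assumption about the variant used, and `modularity` covers undirected, unweighted graphs only. The toy graph is two triangles joined by one edge.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information, arithmetic-mean normalization."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # I(A;B) = sum p(a,b) * log(p(a,b) / (p(a) p(b)))
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single cluster
    return 2 * mutual / (h_a + h_b)

def modularity(edges, community):
    """Newman modularity Q; `community` maps node -> cluster id."""
    m = len(edges)
    degree, internal = Counter(), Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = Counter()
    for node, d in degree.items():
        degree_sum[community[node]] += d
    # Q = sum over clusters of (internal edges / m) - (cluster degree / 2m)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Toy graph: two triangles connected by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q = modularity(edges, clusters)
score = nmi([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])  # same split, labels swapped
```

NMI needs ground-truth labels, which Cora provides as paper topics, while modularity uses only the graph structure, so the two metrics complement each other for unsupervised clustering.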

📊 Classification Model Benchmarking Study

Python · Scikit-Learn · XGBoost · AdaBoost · SVM · NumPy

Comprehensive benchmarking of 8 classification algorithms on structured datasets.

  • Algorithms: Decision Tree, Naive Bayes (Gaussian & Multinomial), SVM (Linear & RBF), k-NN, Random Forest, AdaBoost, XGBoost
  • XGBoost: 97.8% accuracy · SVM (RBF): 97.6% accuracy
  • Full performance report with trade-off analysis across accuracy, interpretability, and compute cost
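A benchmarking study like this usually comes down to running each model through the same cross-validation loop and comparing scores. The sketch below shows that loop with a hand-written k-NN and a majority-class baseline. It assumes nothing about the study's actual harness: the function names, fold scheme, and toy data are all illustrative, and a real study would more likely use scikit-learn estimators with `cross_val_score`.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, test_X, k=3):
    """Classify each test point by majority vote of its k nearest training points."""
    preds = []
    for x in test_X:
        nearest = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], x))[:k]
        preds.append(Counter(label for _, label in nearest).most_common(1)[0][0])
    return preds

def majority_predict(train_X, train_y, test_X):
    """Baseline: always predict the most frequent training label."""
    top = Counter(train_y).most_common(1)[0][0]
    return [top] * len(test_X)

def cross_val_accuracy(predict, X, y, folds=4):
    """Mean accuracy over interleaved folds (sample i goes to fold i % folds)."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        preds = predict([X[i] for i in train_idx],
                        [y[i] for i in train_idx],
                        [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / folds

# Toy, well-separated two-class data (not the study's datasets).
X = [(0.0, 0.1), (0.3, 0.0), (0.2, 0.4), (0.1, 0.2),
     (5.0, 5.2), (5.1, 4.8), (4.9, 5.0), (5.3, 5.1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

results = {
    "k-NN (k=3)": cross_val_accuracy(knn_predict, X, y),
    "majority baseline": cross_val_accuracy(majority_predict, X, y),
}
```

Keeping a trivial baseline in the results table makes it easier to judge whether a high accuracy figure reflects the model or just the class balance.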

🎵 Spotify Song Popularity Prediction

Python · Scikit-Learn · Random Forest · K-Means · Linear Regression

Multi-algorithm ML on Spotify audio features to predict song popularity.

  • Compared classification, regression, and clustering approaches with cross-validation
  • Feature importance: energy, danceability, and loudness are top predictors

🎓 Certifications

| Certificate | Issuer |
| --- | --- |
| Data Analytics | Accenture (Forage) |
| Databases and SQL for Data Science | IBM, Coursera |
| Supervised Machine Learning | DeepLearning.AI & Stanford, Coursera |
| Python for Everybody | Google, Coursera |

📫 Let's Connect

LinkedIn · Email

Open to AI/ML Analyst, Junior Data Scientist, NLP Analyst, and Environmental Data Analyst roles. Available on OPT from June 2026.

Pinned

  1. environmental-qaqc-pipeline (Public)

    End-to-end environmental data QA/QC and regulatory compliance pipeline — validates lab EDD data against EPA holding-time and MCL standards, with GIS mapping, Excel/dashboard reporting, and a deploy…

    Python

  2. Recycling-dashboard (Public)

    Full-stack recycling data analytics dashboard — Flask + Chart.js, EPA data 1960–2022

    Jupyter Notebook

  3. AB-Testing-Simulation-Ecommerce (Public)

    End-to-end A/B test simulation with sample size calculation, z-test, chi-square, and $2.7M revenue impact analysis — Python, SciPy

    Jupyter Notebook

  4. Netflix-Content-Strategy-Analysis (Public)

    Data-driven content strategy analysis on 8,807 Netflix titles — Python, Plotly, Pandas | Business recommendations for Netflix subscriber retention

    Jupyter Notebook

  5. US-Fatal-Accidents-Analysis (Public)

    Exploratory data analysis of 2022 U.S. fatal traffic accidents using interactive maps and statistical visualization.

    Jupyter Notebook