Vinayak Vemula Vemula-Vinayak

Hi, I'm Vinayak Vemula 👋

MS Data Science @ Montclair State University '26 | AI/ML · NLP · Python · SQL · Power BI

Building AI-powered pipelines, machine learning models, and analytics dashboards that turn complex data into real decisions.

🧠 About Me

🎓 MS Data Science, Montclair State University (GPA 3.83/4.0, May 2026)
🤖 Passionate about AI/ML, NLP, and applied data science
💼 Former Data Scientist Intern @ Main Flow Services & Technologies
📍 New Jersey, open to NYC metro & remote roles
🛂 Available on OPT from June 2026
📫 Reach me: vinayakvemula09@gmail.com
🔗 LinkedIn

🛠 Tech Stack

Languages & ML: Python, SQL, PyTorch, Scikit-Learn, XGBoost
AI & NLP: HuggingFace, LangChain, OpenAI
Data & BI: Pandas, NumPy, Power BI, Tableau, Plotly, GeoPandas
Tools: Flask, Git, MySQL, Jupyter

🚀 Featured Projects

🌱 Recycling Awareness Data Dashboard (Master's Capstone)
Python · Flask · Pandas · SciPy · Chart.js · EPA Data
Built a full-stack analytics dashboard analyzing U.S. recycling rates (1960–2022) using real EPA government data.

Engineered complete ETL pipeline from raw Excel → structured CSVs → 9 Flask REST API endpoints
Applied linear regression & Pearson correlation (Paper R²=0.97, Metal R²=0.91)
Key insight: deposit-law states recycle 2.4× more glass than non-deposit states
Interactive dashboard with choropleth maps, KPI cards, and trend visualizations
Live: recycling-analytics-api.onrender.com
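
A minimal sketch of the kind of trend analysis described above: an ordinary least-squares line plus the Pearson correlation, written in plain Python. The function name `linear_fit` and the sample numbers are assumptions for illustration; they are not EPA figures and not necessarily how the dashboard computes its values (the project lists SciPy, which offers equivalent routines).

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson r and R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # For a one-variable linear regression, R^2 equals the squared Pearson r.
    return {"slope": slope, "intercept": intercept, "r": r, "r2": r * r}

# Hypothetical values for illustration only (not EPA data).
years = [1960, 1970, 1980, 1990, 2000]
rates = [10.0, 15.0, 20.0, 25.0, 30.0]
fit = linear_fit(years, rates)
```

An R² close to 1, like the values reported for paper and metal, means a straight line explains most of the year-to-year variation in that material's recycling rate.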

🧪 Environmental Data QA/QC and Regulatory Compliance Pipeline
Python · Pandas · Flask · GeoPandas · Folium
End-to-end data validation pipeline replicating environmental consulting workflows for lab EDD review and regulatory compliance.

Built an automated QA/QC engine running 5 validation checks on 2,400 EQuIS-style lab records, including field completeness, duplicate detection, impossible-value flagging, and holding time compliance by analyte category
Compared results against a 12-analyte EPA Maximum Contaminant Level lookup table to flag regulatory exceedances by site
Built an interactive GIS map of 38 monitoring locations (no ArcGIS license required) and a Power BI-style HTML dashboard
Exposed validation results via a Flask REST API including a live CSV-upload validation endpoint
Live: env-qaqc-api.onrender.com

🔐 Credit Card Fraud Detection Pipeline
Python · Scikit-Learn · SMOTE · PCA · XGBoost · Matplotlib
End-to-end ML pipeline on 284,000+ financial transactions for anomaly and fraud detection.

Handled severe class imbalance with SMOTE oversampling
Applied StandardScaler normalization and PCA for dimensionality reduction
Benchmarked Logistic Regression, Random Forest, and XGBoost, optimizing for fraud recall
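
To illustrate the idea behind SMOTE oversampling, here is a simplified sketch: each synthetic fraud sample is placed on the line segment between a real minority sample and one of its nearest minority neighbors. The function `smote_like` and its parameters are hypothetical; this is not the library implementation the project would have used (imbalanced-learn provides one), only the core interpolation step.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Toy minority class (made-up feature vectors, not transaction data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the data used to measure recall.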

🚗 U.S. Fatal Accidents Analytics Dashboard
Python · Pandas · SQL · Plotly · Seaborn · Mapbox
Analyzed 39,000+ FARS crash records to identify temporal, geographic, and environmental risk patterns.

Optimized SQL queries, reducing report generation time by 40%
Interactive Plotly dashboard with bubble maps, choropleth maps, and drill-down filters
Key finding: evening hours (5–8 PM) and adverse lighting were the top contributing factors
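
A sketch of the kind of aggregation behind the evening-hours finding, using Python's built-in `sqlite3`. The `crashes` table, its columns, and the sample rows are assumptions for illustration and do not reflect the FARS schema. The composite index is one common way to speed up such a query; whether it matches the optimization used in the project is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crashes ("
    "crash_id INTEGER PRIMARY KEY, hour INTEGER, "
    "light_condition TEXT, fatalities INTEGER)"
)

# Made-up rows: (hour of day, lighting, fatalities).
rows = [(17, "dark", 1), (17, "dark", 2), (18, "dusk", 1),
        (9, "daylight", 1), (19, "dark", 1)]
conn.executemany(
    "INSERT INTO crashes (hour, light_condition, fatalities) VALUES (?, ?, ?)",
    rows,
)

# An index on the filter and grouping columns lets SQLite avoid a full scan.
conn.execute("CREATE INDEX idx_crashes_hour_light ON crashes (hour, light_condition)")

query = """
SELECT hour, light_condition, COUNT(*) AS crashes, SUM(fatalities) AS fatalities
FROM crashes
WHERE hour BETWEEN ? AND ?
GROUP BY hour, light_condition
ORDER BY crashes DESC, hour, light_condition
"""
# Hours 17 through 19 cover 5:00 PM up to 8:00 PM.
summary = conn.execute(query, (17, 19)).fetchall()
```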

🧬 Graph Clustering with Graph Neural Networks (GNNs)
PyTorch · GCN · Cora Dataset · NMI · Modularity Metrics
Unsupervised graph clustering pipeline using Graph Convolutional Networks for community detection.

Preprocessed Cora citation dataset to generate graph embeddings
Evaluated with Normalized Mutual Information (NMI) and modularity metrics
Applied deep learning for representation learning on graph-structured data
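
For reference, Normalized Mutual Information can be computed from two label lists as in the sketch below. It uses the arithmetic-mean normalization, NMI = 2·I(U;V) / (H(U) + H(V)); which normalization the project used is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information with arithmetic-mean normalization."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    # I(U;V) = sum over co-occurring pairs of p(t,p) * log(p(t,p) / (p(t) p(p)))
    mutual_info = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true + h_pred == 0:
        return 1.0  # both labelings place every node in a single cluster
    return 2 * mutual_info / (h_true + h_pred)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI ignores how clusters are numbered, it suits unsupervised community detection, where predicted cluster IDs have no fixed correspondence to the ground-truth classes.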

📊 Classification Model Benchmarking Study
Python · Scikit-Learn · XGBoost · AdaBoost · SVM · NumPy
Comprehensive benchmarking of 8 classification algorithms on structured datasets.

Algorithms: Decision Tree, Naive Bayes (Gaussian & Multinomial), SVM (Linear & RBF), k-NN, Random Forest, AdaBoost, XGBoost
XGBoost: 97.8% accuracy · SVM (RBF): 97.6% accuracy
Full performance report with trade-off analysis across accuracy, interpretability, and compute cost

🎵 Spotify Song Popularity Prediction
Python · Scikit-Learn · Random Forest · K-Means · Linear Regression
Multi-algorithm ML on Spotify audio features to predict song popularity.

Compared classification, regression, and clustering approaches with cross-validation
Feature importance: energy, danceability, and loudness are top predictors

🎓 Certifications

Certificate	Issuer
Data Analytics	Accenture (Forage)
Databases and SQL for Data Science	IBM — Coursera
Supervised Machine Learning	DeepLearning.AI & Stanford — Coursera
Python for Everybody	Google — Coursera

📫 Let's Connect

LinkedIn · Email

Open to AI/ML Analyst, Junior Data Scientist, NLP Analyst, and Environmental Data Analyst roles. Available on OPT from June 2026.
