Welcome to my Elevvo Internship Program repository!
This repository documents my journey, tasks, and completed projects during the Elevvo internship focused on Machine Learning and AI-driven problem-solving.
Each notebook demonstrates a different real-world use case, from predictive modeling and clustering to computer vision and recommendation systems.
The Elevvo Internship Program allowed me to apply data science and ML concepts to a variety of datasets.
It strengthened my understanding of the end-to-end ML workflow: data preprocessing, model training, hyperparameter tuning, and evaluation.
Participants were expected to complete:
- 4+ tasks for a 1-month internship
This repository contains all completed core tasks.
- Apply supervised and unsupervised learning techniques on diverse datasets.
- Explore data preprocessing, feature engineering, and model evaluation.
- Build real-world machine learning projects using Python and Scikit-learn.
- Understand and compare the performance of various models across different evaluation metrics.
- Python
- NumPy, Pandas
- Matplotlib, Seaborn
- Scikit-learn
- TensorFlow / Keras
- OpenCV
- XGBoost / LightGBM
- Google Colab
Goal: Predict students' exam scores based on study hours and related academic factors.
Dataset: Student Performance Dataset (Kaggle)
- Loaded and cleaned student performance data
- Performed exploratory data analysis (EDA) using Matplotlib and Seaborn
- Trained a Linear Regression model to predict final exam scores
- Evaluated with R², MAE, and RMSE
- Study hours and participation levels had a strong positive correlation with scores
- The Linear Regression model achieved an R² score above 0.85, showing strong predictive power
- Bonus experiments with Polynomial Regression improved results slightly
Regression | EDA | Feature Engineering | Model Evaluation
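A minimal sketch of the regression workflow above, assuming scikit-learn and a synthetic stand-in for the Kaggle data. The column names `hours_studied`, `participation`, and `exam_score` are hypothetical, and the generated numbers are not the notebook's results. It checks feature correlations, then fits plain and degree-2 polynomial regression and reports R², MAE, and RMSE on a held-out split:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data standing in for the Kaggle dataset.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, n),
    "participation": rng.uniform(0, 1, n),
})
df["exam_score"] = 40 + 4.5 * df["hours_studied"] + 15 * df["participation"] + rng.normal(0, 3, n)

# Quick EDA step: correlation of each column with the target.
print(df.corr()["exam_score"].round(2))

X = df[["hours_studied", "participation"]]
y = df["exam_score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def evaluate(model):
    """Fit on the training split and return (R², MAE, RMSE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
    return r2_score(y_test, pred), mean_absolute_error(y_test, pred), rmse

linear_scores = evaluate(LinearRegression())
poly_scores = evaluate(make_pipeline(PolynomialFeatures(degree=2), LinearRegression()))
for name, (r2, mae, rmse) in [("linear", linear_scores), ("poly-2", poly_scores)]:
    print(f"{name}: R²={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

RMSE is taken as the square root of `mean_squared_error` rather than through a `squared=False` argument, since that argument is deprecated in recent scikit-learn versions.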
Goal: Group customers into clusters based on annual income and spending behavior.
Dataset: Mall Customer Dataset (Kaggle)
- Scaled features using StandardScaler
- Applied K-Means Clustering and determined optimal cluster number using the Elbow Method
- Visualized customer groups in 2D space (Income vs. Spending Score)
- Identified 5 distinct clusters representing different spending behaviors (e.g., high-income low-spending vs. low-income high-spending)
- Showed how customers differ in spending habits, which is useful for targeted marketing strategies
- Bonus: Tested DBSCAN clustering for better separation
Unsupervised Learning | K-Means | DBSCAN | Data Visualization
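The clustering steps can be sketched as follows, assuming scikit-learn. `make_blobs` generates a hypothetical 2D stand-in for (Annual Income, Spending Score), so the inertia values and the choice of k = 5 only illustrate the procedure; the DBSCAN `eps` and `min_samples` values are likewise illustrative and would need tuning on the real data:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the two customer features.
X_raw, _ = make_blobs(n_samples=200, centers=5, cluster_std=4.0,
                      center_box=(10, 100), random_state=7)
X = StandardScaler().fit_transform(X_raw)  # zero mean, unit variance per feature

# Elbow method: record inertia (within-cluster sum of squares) for k = 1..10.
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
print({k: round(v, 1) for k, v in inertias.items()})

# Pick k where inertia stops dropping sharply; 5 is assumed here to match the write-up.
best_k = 5
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
print("silhouette (k-means):", round(silhouette_score(X, labels), 3))

# DBSCAN for comparison; label -1 marks points treated as noise.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_db_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
print("DBSCAN clusters:", n_db_clusters)
```

Scaling comes first because both K-Means and DBSCAN work on Euclidean distances, so a feature with a larger numeric range would otherwise dominate the clustering.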
Goal: Predict whether a bank loan application will be approved based on applicant information.
Dataset: Loan Approval Prediction Dataset (Kaggle)
- Handled missing values, and encoded categorical features using Label Encoding and One-Hot Encoding
- Split the dataset into training/testing subsets
- Trained and compared Logistic Regression, Decision Tree, and Random Forest classifiers
- Random Forest achieved the highest accuracy (~95%), outperforming Logistic Regression
- Gender, ApplicantIncome, and Credit_History were key factors influencing predictions
- Used SMOTE to handle class imbalance and improve recall
Binary Classification | Data Encoding | Imbalanced Learning | Evaluation Metrics
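A sketch of the preprocessing and model comparison, assuming scikit-learn. The applicant data is randomly generated (only the column names echo the write-up), missing values are imputed before one-hot encoding, and each classifier gets its own copy of the preprocessing via `clone`. SMOTE, which lives in the separate imbalanced-learn package, is left out to keep the sketch to scikit-learn; if added, it should be applied to the training split only so synthetic samples never reach the test set:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic applicants: column names echo the write-up, values are random.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], n),
    "Married": rng.choice(["Yes", "No"], n),
    "ApplicantIncome": rng.lognormal(8.5, 0.5, n),
    "LoanAmount": rng.normal(150, 40, n),
    "Credit_History": rng.choice([0.0, 1.0], n, p=[0.2, 0.8]),
})
# In this toy data, approval probability depends mostly on credit history.
p_approve = 0.15 + 0.7 * df["Credit_History"].to_numpy()
y = (rng.random(n) < p_approve).astype(int)
# Inject roughly 5% missing values so the imputers have work to do.
df.loc[rng.random(n) < 0.05, "Gender"] = np.nan
df.loc[rng.random(n) < 0.05, "LoanAmount"] = np.nan

num_cols = ["ApplicantIncome", "LoanAmount", "Credit_History"]
cat_cols = ["Gender", "Married"]
preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize (helps Logistic Regression).
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), num_cols),
    # Categorical: fill gaps with the most frequent value, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, clf in models.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", clf)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = (accuracy_score(y_test, pred), recall_score(y_test, pred))
    print(f"{name}: accuracy={results[name][0]:.3f}, recall={results[name][1]:.3f}")
```

Keeping imputation and encoding inside the pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into preprocessing.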
Goal: Build a movie recommender using collaborative filtering techniques.
Dataset: MovieLens 100K Dataset (Kaggle)
- Created a user-item rating matrix
- Computed similarity scores using cosine similarity between users
- Recommended top-rated unseen movies based on similar users' preferences
- Successfully recommended personalized movie lists using user-based collaborative filtering
- Experimented with item-based filtering and SVD matrix factorization for improved performance
- Evaluated recommendations using Precision@K
Recommendation System | Collaborative Filtering | Cosine Similarity | Matrix Factorization
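The user-based collaborative filtering steps can be illustrated with NumPy on a hypothetical 5 × 6 rating matrix (not MovieLens data). Each unseen movie is scored by the similarity-weighted sum of other users' ratings, and Precision@K checks the top-K list against a hypothetical set of held-out relevant movies:

```python
import numpy as np

# Hypothetical toy user-item matrix (rows = users, columns = movies, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 0, 1, 0],
    [4, 5, 0, 1, 0, 0],
    [0, 0, 5, 4, 0, 4],
    [1, 0, 4, 5, 0, 5],
    [5, 5, 4, 0, 0, 0],
], dtype=float)

def user_similarity(R):
    """Cosine similarity between every pair of user rating vectors."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # users with no ratings get an all-zero similarity row
    unit = R / norms
    return unit @ unit.T

def recommend(R, user, k=2):
    """Return up to k movies `user` has not rated, ranked by similarity-weighted ratings."""
    sim = user_similarity(R)[user].copy()
    sim[user] = 0.0                    # ignore the user's own ratings
    scores = sim @ R                   # weighted sum of every neighbour's rating per movie
    scores[R[user] > 0] = -np.inf      # never recommend movies the user already rated
    ranked = np.argsort(scores)[::-1]  # highest score first
    return [int(i) for i in ranked[:k] if np.isfinite(scores[i])]

def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that appear in the relevant set."""
    return len(set(recommended[:k]) & set(relevant)) / k

recs = recommend(ratings, user=0, k=2)
print("recommended movie indices for user 0:", recs)  # → [2, 3]

held_out = {2, 5}  # hypothetical movies user 0 later rated highly
print("Precision@2:", precision_at_k(recs, held_out, k=2))  # → 0.5
```

A weighted sum favours movies rated highly by many similar users; the common alternative, a similarity-weighted average, can over-reward a movie rated by a single weakly similar neighbour.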
Goal: Classify German traffic signs using Convolutional Neural Networks (CNN).
Dataset: GTSRB (German Traffic Sign Recognition Benchmark)
- Preprocessed images (resizing, normalization)
- Built a custom CNN using Keras
- Trained the model on all 43 sign categories
- Evaluated performance using accuracy and confusion matrix
- The CNN achieved an accuracy of 98% on the test set
- Using data augmentation improved generalization
Deep Learning | CNN | Image Preprocessing | Transfer Learning
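A minimal Keras sketch of a custom CNN for GTSRB, assuming TensorFlow/Keras 2.6 or later (for the built-in augmentation layers) and images already resized to 32 × 32 RGB. The 32 × 32 size, layer widths, and augmentation strengths are illustrative assumptions, not the notebook's architecture; `Rescaling` performs the normalization step inside the model:

```python
import numpy as np
import keras
from keras import layers

NUM_CLASSES = 43     # GTSRB defines 43 sign classes
IMG_SIZE = (32, 32)  # assumed input size after resizing

# Augmentation layers are active only during training (model.fit), not at inference.
# No horizontal flips: mirroring would change the meaning of direction signs.
augment = keras.Sequential([
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
])

model = keras.Sequential([
    keras.Input(shape=IMG_SIZE + (3,)),
    augment,
    layers.Rescaling(1.0 / 255),  # normalize pixel values to [0, 1]
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # expects integer labels 0..42
              metrics=["accuracy"])
model.summary()
```

Training would then be a call such as `model.fit(x_train, y_train, validation_split=0.1, epochs=...)`, where `x_train` and `y_train` are hypothetical arrays of resized images and integer labels.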
| Category | Metrics |
|---|---|
| Regression | R², MAE, RMSE |
| Classification | Accuracy, Precision, Recall, F1-score |
| Clustering | Silhouette Score, Inertia |
| Recommendation | Precision@K |
| Deep Learning | Accuracy, Loss Curve, Confusion Matrix |
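For reference, these are the standard definitions of the metrics in the table, with $y_i$ the true value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the true values, and $TP$, $FP$, $FN$ the counts of true positives, false positives, and false negatives. For clustering, $\mu_{c(i)}$ is the centroid of point $i$'s cluster, $a(i)$ its mean distance to the other points in its own cluster, and $b(i)$ its mean distance to the points of the nearest other cluster; the Silhouette Score is the mean of $s(i)$ over all points:

```math
\begin{aligned}
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Inertia} &= \sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2, \qquad s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \\
\text{Precision@}K &= \frac{\lvert \{\text{top-}K \text{ recommendations}\} \cap \{\text{relevant items}\} \rvert}{K}
\end{aligned}
```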
- Clone the repository:
`git clone https://github.com/Bekamgenene/Elevvo-Internship-Program.git && cd Elevvo-Internship-Program`