Four hands-on laboratory notebooks covering data wrangling, EDA, regression, and combined supervised/unsupervised learning.
This repository contains four practice sessions completed as part of the Machine Learning course at Astana IT University. Each notebook targets a distinct ML workflow — from initial data exploration to model training and cluster analysis.
- Data cleaning and imputation — handling missing values, type coercion, derived feature engineering on a population statistics dataset (urban/rural breakdowns).
- Exploratory data analysis (EDA) — descriptive statistics, distribution plots, correlation analysis with seaborn/matplotlib.
- Linear regression — predicting car prices from features (year, mileage, engine size, condition); train/test split, MSE and R² evaluation.
- Supervised classification — Logistic Regression and Decision Tree on the Mall Customers dataset (binary high-income label); confusion matrices, accuracy comparison.
- Unsupervised clustering — K-Means with the Elbow Method to determine optimal k; PCA-based cluster visualization.
- Feature scaling — StandardScaler before clustering; label encoding for categorical targets.
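
The scaling, Elbow Method, and PCA steps listed above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: it assumes scikit-learn's `StandardScaler`, `KMeans`, and `PCA`, and it uses synthetic data in place of the Mall Customers file. The three feature columns and the choice of `k = 3` are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer features (assumption: three numeric columns).
rng = np.random.default_rng(42)
centers = np.array([[25.0, 30.0, 80.0], [45.0, 80.0, 20.0], [60.0, 50.0, 50.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 3)) for c in centers])

# Scale first so that no single feature dominates the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record inertia (within-cluster sum of squares) for each k.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

# Fit the final model at a k chosen by inspecting the curve (3 for this toy data).
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Project to two principal components for a 2-D scatter plot of the clusters.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

print(inertias)
print(X_2d.shape, np.bincount(labels))
```

Plotting `inertias` against `k` gives the elbow curve, and `X_2d` colored by `labels` gives the cluster scatter plot. On real data the elbow is usually less sharp than on this toy set, so the chosen `k` remains a judgment call.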
| File | Description |
|---|---|
| task1.ipynb | Data cleaning and descriptive statistics on a population dataset (RU/KZ columns) |
| task2.ipynb | EDA on Titanic-style passenger data: missing value handling, distribution and age visualizations |
| task3.ipynb | Linear regression on a cars dataset: feature distributions, scatter plots, MSE / R² evaluation |
| task4.ipynb | Supervised + unsupervised learning on Mall Customers: Logistic Regression, Decision Tree, K-Means clustering with Elbow Method and PCA visualization |
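
For orientation, the train/test split and MSE / R² evaluation described for task3.ipynb might look roughly like the sketch below. It is not taken from the notebook: the data is synthetic, the price formula and its coefficients are invented for illustration, and only three of the listed features (year, mileage, engine size) are simulated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cars dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
year = rng.integers(2000, 2024, size=n)
mileage = rng.uniform(5_000, 250_000, size=n)
engine = rng.uniform(1.0, 4.0, size=n)

# Hypothetical linear price relationship plus Gaussian noise.
price = (
    5000.0
    + 500.0 * (year - 2000)
    - 0.03 * mileage
    + 2000.0 * engine
    + rng.normal(scale=500.0, size=n)
)

# Hold out 20% of the rows so the metrics reflect unseen data.
X = np.column_stack([year, mileage, engine])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE is in squared price units; R² is the share of variance explained.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.1f}  R²: {r2:.3f}")
```

Because the synthetic prices are linear by construction, R² comes out close to 1 here; real listings are noisier and typically score lower.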
# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 2. Install dependencies
pip install jupyter pandas numpy matplotlib seaborn scikit-learn
# 3. Launch JupyterLab
jupyter lab

Author: Adil Ormanov (GitHub)