A machine learning project that predicts the selling price of used cars based on features like present price, fuel type, transmission, mileage, and car age.
Built as part of my CodeAlpha Data Science Internship.
| Detail | Description |
|---|---|
| Objective | Predict resale price of used cars |
| Dataset | 301 entries, 9 features |
| Models Used | Linear Regression, Random Forest Regressor |
| Best Model | Random Forest (highest R² score) |
| Tools | Python, Pandas, Scikit-learn, Matplotlib, Seaborn |
├── Car_Price_Prediction.ipynb # Main Jupyter Notebook
├── car data.csv # Dataset
├── requirements.txt # Python dependencies
└── README.md # Project documentation
| Feature | Description |
|---|---|
| Car_Name | Name of the car |
| Year | Year of purchase |
| Selling_Price | Price the owner is asking for (target variable) |
| Present_Price | Current ex-showroom price |
| Driven_kms | Kilometers driven |
| Fuel_Type | Petrol / Diesel / CNG |
| Selling_type | Dealer / Individual |
| Transmission | Manual / Automatic |
| Owner | Number of previous owners |
- Data Loading: Read CSV using Pandas
- Exploratory Data Analysis (EDA): Visualize distributions, correlations, and relationships
- Feature Engineering: Created `Car_Age` from `Year`
- Data Preprocessing: Label encoded categorical variables (`Fuel_Type`, `Selling_type`, `Transmission`)
- Model Training: Trained Linear Regression and Random Forest models
- Model Evaluation: Compared using MAE, RMSE, and R² Score
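The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the reference year used for `Car_Age`, the decision to drop `Car_Name` and `Year`, the 80/20 split, the seed, and the tiny `toy` frame (made-up rows that only mimic the dataset's columns) are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

CATEGORICAL_COLUMNS = ["Fuel_Type", "Selling_type", "Transmission"]


def prepare_features(df, reference_year):
    """Add Car_Age, label encode the categorical columns, and split X from y."""
    data = df.copy()
    # Assumption: age is measured against a fixed reference year.
    data["Car_Age"] = reference_year - data["Year"]
    for column in CATEGORICAL_COLUMNS:
        data[column] = LabelEncoder().fit_transform(data[column])
    # Assumption: the car name and raw year are not used as model inputs.
    X = data.drop(columns=["Car_Name", "Year", "Selling_Price"])
    y = data["Selling_Price"]
    return X, y


def train_models(X, y, seed=42):
    """Fit both regressors on a training split and return them with the held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models, X_test, y_test


# Made-up rows for illustration only; they are not taken from the dataset.
toy = pd.DataFrame({
    "Car_Name": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
    "Year": [2014, 2013, 2017, 2011, 2018, 2015, 2016, 2012, 2010, 2019],
    "Selling_Price": [3.4, 4.8, 7.3, 2.9, 4.6, 9.3, 6.8, 6.5, 0.9, 8.1],
    "Present_Price": [5.6, 9.5, 9.9, 4.2, 6.9, 9.8, 8.1, 8.6, 2.4, 9.0],
    "Driven_kms": [27000, 43000, 6900, 5200, 42450, 2071, 18796, 33429, 90000, 4000],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel",
                  "Diesel", "Petrol", "Diesel", "CNG", "Petrol"],
    "Selling_type": ["Dealer", "Dealer", "Dealer", "Individual", "Dealer",
                     "Dealer", "Dealer", "Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Manual", "Manual", "Manual", "Manual",
                     "Automatic", "Manual", "Manual", "Manual", "Automatic"],
    "Owner": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})

X, y = prepare_features(toy, reference_year=2020)
models, X_test, y_test = train_models(X, y)
```

With the real data, `toy` would be replaced by `pd.read_csv("car data.csv")`.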
| Model | MAE | RMSE | R² Score |
|---|---|---|---|
| Linear Regression | ~1.3 | ~2.0 | ~0.84 |
| Random Forest | ~0.9 | ~1.3 | ~0.96 |
Random Forest outperforms Linear Regression on all three metrics for this dataset, with lower MAE and RMSE and a higher R² (~0.96 vs ~0.84).
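For reference, the three metrics can be computed directly from their definitions. The sketch below uses NumPy only; the helper name `regression_metrics` and the sample values are hypothetical, and the notebook may well use `sklearn.metrics` instead.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for two equal-length sequences of prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))          # average absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))   # penalizes large errors more heavily
    ss_res = np.sum(residuals ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": float(mae), "RMSE": float(rmse), "R2": float(r2)}


# Made-up values for illustration; not taken from the dataset.
example = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
```

Lower MAE and RMSE mean smaller errors in the same units as `Selling_Price`, while an R² closer to 1 means the model explains more of the variation in price.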
- Present Price is the strongest predictor of selling price
- Car Age has a strong negative correlation with selling price
- Diesel cars tend to hold value better than Petrol cars
- Random Forest captures non-linear relationships much better than Linear Regression
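Observations like these can be checked with a few lines of Pandas. The sketch below is one possible way to do it, not the notebook's exact analysis: the helper names are hypothetical, "holding value" is read here as the ratio of `Selling_Price` to `Present_Price`, and the `toy` rows are made up purely to show the calls, so they are not evidence for the findings.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the dataset.
toy = pd.DataFrame({
    "Selling_Price": [3.5, 5.5, 1.5, 10.0, 4.0, 0.8],
    "Present_Price": [5.0, 9.0, 4.0, 12.0, 7.0, 3.0],
    "Car_Age": [3, 5, 8, 2, 6, 10],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol", "Petrol"],
})


def correlation_with_target(df, target="Selling_Price"):
    """Pearson correlation of every numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


def retention_by(df, column):
    """Mean fraction of the present price retained at sale, per group."""
    retained = df["Selling_Price"] / df["Present_Price"]
    return retained.groupby(df[column]).mean()


correlations = correlation_with_target(toy)
fuel_retention = retention_by(toy, "Fuel_Type")
```

On the real data, a large positive value for `Present_Price` and a negative value for `Car_Age` in `correlations` would match the first two points, and a higher Diesel entry in `fuel_retention` would match the third.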
- Clone the repository: `git clone https://github.com/kinzaemannn/CodeAlpha-Car-Price-Prediction.git`, then `cd CodeAlpha-Car-Price-Prediction`
- Install dependencies: `pip install -r requirements.txt`
- Open the notebook: `jupyter notebook Car_Price_Prediction.ipynb`

Or upload to Kaggle and run directly.
- Python 3.x
- Pandas — Data manipulation
- NumPy — Numerical operations
- Matplotlib & Seaborn — Data visualization
- Scikit-learn — Machine learning models and evaluation
This project is for educational purposes as part of the CodeAlpha Data Science Internship.
Kinza Eman
🔗 Kaggle Profile
🔗 GitHub