A full-stack web application that predicts the resale price of used cars in real time.
Built with Flask, scikit-learn, and a dark-themed interactive dashboard.
- Overview
- Live Demo Preview
- Features
- Model Performance
- Tech Stack
- Project Structure
- Getting Started
- API Reference
- Dataset
- How It Works
AutoVal takes 10 real-world vehicle parameters and predicts the resale price using two trained ML models, a Random Forest Regressor (R² = 0.938) and a Linear Regression baseline, both trained on 15,397 real CarDekho transactions.
The app features a live metrics dashboard that shows actual vs predicted prices, residual distributions, feature importance, and prediction error by vehicle age, all fetched dynamically from the Flask backend.
The app runs locally. Below are actual screenshots of the UI:
Real-time car price estimation with confidence interval and feature insights
Interactive ML performance dashboard with multiple evaluation metrics and charts
| Feature | Description |
|---|---|
| Real ML Prediction | Random Forest (R² = 0.938) + Linear Regression trained on 15,397 records |
| Instant Estimates | Sub-second predictions via Flask REST API |
| Live Metrics Dashboard | 5 interactive Chart.js charts, all data fetched from backend |
| Price Influencers | Shows which factors pushed the price up or down |
| Confidence Score | Per-prediction confidence interval with animated bar |
| Algorithm Toggle | Switch between RF and LR; charts update live |
| Responsive UI | Works on desktop and mobile |
| Retrain Anytime | Run `train.py` with new data to update everything |
Both models trained on CarDekho dataset · 80/20 train-test split · random_state=42
| Metric | Random Forest | Linear Regression |
|---|---|---|
| R² Score | 0.938 | 0.689 |
| MAE (₹ Lakhs) | 0.94 | 2.45 |
| RMSE (₹ Lakhs) | 1.98 | 4.46 |
| MAPE | 13.4% | 40.1% |
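The four metrics in the table can be illustrated with a small dependency-free sketch. This is not the code in `train.py` (which presumably relies on scikit-learn's metric functions); the function name `regression_metrics` and the sample prices are hypothetical and only show what each number measures.

```python
import math


def regression_metrics(actual, predicted):
    """Return R², MAE, RMSE and MAPE (%) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares

    # Note: R² is undefined if all actual values are equal,
    # and MAPE is undefined if any actual value is zero.
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical prices in ₹ Lakhs, purely for illustration.
metrics = regression_metrics([4.0, 6.0, 10.0], [5.0, 6.0, 9.0])
print(metrics)
```

MAE and RMSE are in the same unit as the target (₹ Lakhs), while R² and MAPE are unitless, which is why the table reports them differently.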
max_power    ████████████████████████████████████ 68.1%
vehicle_age  ███████ 13.0%
mileage      ████ 8.1%
km_driven    ███ 5.1%
engine       ██ 3.5%
brand        █ 1.4%
seats        █ 0.5%
Key insight: Max power (bhp) is by far the strongest predictor of resale price in the Random Forest model (68.1% of total feature importance), which suggests that high-performance cars retain value better than low-power equivalents of similar age and mileage.
Backend
- Flask 3.0: REST API server
- scikit-learn 1.4: ML models
- pandas: data processing
- joblib: model serialization
Frontend
- Vanilla HTML / CSS / JS: zero frontend framework dependency
- Chart.js 4.4: 5 interactive charts
- Google Fonts: Syne + DM Mono
- Fully dark-themed responsive UI
AutoVal/
│
├── app.py                  # Flask server: API routes, loads trained models
├── train.py                # Training pipeline: run once to generate model files
├── requirements.txt        # Python dependencies
├── cardekho_dataset.csv    # Source dataset
│
├── assets/
│   ├── form.png
│   └── dashboard.png
├── static/
│   └── style.css
├── templates/
│   └── index.html          # Frontend: form + metrics dashboard
│
└── (generated after running train.py)
    ├── rf_model.pkl            # Trained Random Forest
    ├── lr_model.pkl            # Trained Linear Regression
    ├── scaler.pkl              # StandardScaler for LR
    ├── features.pkl            # Feature name list
    ├── encoder_classes.json    # Label encoder mappings
    └── metrics.json            # Evaluation metrics + chart data
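Because `app.py` loads the generated files at startup, it can help to check that they exist before launching the server. The sketch below is a hypothetical helper, not part of the repository; it assumes the artefacts are written to the project root, and the name `missing_artifacts` is made up for illustration.

```python
from pathlib import Path

# File names taken from the project tree above.
EXPECTED_ARTIFACTS = [
    "rf_model.pkl",
    "lr_model.pkl",
    "scaler.pkl",
    "features.pkl",
    "encoder_classes.json",
    "metrics.json",
]


def missing_artifacts(project_dir="."):
    """Return the expected artefact file names not found in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing:", ", ".join(missing), "(run train.py first)")
    else:
        print("All artefacts present")
```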
- Python 3.8+
- pip
git clone https://github.com/mayankagarwal-01/AutoVal.git
cd AutoVal
python3 -m venv venv
# macOS / Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
pip install --no-user -r requirements.txt
python train.py --data cardekho_dataset.csv
Expected training output:
── AutoVal Training Pipeline ──────────────────────────────────
[1/5] Loading & cleaning data...
      Loaded 15,397 rows, 11 columns after cleaning.
[2/5] Encoding categoricals...
[3/5] Splitting train / test (80 / 20)...
      Train: 12,317  Test: 3,080
[4/5] Training models...
      → Random Forest Regressor...
      → Linear Regression (with StandardScaler)...
[5/5] Saving artefacts...
── Results ────────────────────────────────────────────────────
Random Forest:     R²=0.938  MAE=₹0.94L  RMSE=₹1.98L  MAPE=13.4%
Linear Regression: R²=0.689  MAE=₹2.45L  RMSE=₹4.46L  MAPE=40.1%
Top 5 feature importances (RF):
  max_power    68.1%
  vehicle_age  13.03%
  mileage      8.08%
  km_driven    5.13%
  engine       3.48%
Saved: rf_model.pkl lr_model.pkl scaler.pkl
       features.pkl encoder_classes.json metrics.json
── Done. Run `python app.py` to start the server. ─────────────
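The train and test counts in the output follow from the 80/20 split of 15,397 rows. A minimal sketch of the arithmetic, assuming `train.py` uses scikit-learn's `train_test_split` with `test_size=0.2` (which rounds the test share up and gives the remainder to training); the helper name `split_sizes` is hypothetical:

```python
import math


def split_sizes(n_rows, test_fraction=0.2):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n_rows * test_fraction)
    return n_rows - n_test, n_test


n_train, n_test = split_sizes(15_397)
print(f"Train: {n_train:,}  Test: {n_test:,}")  # Train: 12,317  Test: 3,080
```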
python app.py
Then open http://localhost:5000 in your browser.
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Serves the frontend |
| POST | `/api/predict` | Returns predicted price, range, confidence, factors |
| GET | `/api/metrics?algo=rf\|lr` | R², MAE, RMSE, MAPE |
| GET | `/api/metrics/scatter?algo=rf\|lr` | Actual vs predicted scatter data |
| GET | `/api/metrics/residuals?algo=rf\|lr` | Residual histogram data |
| GET | `/api/metrics/importance` | Feature importance for both models |
| GET | `/api/metrics/age-error` | MAE by vehicle age |
| GET | `/api/encoders` | Valid values for all categorical fields |
| GET | `/api/health` | Health check |
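A client can call the prediction endpoint with nothing but the standard library. The sketch below is one possible way to do it, not code from the repository: the helper names `build_predict_request` and `predict` are hypothetical, the base URL assumes the default local server, and the payload is the sample request body documented for `/api/predict`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumed local development address


def build_predict_request(payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /api/predict."""
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL, timeout=5):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


payload = {
    "brand": "Hyundai",
    "vehicle_age": 6,
    "km_driven": 45000,
    "seller_type": "Individual",
    "fuel_type": "Petrol",
    "transmission_type": "Manual",
    "mileage": 18.9,
    "engine": 1197,
    "max_power": 82.0,
    "seats": 5,
    "model": "rf",
}
request = build_predict_request(payload)
# result = predict(payload)  # uncomment once `python app.py` is running
```

Separating request construction from sending makes the request easy to inspect or log without a running server.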
POST /api/predict: Request & Response
Request body:
{
"brand": "Hyundai",
"vehicle_age": 6,
"km_driven": 45000,
"seller_type": "Individual",
"fuel_type": "Petrol",
"transmission_type": "Manual",
"mileage": 18.9,
"engine": 1197,
"max_power": 82.0,
"seats": 5,
"model": "rf"
}
Response:
{
"price": 5.42,
"price_low": 4.99,
"price_high": 5.85,
"confidence": 88.3,
"model_name": "Random Forest Regressor",
"factors": [
{ "name": "Brand", "impact": 1, "label": "Hyundai" },
{ "name": "Vehicle Age", "impact": 0, "label": "6 yrs" },
{ "name": "Fuel Type", "impact": 0, "label": "Petrol" },
{ "name": "Transmission", "impact": 0, "label": "Manual" },
{ "name": "Max Power", "impact": 0, "label": "82.0 bhp"},
{ "name": "Kilometres Driven", "impact": 0, "label": "45k km" }
]
}
CarDekho Used Car Dataset
- Source: Kaggle, nehalbirla/vehicle-dataset-from-cardekho
- Records: 15,411 → 15,397 after cleaning
- Features used: 10
- Target: `selling_price` (converted to ₹ Lakhs)
| Feature | Type | Values / Range |
|---|---|---|
| `brand` | categorical | 32 brands (Maruti, Hyundai, BMW...) |
| `vehicle_age` | integer | 0–29 years |
| `km_driven` | integer | 100–500,000 km |
| `seller_type` | categorical | Individual / Dealer / Trustmark Dealer |
| `fuel_type` | categorical | Petrol / Diesel / CNG / LPG / Electric |
| `transmission_type` | categorical | Manual / Automatic |
| `mileage` | float | 4.0–33.5 kmpl |
| `engine` | integer | 793–6,592 cc |
| `max_power` | float | 38.4–626.0 bhp |
| `seats` | integer | 2–9 |
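The ranges in the table can double as a sanity check on request payloads before they reach the model. The following is a sketch under assumptions: `validate_payload` is a hypothetical helper, and whether `app.py` performs any such validation is not stated here. Brand names are not enumerated because only three of the 32 are listed; the `/api/encoders` endpoint is the place to look up valid values.

```python
# Ranges and categories copied from the feature table above.
NUMERIC_RANGES = {
    "vehicle_age": (0, 29),
    "km_driven": (100, 500_000),
    "mileage": (4.0, 33.5),
    "engine": (793, 6_592),
    "max_power": (38.4, 626.0),
    "seats": (2, 9),
}
CATEGORIES = {
    "seller_type": {"Individual", "Dealer", "Trustmark Dealer"},
    "fuel_type": {"Petrol", "Diesel", "CNG", "LPG", "Electric"},
    "transmission_type": {"Manual", "Automatic"},
}


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload looks fine."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}..{high}")
    for field, allowed in CATEGORIES.items():
        if payload.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {payload.get(field)!r}")
    if not payload.get("brand"):
        problems.append("brand: required")
    return problems


sample = {
    "brand": "Hyundai", "vehicle_age": 6, "km_driven": 45000,
    "seller_type": "Individual", "fuel_type": "Petrol",
    "transmission_type": "Manual", "mileage": 18.9,
    "engine": 1197, "max_power": 82.0, "seats": 5,
}
print(validate_payload(sample))  # []
```

Values outside the training range are not necessarily invalid cars, but predictions for them are extrapolations and deserve less trust.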
User fills form
│
▼
POST /api/predict
│
▼
app.py encodes categoricals
(brand → int, fuel → int, etc.)
│
▼
Builds pandas DataFrame
with correct feature names
│
├──── model = "rf" ────► rf_model.predict(X)
│                              │
└──── model = "lr" ────► scaler.transform(X)
                               │
                         lr_model.predict(X_scaled)
                               │
                               ▼
Returns price, confidence range, influencing factors
│
▼
Frontend renders result panel
Metrics charts fetch from /api/metrics/* endpoints
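The dispatch step in the flow above can be sketched as follows. This is an illustration, not the code in `app.py`: the function `predict_price`, the shape of `encoders` (feature name mapped to an ordered list of class labels), and the stub classes are all assumptions, and a plain nested list stands in for the pandas DataFrame so the example runs without dependencies.

```python
def predict_price(payload, encoders, feature_names, rf_model, lr_model, scaler):
    """Encode categoricals, order features, and dispatch to the chosen model."""
    row = []
    for name in feature_names:
        value = payload[name]
        if name in encoders:
            # Map a category label (e.g. "Petrol") to its integer code.
            value = encoders[name].index(value)
        row.append(value)
    X = [row]  # one sample, features in training order

    if payload.get("model", "rf") == "lr":
        # Linear Regression was trained on scaled inputs, so scale first.
        return lr_model.predict(scaler.transform(X))[0]
    return rf_model.predict(X)[0]


class StubModel:
    """Stand-in for a fitted estimator; records its input, returns a fixed price."""

    def __init__(self, price):
        self.price = price
        self.last_X = None

    def predict(self, X):
        self.last_X = X
        return [self.price for _ in X]


class IdentityScaler:
    """Stand-in for a fitted StandardScaler; returns its input unchanged."""

    def transform(self, X):
        return X


# Hypothetical encoder mapping and feature order, for illustration only.
encoders = {"fuel_type": ["CNG", "Diesel", "Petrol"]}
feature_names = ["vehicle_age", "fuel_type"]
rf, lr = StubModel(5.42), StubModel(4.80)

price = predict_price(
    {"vehicle_age": 6, "fuel_type": "Petrol", "model": "rf"},
    encoders, feature_names, rf, lr, IdentityScaler(),
)
print(price)  # 5.42
```

Only the Linear Regression path goes through the scaler; tree-based models such as Random Forest are insensitive to feature scaling, which matches the two branches in the diagram.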
Contributors [Mayank Agarwal] [Shreyansh Verma] [Suryansh Panda] [Janhavi Maheshwari] [Yashovardhan Singh] [Khushbu Raj]