mayankagarwal-01/AutoVal

🚗 AutoVal

Used Car Resale Price Estimator



A full-stack web application that predicts the resale price of used cars in real time.
Built with Flask, scikit-learn, and a dark-themed interactive dashboard.




🧠 Overview

AutoVal takes 10 real-world vehicle parameters and predicts the resale price using two trained ML models: a Random Forest Regressor (R² = 0.938) and a Linear Regression baseline (R² = 0.689). Both are fitted on the 15,397-record CarDekho dataset using an 80/20 train-test split.

The app features a live metrics dashboard that shows actual vs predicted prices, residual distributions, feature importance, and prediction error by vehicle age, all fetched dynamically from the Flask backend.


🖥️ Live Demo Preview

The app runs locally. Below are actual screenshots of the UI:


🔷 Price Prediction Form

[Screenshot: assets/form.png]

Real-time car price estimation with confidence interval and feature insights


📊 Metrics Dashboard

[Screenshot: assets/dashboard.png]

Interactive ML performance dashboard with multiple evaluation metrics and charts


✨ Features

| Feature | Description |
| --- | --- |
| 🎯 Real ML Prediction | Random Forest (R² = 0.938) + Linear Regression trained on 15,397 records |
| ⚡ Instant Estimates | Sub-second predictions via Flask REST API |
| 📊 Live Metrics Dashboard | 5 interactive Chart.js charts, all data fetched from the backend |
| 🔍 Price Influencers | Shows which factors pushed the price up or down |
| 📉 Confidence Score | Per-prediction confidence interval with animated bar |
| 🔀 Algorithm Toggle | Switch between RF and LR; charts update live |
| 📱 Responsive UI | Works on desktop and mobile |
| 🔁 Retrain Anytime | Run train.py with new data to update everything |

📈 Model Performance

Both models trained on the CarDekho dataset · 80/20 train-test split · random_state=42

| Metric | 🌲 Random Forest | 📉 Linear Regression |
| --- | --- | --- |
| R² Score | 0.938 | 0.689 |
| MAE (₹ Lakhs) | 0.94 | 2.45 |
| RMSE (₹ Lakhs) | 1.98 | 4.46 |
| MAPE | 13.4% | 40.1% |
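For readers who want to see how the four metrics in the table are defined, here is a minimal standard-library sketch. It is not taken from train.py (which presumably relies on scikit-learn's metric functions); the function name `regression_metrics` and the toy prices are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, RMSE and MAPE (%) for two equal-length sequences.

    Assumes y_true has no zeros (needed for MAPE) and is not constant
    (needed for R²).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Toy prices in ₹ lakhs (illustrative values, not taken from the dataset).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(regression_metrics(actual, predicted))
```

MAE and RMSE share the unit of the target (₹ lakhs here), while R² and MAPE are unit-free, which is why the table can compare them across models directly.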

🏆 Top Feature Importances (Random Forest)

max_power    ████████████████████████████████████  68.1%
vehicle_age  ███████                               13.0%
mileage      ████                                   8.1%
km_driven    ███                                    5.1%
engine       ██                                     3.5%
brand        █                                      1.4%
seats        ▌                                      0.5%

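💡 Key insight: Max power (bhp) is by far the strongest predictor of resale price, accounting for 68.1% of the Random Forest's feature importance, more than five times the share of vehicle age (13.0%). In this dataset, high-performance cars hold their value markedly better than low-power equivalents of similar age and mileage.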


🛠️ Tech Stack

Backend

Frontend

  • Vanilla HTML / CSS / JS: zero frontend framework dependency
  • Chart.js 4.4: 5 interactive charts
  • Google Fonts: Syne + DM Mono
  • Fully dark-themed responsive UI

📁 Project Structure

AutoVal/
│
├── app.py                    # Flask server: API routes, loads trained models
├── train.py                  # Training pipeline: run to generate model files
├── requirements.txt          # Python dependencies
├── cardekho_dataset.csv      # Source dataset
│
├── assets/
│   ├── form.png
│   └── dashboard.png
├── static/
│   └── style.css
├── templates/
│   └── index.html            # Frontend: form + metrics dashboard
│
└── (generated after running train.py)
    ├── rf_model.pkl           # Trained Random Forest
    ├── lr_model.pkl           # Trained Linear Regression
    ├── scaler.pkl             # StandardScaler for LR
    ├── features.pkl           # Feature name list
    ├── encoder_classes.json   # Label encoder mappings
    └── metrics.json           # Evaluation metrics + chart data
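Since the server depends on the six generated files, a small pre-flight check can tell you whether train.py still needs to be run. The sketch below is an illustration only: the helper name `missing_artifacts` is hypothetical, and app.py may already handle missing files in its own way.

```python
from pathlib import Path

# File names as listed under "(generated after running train.py)".
REQUIRED_ARTIFACTS = (
    "rf_model.pkl",
    "lr_model.pkl",
    "scaler.pkl",
    "features.pkl",
    "encoder_classes.json",
    "metrics.json",
)


def missing_artifacts(base_dir="."):
    """Return the generated files that are not present in base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Run train.py first; missing:", ", ".join(missing))
    else:
        print("All model artefacts found.")
```

Run it from the repository root before `python app.py`; an empty result means every artefact is in place.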

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • pip

1. Clone the repository

git clone https://github.com/mayankagarwal-01/AutoVal.git
cd AutoVal

2. Create and activate virtual environment

python3 -m venv venv

# macOS / Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

3. Install dependencies

pip install --no-user -r requirements.txt

4. Train the models

python train.py --data cardekho_dataset.csv
Expected training output
── AutoVal Training Pipeline ──────────────────────────────────
[1/5] Loading & cleaning data...
      Loaded 15,397 rows, 11 columns after cleaning.
[2/5] Encoding categoricals...
[3/5] Splitting train / test (80 / 20)...
      Train: 12,317   Test: 3,080
[4/5] Training models...
      → Random Forest Regressor...
      → Linear Regression (with StandardScaler)...
[5/5] Saving artefacts...

── Results ────────────────────────────────────────────────────
  Random Forest:      R²=0.938  MAE=₹0.94L  RMSE=₹1.98L  MAPE=13.4%
  Linear Regression:  R²=0.689  MAE=₹2.45L  RMSE=₹4.46L  MAPE=40.1%

  Top 5 feature importances (RF):
    max_power              68.1%
    vehicle_age            13.03%
    mileage                8.08%
    km_driven              5.13%
    engine                 3.48%

  Saved: rf_model.pkl  lr_model.pkl  scaler.pkl
         features.pkl  encoder_classes.json  metrics.json

── Done. Run `python app.py` to start the server. ─────────────
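The train and test counts in the output above follow directly from the 80/20 split. As a quick check, assuming train.py uses scikit-learn's `train_test_split` with `test_size=0.2` (an assumption; the script itself is not shown here), the test set size is rounded up and the training set receives the remainder:

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Mirror scikit-learn's rounding: test rows round up, train gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(15397))  # (12317, 3080)
```

This reproduces the reported `Train: 12,317   Test: 3,080` for the 15,397 cleaned rows.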

5. Start the server

python app.py

6. Open in browser

http://localhost:5000

🔌 API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Serves the frontend |
| POST | /api/predict | Returns predicted price, range, confidence, factors |
| GET | /api/metrics?algo=rf\|lr | R², MAE, RMSE, MAPE |
| GET | /api/metrics/scatter?algo=rf\|lr | Actual vs predicted scatter data |
| GET | /api/metrics/residuals?algo=rf\|lr | Residual histogram data |
| GET | /api/metrics/importance | Feature importance for both models |
| GET | /api/metrics/age-error | MAE by vehicle age |
| GET | /api/encoders | Valid values for all categorical fields |
| GET | /api/health | Health check |
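The GET endpoints can be queried from any HTTP client. Below is a small standard-library sketch for building the metrics URLs and fetching their JSON. The helper names `metrics_url` and `fetch_json` are hypothetical, and the base URL assumes the default local address used in this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumed default local address

# Endpoints that take ?algo=rf|lr according to the API table.
ALGO_ENDPOINTS = ("/api/metrics", "/api/metrics/scatter", "/api/metrics/residuals")


def metrics_url(endpoint, algo=None, base_url=BASE_URL):
    """Build the URL for an endpoint, adding ?algo= where it applies."""
    if endpoint in ALGO_ENDPOINTS:
        if algo not in ("rf", "lr"):
            raise ValueError("algo must be 'rf' or 'lr'")
        return f"{base_url}{endpoint}?{urlencode({'algo': algo})}"
    return f"{base_url}{endpoint}"


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON body (needs the server to be running)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(metrics_url("/api/metrics/scatter", "lr"))
print(metrics_url("/api/metrics/importance"))
```

With the server running, `fetch_json(metrics_url("/api/metrics", "rf"))` would return the Random Forest metrics; the exact shape of that JSON is not documented here, so inspect it before relying on specific keys.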
📋 POST /api/predict: Request & Response

Request body:

{
  "brand": "Hyundai",
  "vehicle_age": 6,
  "km_driven": 45000,
  "seller_type": "Individual",
  "fuel_type": "Petrol",
  "transmission_type": "Manual",
  "mileage": 18.9,
  "engine": 1197,
  "max_power": 82.0,
  "seats": 5,
  "model": "rf"
}

Response:

{
  "price": 5.42,
  "price_low": 4.99,
  "price_high": 5.85,
  "confidence": 88.3,
  "model_name": "Random Forest Regressor",
  "factors": [
    { "name": "Brand",             "impact": 1,  "label": "Hyundai" },
    { "name": "Vehicle Age",       "impact": 0,  "label": "6 yrs"   },
    { "name": "Fuel Type",         "impact": 0,  "label": "Petrol"  },
    { "name": "Transmission",      "impact": 0,  "label": "Manual"  },
    { "name": "Max Power",         "impact": 0,  "label": "82.0 bhp"},
    { "name": "Kilometres Driven", "impact": 0,  "label": "45k km"  }
  ]
}
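To call the endpoint from a script rather than the web form, something like the following standard-library client should work. It is a sketch under assumptions: the helper names are hypothetical, the base URL is the default local address, and it assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header.

```python
import json
from urllib.request import Request, urlopen


def build_predict_request(payload, base_url="http://localhost:5000"):
    """Create (but do not send) a JSON POST request for /api/predict."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, timeout=5):
    """Send the request to a running server and decode the JSON reply."""
    with urlopen(build_predict_request(payload), timeout=timeout) as resp:
        return json.load(resp)


# Same values as the request body shown above.
sample = {
    "brand": "Hyundai", "vehicle_age": 6, "km_driven": 45000,
    "seller_type": "Individual", "fuel_type": "Petrol",
    "transmission_type": "Manual", "mileage": 18.9, "engine": 1197,
    "max_power": 82.0, "seats": 5, "model": "rf",
}

req = build_predict_request(sample)
print(req.get_method(), req.full_url)

# Uncomment once `python app.py` is running:
# result = predict(sample)
# print(result["price"], result["price_low"], result["price_high"])
```

Setting `"model": "lr"` in the payload selects the Linear Regression model instead of the Random Forest.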

📦 Dataset

CarDekho Used Car Dataset

| Feature | Type | Values / Range |
| --- | --- | --- |
| brand | categorical | 32 brands (Maruti, Hyundai, BMW...) |
| vehicle_age | integer | 0 – 29 years |
| km_driven | integer | 100 – 500,000 km |
| seller_type | categorical | Individual / Dealer / Trustmark Dealer |
| fuel_type | categorical | Petrol / Diesel / CNG / LPG / Electric |
| transmission_type | categorical | Manual / Automatic |
| mileage | float | 4.0 – 33.5 kmpl |
| engine | integer | 793 – 6,592 cc |
| max_power | float | 38.4 – 626.0 bhp |
| seats | integer | 2 – 9 |
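The ranges in this table double as a sanity check for prediction requests, since inputs far outside the training data are unlikely to produce reliable estimates. Below is a minimal client-side validation sketch. The function `validate_payload` is hypothetical and not part of app.py, and the full list of 32 brands is not reproduced here (it can be fetched from /api/encoders), so the sketch only checks that a brand is present.

```python
# Ranges and categories copied from the dataset table.
NUMERIC_RANGES = {
    "vehicle_age": (0, 29),
    "km_driven": (100, 500_000),
    "mileage": (4.0, 33.5),
    "engine": (793, 6592),
    "max_power": (38.4, 626.0),
    "seats": (2, 9),
}
CATEGORIES = {
    "seller_type": {"Individual", "Dealer", "Trustmark Dealer"},
    "fuel_type": {"Petrol", "Diesel", "CNG", "LPG", "Electric"},
    "transmission_type": {"Manual", "Automatic"},
}


def validate_payload(payload):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: expected a number")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside {low}-{high}")
    for field, allowed in CATEGORIES.items():
        if payload.get(field) not in allowed:
            problems.append(f"{field}: expected one of {sorted(allowed)}")
    if not payload.get("brand"):
        problems.append("brand: required")
    return problems


example = {
    "brand": "Hyundai", "vehicle_age": 6, "km_driven": 45000,
    "seller_type": "Individual", "fuel_type": "Petrol",
    "transmission_type": "Manual", "mileage": 18.9, "engine": 1197,
    "max_power": 82.0, "seats": 5,
}
print(validate_payload(example))                   # no problems for in-range input
print(validate_payload(dict(example, seats=12)))   # reports the out-of-range field
```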

⚙️ How It Works

User fills form
      │
      ▼
POST /api/predict
      │
      ▼
app.py encodes categoricals
(brand → int, fuel → int, etc.)
      │
      ▼
Builds pandas DataFrame
with correct feature names
      │
      ├──── model = "rf" ────► rf_model.predict(X)
      │                              │
      └──── model = "lr" ────►  scaler.transform(X)
                                     │
                                lr_model.predict(X_scaled)
      │
      ▼
Returns price, confidence range, influencing factors
      │
      ▼
Frontend renders result panel
Metrics charts fetch from /api/metrics/* endpoints
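The encode-then-branch flow in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the code in app.py: the feature order, the shape of the encoder mapping (each categorical field mapped to an ordered list of class labels), and all function and class names below are hypothetical. It uses a plain list row instead of a pandas DataFrame to stay dependency-free, and tiny stand-in objects in place of the pickled scikit-learn models.

```python
# Assumed column order; the real order lives in features.pkl.
FEATURES = ["brand", "vehicle_age", "km_driven", "seller_type", "fuel_type",
            "transmission_type", "mileage", "engine", "max_power", "seats"]
CATEGORICAL = ("brand", "seller_type", "fuel_type", "transmission_type")


def encode_row(payload, encoder_classes, features=FEATURES):
    """Turn a request payload into one numeric row in training column order."""
    row = []
    for name in features:
        value = payload[name]
        if name in CATEGORICAL:
            # Label encoding: the code is the label's index in the class list.
            value = encoder_classes[name].index(value)
        row.append(float(value))
    return [row]  # 2-D: a single sample


def predict_price(payload, encoder_classes, rf_model, lr_model, scaler):
    """Route to the chosen model; only the linear model sees scaled inputs."""
    X = encode_row(payload, encoder_classes)
    if payload.get("model", "rf") == "lr":
        return float(lr_model.predict(scaler.transform(X))[0])
    return float(rf_model.predict(X)[0])


class ConstantModel:
    """Stand-in for a fitted estimator loaded from a .pkl file."""
    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


class IdentityScaler:
    """Stand-in for the fitted StandardScaler."""
    def transform(self, X):
        return X


demo_encoders = {
    "brand": ["BMW", "Hyundai", "Maruti"],
    "seller_type": ["Dealer", "Individual", "Trustmark Dealer"],
    "fuel_type": ["CNG", "Diesel", "Electric", "LPG", "Petrol"],
    "transmission_type": ["Automatic", "Manual"],
}
demo_payload = {
    "brand": "Hyundai", "vehicle_age": 6, "km_driven": 45000,
    "seller_type": "Individual", "fuel_type": "Petrol",
    "transmission_type": "Manual", "mileage": 18.9, "engine": 1197,
    "max_power": 82.0, "seats": 5, "model": "rf",
}
print(encode_row(demo_payload, demo_encoders))
print(predict_price(demo_payload, demo_encoders,
                    ConstantModel(5.42), ConstantModel(4.80), IdentityScaler()))
```

The scaler is applied only on the `"lr"` branch because tree-based models such as Random Forest are insensitive to feature scaling, whereas the linear model was trained on standardized inputs.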

Contributors [Mayank Agarwal] [Shreyansh Verma] [Suryansh Panda] [Janhavi Maheshwari] [Yashovardhan Singh] [Khushbu Raj]

About

AI-powered vehicle valuation and analytics platform designed to estimate car prices using data-driven insights and automation. AutoVal combines intelligent prediction models, modern UI/UX, and scalable backend integration to deliver accurate valuations, streamlined workflows, and efficient automotive data analysis.
