Eliphaz21/Car-Price-Prediction_using_Regression-ML

Car Price Prediction Website Using a Regression Model

This project is a complete regression prediction application built with FastAPI (backend) and a simple HTML/CSS/JavaScript frontend.

The model predicts used car prices in USDT (numerically equivalent to USD here) and also returns the converted price in Ethiopian Birr (ETB). It exposes a /predict/regression API that the frontend uses.


1. Regression task and model choice

  • Task: Predict the market price of a used car.
  • Input features:
    • mileage – total mileage in kilometers.
    • age – age of the car in years.
    • engine_size – engine displacement in liters.
    • horsepower – engine power in HP.
    • doors – number of doors (2–5).
    • brand – car brand (Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen).
    • fuel_type – fuel type (petrol, diesel, hybrid, ev).
  • Model: RandomForestRegressor (scikit‑learn).
  • Why Random Forest:
    • Handles non‑linear relationships between features and price.
    • Works well with tabular data and one‑hot encoded categorical features (brand, fuel type).
    • Robust to outliers and insensitive to feature scaling.
    • Natively exposes feature importances that the API/frontend can display.

Training uses a synthetic car dataset generated in train_model.py, with realistic brand and fuel‑type effects and a wide range of mileage, age, and price values. The same feature definitions are used in the API schema and in the frontend form, so everything is consistent end‑to‑end.
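As a rough illustration of what such a generator might look like, the sketch below samples features inside the ranges accepted by the API and combines them into a price. Every coefficient and multiplier here is hypothetical and chosen only for illustration (the actual values live in train_model.py and are not shown in this README); the function name `sketch_synthetic_cars` is likewise made up.

```python
import random

BRANDS = ["Toyota", "Hyundai", "Suzuki", "BMW", "Mercedes", "Volkswagen"]
FUEL_TYPES = ["petrol", "diesel", "hybrid", "ev"]

# Hypothetical multipliers: luxury brands above 1.0, budget brands below,
# and EVs slightly below petrol.
BRAND_FACTOR = {"Toyota": 1.0, "Hyundai": 0.9, "Suzuki": 0.8,
                "BMW": 1.5, "Mercedes": 1.6, "Volkswagen": 1.1}
FUEL_FACTOR = {"petrol": 1.0, "diesel": 1.05, "hybrid": 1.15, "ev": 0.9}


def sample_car(rng):
    """Draw one synthetic car and attach an illustrative price."""
    car = {
        "mileage": rng.uniform(0, 320_000),
        "age": rng.uniform(0, 30),
        "engine_size": round(rng.uniform(0.8, 6.0), 1),
        "horsepower": rng.uniform(40, 600),
        "doors": rng.randint(2, 5),
        "brand": rng.choice(BRANDS),
        "fuel_type": rng.choice(FUEL_TYPES),
    }
    # Positive effects from engine size and horsepower (made-up coefficients).
    base = 8_000 + 3_000 * car["engine_size"] + 60 * car["horsepower"]
    # Depreciation grows with age and mileage (made-up rates).
    depreciation = 0.93 ** car["age"] * (1 - 0.5 * car["mileage"] / 320_000)
    price = base * BRAND_FACTOR[car["brand"]] * FUEL_FACTOR[car["fuel_type"]] * depreciation
    price += rng.gauss(0, 1_500)                    # Gaussian noise
    car["price"] = min(max(price, 2_000), 120_000)  # clip to 2,000-120,000
    return car


def sketch_synthetic_cars(n, seed=42):
    """Return n synthetic cars; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    return [sample_car(rng) for _ in range(n)]
```

Seeding the generator keeps the dataset reproducible between runs, which matters when comparing evaluation metrics.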


2. Project structure

  • requirements.txt – Python dependencies.
  • train_model.py – script to generate data, train the regression model, save the model and metadata.
  • models/
    • car_price_model.pkl – serialized trained model + feature and category info (generated by train_model.py).
    • metadata.json – model metadata and evaluation metrics (generated by train_model.py).
  • app/
    • __init__.py – package marker.
    • schemas.py – Pydantic request/response models + validation.
    • model.py – model loading and prediction wrapper (handles one‑hot encoding and log‑price inverse transform).
    • main.py – FastAPI application and endpoints.
  • templates/
    • index.html – frontend UI served by FastAPI.
  • static/
    • styles.css – modern, responsive styling.
    • app.js – frontend logic, validation, and API calls.

3. Setup instructions

3.1. Create and activate a virtual environment (recommended)

From the project root:

# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1

# or Windows (cmd)
python -m venv .venv
.venv\Scripts\activate.bat

3.2. Install dependencies

pip install -r requirements.txt

3.3. Train and export the regression model

python train_model.py

This will:

  • Generate synthetic car data (numeric + brand + fuel type).
  • Train a RandomForestRegressor model on log(price).
  • Evaluate the model (train/test split).
  • Save:
    • models/car_price_model.pkl
    • models/metadata.json
  • Print evaluation metrics in the console.

You must run this once before starting the API, otherwise the backend will not find the model file.
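A small pre-flight check can make this failure mode explicit. The sketch below is not taken from the project; the helper names `missing_artifacts` and `require_artifacts` are hypothetical, and only the two file names come from the project structure above.

```python
from pathlib import Path

# File names as listed under models/ in the project structure.
REQUIRED_ARTIFACTS = ("car_price_model.pkl", "metadata.json")


def missing_artifacts(models_dir="models"):
    """Return the required file names that are absent from models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]


def require_artifacts(models_dir="models"):
    """Raise a descriptive error if the training step has not been run yet."""
    missing = missing_artifacts(models_dir)
    if missing:
        raise FileNotFoundError(
            f"Missing {', '.join(missing)} in '{models_dir}'. "
            "Run `python train_model.py` first."
        )
```

Calling `require_artifacts()` before loading the model would turn a low-level file error into a message that points at the missing step.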

3.4. Run the FastAPI backend

uvicorn app.main:app --reload

The API will be available at:

  • API root / frontend: http://127.0.0.1:8000/
  • Health check: http://127.0.0.1:8000/health
  • Interactive docs (Swagger UI): http://127.0.0.1:8000/docs

4. FastAPI backend – endpoints and validation

4.1. GET /health

  • Purpose: simple health check.
  • Response: JSON {"status": "ok"}.

4.2. GET /docs

  • Purpose: interactive OpenAPI documentation (Swagger UI).
  • Provided automatically by FastAPI.

4.3. GET /

  • Purpose: serve the main frontend page (templates/index.html).

4.4. POST /predict/regression

  • Purpose: run a regression prediction for car price.
  • Request body – JSON, validated by Pydantic (RegressionFeatures in app/schemas.py):
{
  "mileage": 60000,
  "age": 5,
  "engine_size": 2.0,
  "horsepower": 150,
  "doors": 4,
  "brand": "Toyota",
  "fuel_type": "petrol"
}

Validation rules (backed by Pydantic types and constraints):

  • mileage: float, 0 ≤ mileage ≤ 320000.
  • age: float, 0 ≤ age ≤ 30.
  • engine_size: float, 0.8 ≤ engine_size ≤ 6.0.
  • horsepower: float, 40 ≤ horsepower ≤ 600.
  • doors: integer, 2 ≤ doors ≤ 5.
  • brand: one of Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen.
  • fuel_type: one of petrol, diesel, hybrid, ev.
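For reference, the rules above could be expressed in Pydantic roughly as follows. This is a minimal sketch that assumes `Field` bounds and `Literal` types; the real RegressionFeatures class in app/schemas.py may be written differently, and the helper `validation_errors` is hypothetical.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class RegressionFeatures(BaseModel):
    # ge/le are inclusive bounds, matching the ranges listed above.
    mileage: float = Field(..., ge=0, le=320_000)
    age: float = Field(..., ge=0, le=30)
    engine_size: float = Field(..., ge=0.8, le=6.0)
    horsepower: float = Field(..., ge=40, le=600)
    doors: int = Field(..., ge=2, le=5)
    brand: Literal["Toyota", "Hyundai", "Suzuki", "BMW", "Mercedes", "Volkswagen"]
    fuel_type: Literal["petrol", "diesel", "hybrid", "ev"]


def validation_errors(payload):
    """Return the sorted names of fields that fail validation (empty if valid)."""
    try:
        RegressionFeatures(**payload)
    except ValidationError as exc:
        return sorted({str(err["loc"][0]) for err in exc.errors()})
    return []
```

When such a model is used as the request body type of a FastAPI endpoint, invalid input is rejected with a 422 response before the handler runs.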

5. Frontend – user interface and validation

The main frontend page is templates/index.html, served as GET / by FastAPI. It uses:

  • static/styles.css – clean, professional dark theme, responsive layout.
  • static/app.js – input validation and communication with the API.

5.1. Input form

The form contains inputs for all required features:

  • mileage – <input type="number" min="0" max="320000">
  • age – <input type="number" min="0" max="30">
  • engine_size – <input type="number" min="0.8" max="6.0" step="0.1">
  • horsepower – <input type="number" min="40" max="600">
  • doors – <input type="number" min="2" max="5">
  • brand – <select> with options Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen.
  • fuel_type – <select> with options petrol, diesel, hybrid, ev.

Each field has:

  • A hint text with the valid range or explanation.
  • A per‑field error area below the input or select.

5.2. Frontend validation

In static/app.js:

  • Before sending the request, the code:
    • Checks that every numeric field is filled, numeric, and inside the allowed range.
    • Ensures that brand and fuel_type are selected.
    • Mirrors the backend’s validation ranges.
  • If a validation error occurs:
    • The input/select gets an invalid CSS class (red border + glow).
    • A human‑readable error message appears under that field.
    • A generic message appears under the form: “Please fix the highlighted fields.”

Only when the form is valid will it send the POST request to /predict/regression using fetch.

5.3. Displaying prediction and model explanation

On a successful response:

  • The predicted price is shown prominently in both USDT and ETB, side by side.
  • The conversion rate (1 USDT = 155.95 ETB) is displayed under the prices.
  • Feature importance is displayed as:
    • A sorted list of features.
    • Horizontal bars with lengths proportional to the importance value.
    • Percentage labels (e.g. 32.1%).
  • The raw model_metadata JSON (including RMSE, MAE, R², feature lists) is displayed in a preformatted block.

If the API returns an error (e.g. 422 or 500), the error text is shown below the form in a clear message box.
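The feature importance display logic lives in static/app.js, but the transformation itself is simple enough to sketch in Python. The function below is illustrative only: the name `importance_rows` is hypothetical, and scaling bars relative to the largest importance is an assumption (the frontend may scale them differently).

```python
def importance_rows(feature_importance, max_bar_width=100):
    """Sort features by importance and derive a label and bar width for each."""
    ranked = sorted(feature_importance.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1] if ranked else 0.0
    rows = []
    for name, value in ranked:
        # Bar width is proportional to the importance; the top feature fills the bar.
        width = round(max_bar_width * value / top) if top > 0 else 0
        rows.append({
            "feature": name,
            "label": f"{value * 100:.1f}%",  # e.g. 0.321 is shown as 32.1%
            "bar_width": width,
        })
    return rows
```

Each returned row carries everything one list entry needs: the feature name, its percentage label, and the bar length.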


6. Model training and evaluation details

Training is implemented in train_model.py:

  1. Synthetic data generation (generate_synthetic_car_data):
    • Randomly samples realistic ranges for numeric features:
      • mileage, age, engine_size, horsepower, doors.
    • Samples categorical features:
      • brand from [Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen].
      • fuel_type from [petrol, diesel, hybrid, ev].
    • Combines them into a realistic price formula:
      • Brand factor (luxury brands cost more, budget brands cost less).
      • Fuel factor (EVs have lower effective price due to taxes/incentives).
      • Depreciation from higher mileage and age.
      • Positive effects from engine size and horsepower.
    • Adds Gaussian noise and clips prices to a realistic range (e.g. 2,000–120,000).
    • One‑hot encodes brand and fuel type and concatenates with numeric features.
  2. Train/test split:
    • 80% training, 20% testing (train_test_split with random_state=42).
  3. Target transform (log‑price):
    • Applies a natural log transform to the training targets:
      • y_train_log = np.log(y_train).
    • Trains the model on y_train_log instead of raw price to handle wide price ranges more smoothly.
  4. Model:
    • RandomForestRegressor with:
      • n_estimators=200
      • random_state=42
      • n_jobs=-1 (use all CPU cores).
  5. Evaluation metrics (on the test set):
    • Predicts log‑prices, then exponentiates back to original price scale.
    • Computes:
      • RMSE (Root Mean Squared Error): 3647.36
      • MAE (Mean Absolute Error): 2854.53
      • R² (coefficient of determination): 0.8942
    • These metrics mean:
      • On average, predictions are off by about $2.9k (MAE).
      • RMSE, which weights large errors more heavily, is about $3.6k on a 2k–120k price range.
      • About 89% of the variance in prices is explained by the model (R²).

These values are also stored in models/metadata.json and returned by the API as part of model_metadata.
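To make the evaluation step concrete, the sketch below computes the three metrics on the original price scale from log‑price predictions. It uses only the standard library for clarity; train_model.py presumably relies on NumPy and scikit‑learn instead, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred_log):
    """Compute RMSE, MAE, and R² on the price scale from log-price predictions."""
    # Undo the natural-log target transform before measuring errors.
    y_pred = [math.exp(v) for v in y_pred_log]
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R² = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean price.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {"rmse": math.sqrt(mse), "mae": mae, "r2": 1.0 - ss_res / ss_tot}
```

Because the errors are measured after exponentiating, the reported RMSE and MAE are in price units rather than log units.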


7. Example API requests

7.1. Example curl request (Windows cmd)

curl -X POST "http://127.0.0.1:8000/predict/regression" ^
  -H "Content-Type: application/json" ^
  -d "{ \"mileage\": 60000, \"age\": 5, \"engine_size\": 2.0, \"horsepower\": 150, \"doors\": 4, \"brand\": \"Toyota\", \"fuel_type\": \"petrol\" }"

7.2. Example using HTTPie

http POST http://127.0.0.1:8000/predict/regression \
  mileage:=60000 age:=5 engine_size:=2.0 horsepower:=150 doors:=4 \
  brand=Toyota fuel_type=petrol
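The same request can also be sent from Python. The following sketch uses only the standard library and assumes the server is running locally on the default port; the helper names `build_request` and `predict` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict/regression"


def build_request(features, url=API_URL):
    """Build a JSON POST request for the prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=API_URL, timeout=10.0):
    """Send the request and return the decoded JSON response.

    Error responses such as 422 raise urllib.error.HTTPError.
    """
    with urllib.request.urlopen(build_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `predict({...})` with the payload from section 4.4 should return a dictionary shaped like the example response below.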

7.3. Example JSON response (simplified)

{
  "predicted_price": 18523.42,
  "currency": "USDT",
  "etb_price": 2888727.35,
  "etb_currency": "ETB",
  "usdt_to_etb_rate": 155.95,
  "feature_importance": {
    "mileage": 0.32,
    "age": 0.2,
    "engine_size": 0.1,
    "horsepower": 0.15,
    "doors": 0.02,
    "brand_toyota": 0.08,
    "fuel_petrol": 0.03
  },
  "model_metadata": {
    "metrics": {
      "rmse": 3647.36,
      "mae": 2854.53,
      "r2": 0.8942
    }
  }
}

This is what the interface looks like:

[Screenshot 2026-03-04 022238]

8. How everything works together (step by step)

  1. You install dependencies with pip install -r requirements.txt.
  2. You train the model by running python train_model.py:
    • Synthetic car data (numeric + brand + fuel type) is generated.
    • The RandomForestRegressor learns how features map to log(price).
    • The model, feature names, category lists, and metrics are saved under models/.
  3. You start the FastAPI app with uvicorn app.main:app --reload.
  4. When the app starts, app/model.py:
    • Loads car_price_model.pkl and metadata.json.
    • Prepares a reusable RegressionModel instance in memory, including category lists and target transform info.
  5. A user opens the frontend at http://127.0.0.1:8000/:
    • Sees the form with all features (numeric + brand + fuel type).
    • Inputs values; frontend validates them.
  6. On submit, the frontend:
    • Sends a JSON POST request to /predict/regression with the form data.
  7. The backend:
    • Validates the request body using RegressionFeatures (Pydantic).
    • Rebuilds the numeric + one‑hot feature vector in the same order used during training.
    • Runs model.predict(...) and, if necessary, exponentiates from log(price) back to price.
    • Converts the USDT price to ETB and computes feature importances.
    • Returns a PredictionResponse containing the prediction in both currencies, feature importance, and metadata (including RMSE, MAE, R²).
  8. The frontend:
    • Displays the predicted price clearly in USDT and ETB.
    • Renders feature importance as bars.
    • Shows model metrics and details in the metadata panel.
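The backend part of this flow (step 7) can be sketched as follows. This is an illustrative outline, not the code in app/model.py: the column order, the helper names, and the rounding to two decimals are assumptions, while the one‑hot column naming follows the keys shown in the example response (brand_toyota, fuel_petrol).

```python
import math

NUMERIC_FEATURES = ["mileage", "age", "engine_size", "horsepower", "doors"]
BRANDS = ["Toyota", "Hyundai", "Suzuki", "BMW", "Mercedes", "Volkswagen"]
FUEL_TYPES = ["petrol", "diesel", "hybrid", "ev"]
USDT_TO_ETB_RATE = 155.95  # conversion rate quoted in this README


def feature_names():
    """Column names in the (assumed) training order: numeric, brand, fuel."""
    return (NUMERIC_FEATURES
            + [f"brand_{b.lower()}" for b in BRANDS]
            + [f"fuel_{f}" for f in FUEL_TYPES])


def build_feature_vector(features):
    """Rebuild the numeric + one-hot vector from a validated request."""
    numeric = [float(features[name]) for name in NUMERIC_FEATURES]
    brand = [1.0 if features["brand"] == b else 0.0 for b in BRANDS]
    fuel = [1.0 if features["fuel_type"] == f else 0.0 for f in FUEL_TYPES]
    return numeric + brand + fuel


def to_response(log_price):
    """Invert the log transform and add the ETB conversion."""
    price = math.exp(log_price)
    return {
        "predicted_price": round(price, 2),
        "currency": "USDT",
        "etb_price": round(price * USDT_TO_ETB_RATE, 2),
        "etb_currency": "ETB",
        "usdt_to_etb_rate": USDT_TO_ETB_RATE,
    }
```

In the real service, the log price would come from something like `model.predict([build_feature_vector(features)])[0]` on the loaded scikit‑learn model. Keeping the column order identical to training is essential, since the model identifies features by position only.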

Author: yeabsra andnet
