This project is a complete regression prediction application built with FastAPI (backend) and a simple HTML/CSS/JavaScript frontend.
The model predicts used car prices in USDT (numerically equivalent to USD here) and also returns the converted price in Ethiopian Birr (ETB). It exposes a /predict/regression API that the frontend uses.
- Task: Predict the market price of a used car.
- Input features:
  - mileage – total mileage in kilometers.
  - age – age of the car in years.
  - engine_size – engine displacement in liters.
  - horsepower – engine power in HP.
  - doors – number of doors (2–5).
  - brand – car brand (Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen).
  - fuel_type – fuel type (petrol, diesel, hybrid, ev).
- Model: RandomForestRegressor (scikit-learn).
- Why Random Forest:
  - Handles non-linear relationships between features and price.
  - Works well with tabular data and one-hot encoded categorical features (brand, fuel type).
  - Needs no feature scaling and is relatively robust to outliers.
  - Natively exposes feature importances that the API/frontend can display.
Training uses a synthetic car dataset generated in train_model.py, with plausible brand and fuel-type effects and a wide range of mileage, age, and price. The same feature definitions are used in the API schema and in the frontend form, so everything is consistent end-to-end.
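As a rough illustration of how such a dataset can be generated, the sketch below combines a brand factor, a fuel factor, depreciation from age and mileage, Gaussian noise, and clipping to the 2,000–120,000 range. It is not the code in train_model.py: every coefficient, the sampling ranges, and the helper names sample_car and synthetic_price are assumptions made up for this example.

```python
import random

# All numeric constants here are invented placeholders for illustration;
# the real ones are defined in generate_synthetic_car_data (train_model.py).
BRAND_FACTOR = {"Toyota": 1.0, "Hyundai": 0.9, "Suzuki": 0.8,
                "BMW": 1.5, "Mercedes": 1.6, "Volkswagen": 1.1}
FUEL_FACTOR = {"petrol": 1.0, "diesel": 1.05, "hybrid": 1.15, "ev": 0.95}


def sample_car(rng: random.Random) -> dict:
    """Draw one car; the ranges are assumed to match the API's validation rules."""
    return {
        "mileage": rng.uniform(0, 320000),
        "age": rng.uniform(0, 30),
        "engine_size": rng.uniform(0.8, 6.0),
        "horsepower": rng.uniform(40, 600),
        "doors": rng.randint(2, 5),
        "brand": rng.choice(sorted(BRAND_FACTOR)),
        "fuel_type": rng.choice(sorted(FUEL_FACTOR)),
    }


def synthetic_price(car: dict, rng: random.Random) -> float:
    """Brand and fuel factors times a depreciated base price, plus noise, clipped."""
    base = 12000 + 4000 * car["engine_size"] + 60 * car["horsepower"]
    depreciation = 0.94 ** car["age"] * (1 - 0.5 * car["mileage"] / 320000)
    price = base * BRAND_FACTOR[car["brand"]] * FUEL_FACTOR[car["fuel_type"]] * depreciation
    price += rng.gauss(0, 1500)                 # Gaussian noise
    return min(max(price, 2000.0), 120000.0)    # clip to 2,000-120,000


rng = random.Random(42)                         # fixed seed for reproducibility
cars = [sample_car(rng) for _ in range(5)]
prices = [synthetic_price(car, rng) for car in cars]
print([round(p, 2) for p in prices])
```

Passing the random generator in explicitly keeps the sketch reproducible, which mirrors the fixed random_state used elsewhere in the project.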
- requirements.txt – Python dependencies.
- train_model.py – script to generate data, train the regression model, save the model and metadata.
- models/
  - car_price_model.pkl – serialized trained model + feature and category info (generated by train_model.py).
  - metadata.json – model metadata and evaluation metrics (generated by train_model.py).
- app/
  - __init__.py – package marker.
  - schemas.py – Pydantic request/response models + validation.
  - model.py – model loading and prediction wrapper (handles one-hot encoding and log-price inverse transform).
  - main.py – FastAPI application and endpoints.
- templates/
  - index.html – frontend UI served by FastAPI.
- static/
  - styles.css – modern, responsive styling.
  - app.js – frontend logic, validation, and API calls.
From the project root:
# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
# or Windows (cmd)
python -m venv .venv
.venv\Scripts\activate.bat

pip install -r requirements.txt

python train_model.py

This will:
- Generate synthetic car data (numeric + brand + fuel type).
- Train a RandomForestRegressor model on log(price).
- Evaluate the model (train/test split).
- Save:
  - models/car_price_model.pkl
  - models/metadata.json
- Print evaluation metrics in the console.
You must run this once before starting the API, otherwise the backend will not find the model file.
uvicorn app.main:app --reload

The API will be available at:
- API root / frontend: http://127.0.0.1:8000/
- Health check: http://127.0.0.1:8000/health
- Interactive docs (Swagger UI): http://127.0.0.1:8000/docs
GET /health
- Purpose: simple health check.
- Response: JSON {"status": "ok"}.

GET /docs
- Purpose: interactive OpenAPI documentation (Swagger UI).
- Provided automatically by FastAPI.

GET /
- Purpose: serve the main frontend page (templates/index.html).

POST /predict/regression
- Purpose: run a regression prediction for car price.
- Request body – JSON, validated by Pydantic (RegressionFeatures in app/schemas.py):
{
"mileage": 60000,
"age": 5,
"engine_size": 2.0,
"horsepower": 150,
"doors": 4,
"brand": "Toyota",
"fuel_type": "petrol"
}

Validation rules (backed by Pydantic types and constraints):
- mileage: float, 0 ≤ mileage ≤ 320000.
- age: float, 0 ≤ age ≤ 30.
- engine_size: float, 0.8 ≤ engine_size ≤ 6.0.
- horsepower: float, 40 ≤ horsepower ≤ 600.
- doors: integer, 2 ≤ doors ≤ 5.
- brand: one of Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen.
- fuel_type: one of petrol, diesel, hybrid, ev.
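The rules above can be mirrored in a few lines of plain Python. This is only an illustrative sketch using the standard library, not the project's Pydantic model; RANGES, ALLOWED, and validate_features are hypothetical names, and the error messages are invented.

```python
# Illustrative only: the real validation lives in RegressionFeatures (app/schemas.py).
# RANGES, ALLOWED and validate_features are hypothetical names for this sketch.

RANGES = {
    "mileage": (0, 320000),
    "age": (0, 30),
    "engine_size": (0.8, 6.0),
    "horsepower": (40, 600),
    "doors": (2, 5),
}
ALLOWED = {
    "brand": {"Toyota", "Hyundai", "Suzuki", "BMW", "Mercedes", "Volkswagen"},
    "fuel_type": {"petrol", "diesel", "hybrid", "ev"},
}


def validate_features(payload: dict) -> dict:
    """Return a {field: message} dict; an empty dict means the payload is valid."""
    errors = {}
    for field, (low, high) in RANGES.items():
        value = payload.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors[field] = "must be a number"
        elif field == "doors" and not float(value).is_integer():
            errors[field] = "must be an integer"
        elif not low <= value <= high:
            errors[field] = f"must be between {low} and {high}"
    for field, choices in ALLOWED.items():
        value = payload.get(field)
        if not isinstance(value, str) or value not in choices:
            errors[field] = "must be one of " + ", ".join(sorted(choices))
    return errors


example = {
    "mileage": 60000, "age": 5, "engine_size": 2.0, "horsepower": 150,
    "doors": 4, "brand": "Toyota", "fuel_type": "petrol",
}
print(validate_features(example))                      # valid payload
print(validate_features({**example, "mileage": -1}))   # mileage out of range
```

Returning one message per field fits the per-field error display the frontend uses.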
The main frontend page is templates/index.html, served as GET / by FastAPI. It uses:
- static/styles.css – clean, professional dark theme, responsive layout.
- static/app.js – input validation and communication with the API.
The form contains inputs for all required features:
- mileage – <input type="number" min="0" max="320000">
- age – <input type="number" min="0" max="30">
- engine_size – <input type="number" min="0.8" max="6.0" step="0.1">
- horsepower – <input type="number" min="40" max="600">
- doors – <input type="number" min="2" max="5">
- brand – <select> with options Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen.
- fuel_type – <select> with options petrol, diesel, hybrid, ev.
Each field has:
- A hint text with the valid range or explanation.
- A per‑field error area below the input or select.
In static/app.js:
- Before sending the request, the code:
  - Checks that every numeric field is filled, numeric, and inside the allowed range.
  - Ensures that brand and fuel_type are selected.
  - Mirrors the backend’s validation ranges.
- If a validation error occurs:
  - The input/select gets an invalid CSS class (red border + glow).
  - A human-readable error message appears under that field.
  - A generic message appears under the form: “Please fix the highlighted fields.”
Only when the form is valid will it send the POST request to /predict/regression using fetch.
On a successful response:
- The predicted price is shown prominently in both USDT and ETB, side by side.
- The conversion rate (1 USDT = 155.95 ETB) is displayed under the prices.
- Feature importance is displayed as:
  - A sorted list of features.
  - Horizontal bars with lengths proportional to the importance value.
  - Percentage labels (e.g. 32.1%).
- The raw model_metadata JSON (including RMSE, MAE, R², feature lists) is displayed in a preformatted block.
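The feature-importance display described above comes down to sorting the importances and deriving a bar width and a percentage label for each. The frontend does this in JavaScript; the Python sketch below only illustrates the computation. The function name importance_rows and the choice to scale bars relative to the largest importance are assumptions, and the input values are copied from the example response later in this README.

```python
# Importances copied from the example response in this README.
feature_importance = {
    "mileage": 0.32, "age": 0.2, "engine_size": 0.1, "horsepower": 0.15,
    "doors": 0.02, "brand_toyota": 0.08, "fuel_petrol": 0.03,
}


def importance_rows(importances: dict) -> list:
    """Sort features by importance and attach a bar width and a percent label."""
    if not importances:
        return []
    largest = max(importances.values())
    rows = []
    for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        rows.append({
            "feature": name,
            # Bar width relative to the largest importance (an assumption of this sketch).
            "bar_width_pct": round(100 * value / largest, 1),
            "label": f"{value * 100:.1f}%",
        })
    return rows


for row in importance_rows(feature_importance):
    print(row["feature"], row["label"], row["bar_width_pct"])
```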
If the API returns an error (e.g. 422 or 500), the error text is shown below the form in a clear message box.
Training is implemented in train_model.py:
- Synthetic data generation (generate_synthetic_car_data):
  - Randomly samples realistic ranges for numeric features: mileage, age, engine_size, horsepower, doors.
  - Samples categorical features:
    - brand from [Toyota, Hyundai, Suzuki, BMW, Mercedes, Volkswagen].
    - fuel_type from [petrol, diesel, hybrid, ev].
  - Combines them into a realistic price formula:
    - Brand factor (luxury brands cost more, budget brands cost less).
    - Fuel factor (EVs have lower effective price due to taxes/incentives).
    - Depreciation from higher mileage and age.
    - Positive effects from engine size and horsepower.
  - Adds Gaussian noise and clips prices to a realistic range (e.g. 2,000–120,000).
  - One-hot encodes brand and fuel type and concatenates with numeric features.
- Train/test split:
  - 80% training, 20% testing (train_test_split with random_state=42).
- Target transform (log-price):
  - Applies a natural log transform to the training targets: y_train_log = np.log(y_train).
  - Trains the model on y_train_log instead of raw price to handle wide price ranges more smoothly.
- Model: RandomForestRegressor with:
  - n_estimators=200
  - random_state=42
  - n_jobs=-1 (use all CPU cores).
- Evaluation metrics (on the test set):
  - Predicts log-prices, then exponentiates back to original price scale.
  - Computes:
    - RMSE (Root Mean Squared Error): 3647.36
    - MAE (Mean Absolute Error): 2854.53
    - R² (coefficient of determination): 0.8942
  - These metrics mean:
    - On average, predictions are off by about $2.9k (MAE).
    - The root-mean-square error, which weights large misses more heavily, is around $3.6k (RMSE) on a 2k–120k price range.
    - About 89% of the variance in prices is explained by the model (R²).
These values are also stored in models/metadata.json and returned by the API as part of model_metadata.
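To make the evaluation step concrete, the sketch below exponentiates log-price predictions back to prices and then computes RMSE, MAE, and R² on the price scale. It uses only the standard library rather than the scikit-learn helpers the training script may rely on; regression_metrics is a hypothetical name and the numbers are toy values, not the project's test set.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length lists of prices."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Toy numbers, made up for this sketch (not taken from the project's test set).
y_test = [10000.0, 20000.0, 30000.0, 40000.0]
log_predictions = [math.log(p) for p in (12000.0, 19000.0, 27000.0, 42000.0)]

# The model predicts log(price), so exponentiate before scoring.
y_pred = [math.exp(v) for v in log_predictions]
metrics = regression_metrics(y_test, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Scoring after the inverse transform keeps RMSE and MAE in price units, which is what makes figures like those reported above directly interpretable.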
# curl (Windows cmd)
curl -X POST "http://127.0.0.1:8000/predict/regression" ^
  -H "Content-Type: application/json" ^
  -d "{ \"mileage\": 60000, \"age\": 5, \"engine_size\": 2.0, \"horsepower\": 150, \"doors\": 4, \"brand\": \"Toyota\", \"fuel_type\": \"petrol\" }"

# HTTPie
http POST http://127.0.0.1:8000/predict/regression \
  mileage:=60000 age:=5 engine_size:=2.0 horsepower:=150 doors:=4 \
  brand=Toyota fuel_type=petrol

Example response:

{
"predicted_price": 18523.42,
"currency": "USDT",
"etb_price": 2888727.35,
"etb_currency": "ETB",
"usdt_to_etb_rate": 155.95,
"feature_importance": {
"mileage": 0.32,
"age": 0.2,
"engine_size": 0.1,
"horsepower": 0.15,
"doors": 0.02,
"brand_toyota": 0.08,
"fuel_petrol": 0.03
},
"model_metadata": {
"metrics": {
"rmse": 3647.36,
"mae": 2854.53,
"r2": 0.8942
}
}
}
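The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as described above; build_request and predict are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict/regression"


def build_request(features: dict) -> urllib.request.Request:
    """Build a JSON POST request for the prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def predict(features: dict) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(features), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


features = {
    "mileage": 60000, "age": 5, "engine_size": 2.0, "horsepower": 150,
    "doors": 4, "brand": "Toyota", "fuel_type": "petrol",
}
request = build_request(features)
print(request.get_method(), request.full_url)

# With the API running, uncomment to fetch a prediction:
# result = predict(features)
# print(result["predicted_price"], result["currency"])
```

A validation failure (422) or server error (500) surfaces from urlopen as urllib.error.HTTPError, so a real client would wrap the call in a try/except.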
- You install dependencies with pip install -r requirements.txt.
- You train the model by running python train_model.py:
  - Synthetic car data (numeric + brand + fuel type) is generated.
  - The RandomForestRegressor learns how features map to log(price), then is saved with feature names and category lists.
  - Model + feature names + metrics are saved under models/.
- You start the FastAPI app with uvicorn app.main:app --reload.
- When the app starts, app/model.py:
  - Loads car_price_model.pkl and metadata.json.
  - Prepares a reusable RegressionModel instance in memory, including category lists and target transform info.
- A user opens the frontend at http://127.0.0.1:8000/:
  - Sees the form with all features (numeric + brand + fuel type).
  - Inputs values; frontend validates them.
- On submit, the frontend:
  - Sends a JSON POST request to /predict/regression with the form data.
- The backend:
  - Validates the request body using RegressionFeatures (Pydantic).
  - Rebuilds the numeric + one-hot feature vector in the same order used during training.
  - Runs model.predict(...) and, if necessary, exponentiates from log(price) back to price.
  - Converts the USDT price to ETB and computes feature importances.
  - Returns a PredictionResponse containing the prediction in both currencies, feature importance, and metadata (including RMSE, MAE, R²).
- The frontend:
  - Displays the predicted price clearly in USDT and ETB.
  - Renders feature importance as bars.
  - Shows model metrics and details in the metadata panel.
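The backend steps in the workflow above (rebuilding the feature vector, inverting the log transform, converting to ETB) can be sketched without any dependencies. This is not the code in app/model.py: the column order, the helper names, and the rounding to two decimals are assumptions. The brand_toyota and fuel_petrol column naming follows the example response, and the rate is the 155.95 quoted in this README.

```python
import math

# Assumed column order; the real order is whatever train_model.py saved.
NUMERIC = ["mileage", "age", "engine_size", "horsepower", "doors"]
BRANDS = ["Toyota", "Hyundai", "Suzuki", "BMW", "Mercedes", "Volkswagen"]
FUELS = ["petrol", "diesel", "hybrid", "ev"]
USDT_TO_ETB = 155.95


def feature_names() -> list:
    """Column names: numeric features, then one-hot brand and fuel columns."""
    return NUMERIC + [f"brand_{b.lower()}" for b in BRANDS] + [f"fuel_{f}" for f in FUELS]


def to_vector(features: dict) -> list:
    """Build the numeric + one-hot vector in the same order as feature_names()."""
    vector = [float(features[name]) for name in NUMERIC]
    vector += [1.0 if features["brand"] == b else 0.0 for b in BRANDS]
    vector += [1.0 if features["fuel_type"] == f else 0.0 for f in FUELS]
    return vector


def postprocess(log_price: float) -> dict:
    """Invert the log transform and add the ETB conversion."""
    price = math.exp(log_price)
    return {
        "predicted_price": round(price, 2),
        "etb_price": round(price * USDT_TO_ETB, 2),
    }


example = {
    "mileage": 60000, "age": 5, "engine_size": 2.0, "horsepower": 150,
    "doors": 4, "brand": "Toyota", "fuel_type": "petrol",
}
vector = to_vector(example)
print(dict(zip(feature_names(), vector)))
# A stand-in for model.predict(...): pretend the model returned log(10000).
print(postprocess(math.log(10000.0)))
```

Deriving both the names and the vector from the same lists is one way to keep the serving-time column order aligned with training.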

Author: yeabsra andnet