A modular machine learning package for predicting used car prices using XGBoost and Neural Network models.
Supports both training (with GPU) and deployment (CPU, with Streamlit UI and batch prediction).
Used_Car_Price/
│
├── src/
│ ├── preprocessing.py # Data cleaning and preprocessing
│ ├── feature_engineering.py # Feature engineering for both models
│ ├── xgboost_model.py # XGBoost model class
│ ├── neural_network.py # Neural Network model class
│
├── main.py # Training pipeline (run on training server)
├── streamlit_app.py # Streamlit UI for deployment (run on deploy server)
├── generate_fake_cars.py # Script to generate fake test data
├── requirements.txt # Python dependencies
├── .gitignore # Files and folders to ignore in git
├── readme.md # This file
├── models/ # Saved models (created after training)
└── test data/ # Test data CSVs for deployment and validation
- Purpose: Train models, evaluate them, and save them for deployment.
- Key Script: main.py
- Outputs:
  - models/xgboost_model.joblib
  - models/neural_network_model.h5
- Purpose: Load trained models, predict prices for new data, and provide a UI with batch prediction.
- Key Script: streamlit_app.py
- Test Data: Test data (CSV) is mandatory and should be placed in the test data/ folder for batch prediction and validation.
- Test Data Generation: Use generate_fake_cars.py if you need synthetic test data.
- Clone the repository:
    git clone <your_repo_url>
    cd Used_Car_Price
- Create and activate a virtual environment (recommended):
    python3 -m venv price_analyzer
    source price_analyzer/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Prepare your training data:
  Place your raw CSV (e.g., vehicles.csv) in the data/ directory.
- Run the training pipeline:
    python main.py
  - This will preprocess the data, engineer features, train both models, evaluate them, and save them to models/.
- Transfer the models/ directory to your deploy server.
- Ensure the following are present on your deploy server:
  - models/ directory with trained models
  - src/ directory with all modules
  - streamlit_app.py
  - requirements.txt
  - Test data CSVs in the test data/ folder (mandatory for batch prediction)
- Install dependencies:
    pip install -r requirements.txt
- Run the Streamlit app:
    streamlit run streamlit_app.py
  - Access the UI in your browser.
  - Enter car details for a single prediction, or upload a CSV from the test data/ folder for batch prediction (test data required).
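Before launching the app, it can help to confirm that everything in the checklist above is actually on the deploy server. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the paths are taken from the checklist and the training outputs listed earlier, so adjust them if your layout differs.

```python
from pathlib import Path

# Paths taken from the checklist above; adjust if your layout differs.
REQUIRED_PATHS = [
    "models/xgboost_model.joblib",
    "models/neural_network_model.h5",
    "src",
    "streamlit_app.py",
    "requirements.txt",
    "test data",
]


def missing_deploy_files(project_root="."):
    """Return the required paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


def has_test_csv(project_root="."):
    """Return True if the 'test data' folder holds at least one CSV file."""
    folder = Path(project_root) / "test data"
    return folder.is_dir() and any(folder.glob("*.csv"))


if __name__ == "__main__":
    missing = missing_deploy_files()
    if missing:
        print("Missing:", ", ".join(missing))
    elif not has_test_csv():
        print("No CSV files found in 'test data/'")
    else:
        print("Deploy folder looks complete")
```

Run it from the project root on the deploy server; it only checks that files exist and does not load or validate the models.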
To create synthetic data for testing or demo:
    python generate_fake_cars.py
- This will generate:
  - fake_cars_for_prediction.csv (for batch prediction, no price column)
  - fake_cars_for_validation.csv (with price column for validation)
Move these files into the test data/ folder for use in the deployment UI.
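One way to use the validation file is to compare model predictions against its price column. The sketch below is an illustration only: it assumes the column is literally named price, uses mean absolute error as one common metric (the project may report something else), and the function name is hypothetical.

```python
import csv
import io


def mean_absolute_error(validation_csv_text, predicted_prices, price_column="price"):
    """Compare predicted prices with the price column of a validation CSV.

    `price_column` defaults to "price", which is an assumption about the file.
    """
    reader = csv.DictReader(io.StringIO(validation_csv_text))
    actual = [float(row[price_column]) for row in reader]
    if not actual:
        raise ValueError("Validation file has no rows")
    if len(actual) != len(predicted_prices):
        raise ValueError("Number of predictions does not match number of rows")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted_prices))
    return total_error / len(actual)
```

The function takes the file contents as text, so a file on disk can be passed with `open(path, newline="").read()`; the predictions must be in the same row order as the CSV.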
- Modular codebase: Clean separation of preprocessing, feature engineering, and modeling.
- Supports both XGBoost and Neural Network models.
- Streamlit UI: For easy local deployment and user-friendly predictions.
- Batch prediction: Upload a CSV from test data/ and get predictions for all entries (test data required).
- Fake data generation: For testing and demonstration.
- Do not retrain models on the deploy server; only use for inference.
- Make sure the input data columns match those expected by the models.
- For best results, use the same preprocessing and feature engineering pipeline for both training and inference.
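The column-matching note above can be checked before a CSV is sent for batch prediction. This is a minimal sketch under stated assumptions: EXPECTED_COLUMNS is a hypothetical placeholder list (the real column names depend on your training data), and check_columns is not a function from this repository.

```python
import csv
import io

# Hypothetical column list; replace with the columns your trained models expect.
EXPECTED_COLUMNS = ["year", "manufacturer", "odometer"]


def check_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names based on the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in expected if c not in header]
    unexpected = [c for c in header if c not in expected]
    return missing, unexpected
```

An empty missing list means every expected column is present; unexpected columns are reported separately so you can decide whether to drop them before prediction.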
For questions or contributions, please open an issue or pull request on the repository.