This repository contains an applied machine learning project for predicting crop yield using agricultural and environmental data. The project includes a training script, dataset, saved machine learning model, and a simple application interface for making crop yield predictions.
This project was developed as an applied machine learning and agritech project and is maintained as part of my machine learning project portfolio.
Crop yield prediction is an important task in agricultural planning, food security analysis, and farm decision-making. Yield can be influenced by multiple factors, including crop type, rainfall, temperature, pesticide usage, soil conditions, and other environmental or management-related variables.
Machine learning can be used to identify patterns in agricultural datasets and build predictive models that estimate expected crop yield from available input features.
In this project, a machine learning model is trained on crop yield data and saved for reuse in a simple prediction application.
The main objectives of this project are to:
- preprocess crop yield data
- train a machine learning model for yield prediction
- save the trained model for reuse
- build a simple application for prediction
- demonstrate an end-to-end agricultural machine learning workflow
Crop_Yield_Prediction/
├── app.py
├── crop_yield_data.csv
├── crop_yield_model.pkl
├── train_model.py
├── requirements.txt
├── .gitattributes
├── .gitignore
└── README.md
This file contains the application code for making crop yield predictions using the saved model. It loads the trained model and provides an interface where users can input crop or environmental variables and obtain a predicted yield value.
This is the dataset used for training and evaluating the crop yield prediction model. It contains crop-related and/or environmental variables used as input features for model development.
This is the saved trained machine learning model. It allows the application to make predictions without retraining the model every time.
This file contains the model training workflow. It loads the dataset, preprocesses the data, trains the model, evaluates performance, and saves the trained model as a .pkl file.
This file lists the Python packages required to run the project.
This file may be used for repository configuration, including file tracking settings such as Git LFS handling for larger files.
The project follows a standard machine learning workflow.
The crop yield dataset is loaded and inspected for:
- feature columns
- target variable
- missing values
- data types
- basic descriptive statistics
The dataset is prepared for model development through preprocessing steps such as:
- separating input features and target variable
- handling missing values where applicable
- encoding categorical variables where applicable
- scaling or transforming numerical variables where required
- splitting the dataset into training and testing sets
A machine learning regression model is trained to predict crop yield based on the available input features.
The model learns relationships between agricultural/environmental variables and crop yield outcomes.
After training, the model is saved as:
crop_yield_model.pkl
This allows the model to be reused by the prediction application.
The application loads the saved model and allows users to provide input values. The model then returns a predicted crop yield based on the input features.
Clone the repository:
git clone https://github.com/CodeeSam/Crop_Yield_Prediction.git
cd Crop_Yield_PredictionInstall the required dependencies:
pip install -r requirements.txtTo train or retrain the model, run:
python train_model.pyAfter training, the model will be saved as:
crop_yield_model.pkl
To run the application, use:
python app.pyIf the application was built with Streamlit, use:
streamlit run app.pyThe main Python packages used in this project may include:
pandas
numpy
scikit-learn
streamlit
pickle
Depending on the implementation, additional packages may also be required. See requirements.txt for the complete list.
Crop Yield Dataset → Data Preprocessing → Model Training → Saved Model → Prediction App
This repository represents an applied machine learning project in agricultural yield prediction. It is intended to demonstrate how machine learning can be used to build a simple predictive workflow for crop yield estimation.
The project should be interpreted as a demonstration and learning-based application, not as a production-level agricultural forecasting system.
Some limitations of this project include:
- The model performance depends on the size and quality of the dataset.
- The dataset may not represent all crop types, countries, climates, or farming systems.
- The model may not generalize well to regions or conditions not represented in the training data.
- The project does not include large-scale external validation.
- The application is intended for demonstration and educational purposes.
- Real-world crop yield forecasting would require more detailed agronomic, climatic, soil, and management data.
Possible future improvements include:
- adding stronger exploratory data analysis
- evaluating multiple regression models
- adding cross-validation
- reporting metrics such as MAE, RMSE, and R²
- adding feature importance analysis
- improving the user interface
- deploying the app online
- adding support for more crop types and regions
- incorporating weather, soil, fertilizer, and satellite-derived features
- organizing files into
data/,models/, andsrc/folders
This type of project can be useful as a starting point for:
- agricultural data science
- crop yield forecasting
- agritech machine learning applications
- regression modeling practice
- predictive analytics projects
- simple ML application deployment
Samson Ayorinde Oni
Machine Learning | Data Science | Computational Research
This repository is released under the MIT License.