StefanoDamato/TweedieGP


TweedieGP

Forecasting Intermittent Time Series with Gaussian Processes and Tweedie Likelihood 🚀

TweedieGP is a probabilistic forecasting model for intermittent time series. The model is a Gaussian Process (GP) model with a Tweedie likelihood. NegBinGP is a GP model with a negative binomial likelihood.
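As a rough intuition for why a Tweedie likelihood suits intermittent series, the sketch below evaluates the probability of an exact zero under the textbook compound Poisson-gamma form of the Tweedie distribution (power parameter 1 < p < 2). The parameterization by mean `mu`, dispersion `phi`, and power `p`, and both function names, are assumptions made for illustration; the paper and the code in this repository may parameterize the likelihood differently.

```python
import math


def tweedie_zero_probability(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a Tweedie variable with mean mu, dispersion phi, power p.

    Assumes the compound Poisson-gamma case 1 < p < 2, where the number of
    gamma-distributed jumps is Poisson with rate mu**(2 - p) / (phi * (2 - p)).
    """
    if not 1.0 < p < 2.0:
        raise ValueError("power p must lie strictly between 1 and 2")
    if mu <= 0.0 or phi <= 0.0:
        raise ValueError("mu and phi must be positive")
    poisson_rate = mu ** (2.0 - p) / (phi * (2.0 - p))
    # Y is exactly zero when the Poisson count of jumps is zero.
    return math.exp(-poisson_rate)


def tweedie_variance(mu: float, phi: float, p: float) -> float:
    """Variance of a Tweedie variable: Var(Y) = phi * mu**p."""
    return phi * mu ** p
```

For a fixed mean, a larger dispersion `phi` puts more mass on zero, which is the behaviour one expects from a series with many periods of no demand.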

This repository accompanies the paper "Forecasting Intermittent Time Series with Gaussian Processes and Tweedie Likelihood", submitted for review at the International Journal of Forecasting (IJF).

This repository contains:

  1. files to recreate the conda (for Python) and R environments with all external packages needed to reproduce our experiments and run the tutorial;
  2. the code to download and pre-process the data in the same way as in our paper;
  3. a tutorial that shows how to use TweedieGP;
  4. the code to reproduce the experiments in the paper.

Example of TweedieGP on 1000th time series of Auto dataset

Citation

If you use this code or our paper in your research, please cite it as follows:

Damato, S., Azzimonti, D., & Corani, G. (2025). Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood. International Journal of Forecasting, in press.

BibTeX:
@article{DAMATO2025,
title = {Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood},
journal = {International Journal of Forecasting},
year = {2025},
issn = {0169-2070},
doi = {https://doi.org/10.1016/j.ijforecast.2025.10.001},
author = {Stefano Damato and Dario Azzimonti and Giorgio Corani},
keywords = {Intermittent time series, Gaussian Processes, Tweedie distribution, Probabilistic forecasting, Quantile loss, Machine learning, Bayesian methods, Supply chain}
}

Usage

1. Install the environments

All the required Python packages are installed by creating the conda environment TweedieGP. With conda installed, run the following command in a terminal inside this folder:

conda env create -f environment.yml

The R environment is based on the renv package, which can be installed by running install.packages("renv") in R. To restore the packages, run

renv::restore()

2. Data

The data used in the experiments is publicly available: some datasets are provided through Python APIs and others are shipped inside R packages. The following instructions require both a working R installation and the conda environment TweedieGP installed in step 1.

The folder data contains two files: datasets.R and datasets.ipynb. In order to download and pre-process the datasets used in the paper:

  1. Open an R session with working directory in data and run the R file datasets.R. Note that you might be required to install some R packages.
  2. Run the Jupyter notebook datasets.ipynb within the environment TweedieGP.

This should populate the folder data with one folder named after each dataset. Each folder will contain the following files: data.csv, test.json, train.json. The experiments will use the .csv version, in which time series are stored row-wise.
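Once the folders are populated, a row-wise file can be read with the standard library alone. The sketch below is only an illustration: it assumes that each row of data.csv holds the values of one series as comma-separated numbers, with no header row and no index column, which is not stated above. The function name `read_rowwise_series` is hypothetical; adjust it if the files are laid out differently.

```python
import csv
from pathlib import Path


def read_rowwise_series(csv_path):
    """Read a CSV in which every row is one time series.

    Assumption: no header row and no index column. Empty cells are skipped,
    so series of different lengths are returned without padding.
    """
    series = []
    with Path(csv_path).open(newline="") as handle:
        for row in csv.reader(handle):
            values = [float(cell) for cell in row if cell.strip() != ""]
            if values:  # ignore completely empty rows
                series.append(values)
    return series
```

For example, `read_rowwise_series("data/DATASET/data.csv")` would return one list of floats per time series, with DATASET replaced by the dataset folder name.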

3. Tutorial on TweedieGP

In the folder tutorial you will find the Jupyter notebook tutorial.ipynb which runs TweedieGP and reproduces Figure 1 in the main paper.

If the conda environment TweedieGP is active, the Jupyter notebook can be run directly from start to finish. The notebook also contains basic explanations on how to

  • instantiate a TweedieGP/NegBinGP GPyTorch model;
  • train the model;
  • forecast on test data.

4. Reproducibility of the experiments

The folder src contains the code required to reproduce the experiments. Set the working directory to the project folder; all the results will be saved in a folder named trained_models. Let DATASET be the dataset name.

To run our model TweedieGP, based on GPyTorch, run the following command from terminal (using the conda environment introduced above):

python3 src/tweediegp/main.py -log --dataset DATASET --likelihood tweedie --scaling median-demand 

Similarly, to run NegBinGP use:

python3 src/tweediegp/main.py -log --dataset DATASET --likelihood negbin

Additional parameters can be specified, such as the number of inducing points, the kernel choice, or the minimum number of training iterations. To see them, check either the arguments of the parser in src/tweediegp/main.py or the documentation of the intermittentGP class in src/tweediegp/intermittent_gp.py. When the negative binomial likelihood is used (--likelihood negbin), the scaling parameter must not be included.
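When running several datasets in a row, it can help to assemble the commands programmatically. The sketch below only mirrors the two commands shown above, adding --scaling median-demand for the Tweedie likelihood and omitting it for the negative binomial one. The helper name `build_tweediegp_command` and the dataset names in the loop are hypothetical placeholders, not part of the repository.

```python
import shlex


def build_tweediegp_command(dataset: str, likelihood: str = "tweedie") -> list[str]:
    """Return the argument list for src/tweediegp/main.py as shown in this README."""
    if likelihood not in {"tweedie", "negbin"}:
        raise ValueError(f"unsupported likelihood: {likelihood!r}")
    command = [
        "python3", "src/tweediegp/main.py", "-log",
        "--dataset", dataset,
        "--likelihood", likelihood,
    ]
    # The scaling parameter must not be passed with the negative binomial likelihood.
    if likelihood == "tweedie":
        command += ["--scaling", "median-demand"]
    return command


# Hypothetical dataset names, used only to show the generated commands.
for dataset_name in ["DATASET_A", "DATASET_B"]:
    for likelihood_name in ["tweedie", "negbin"]:
        print(shlex.join(build_tweediegp_command(dataset_name, likelihood_name)))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the project folder, with the conda environment TweedieGP active.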

The baselines in our experiments are also reproducible:

  • to compute the empirical quantiles and the zero forecast, run the Jupyter notebook src/baselines/benchmarks.ipynb, changing the dataset name in the second cell;
  • to use iETS, run from a terminal Rscript src/baselines/iETS.R --dataset_name DATASET;
  • to use the WSS model, run from a terminal Rscript src/baselines/counter_models.R --dataset_name DATASET --model_name wss;
  • to use ADIDA, run from a terminal python3 src/baselines/statsforecast_models.py --dataset_name DATASET --model adida.

The results of each model run on each dataset will be saved in a unique folder inside the trained_models folder. The result folder uses the following naming convention: dataset, model, additional information, and datetime, for example DATASET__TweedieGP__additionalInfo__datetime. This folder will contain the JSON file experiment.json, with some information about the experiment, and the forecasts folder, with the predictions saved as NumPy arrays. In the experiments involving Gaussian Processes, the state dictionaries (from Torch) containing the parameters of the GP and of the likelihood of the trained models for each time series will be stored as .pth files in the state_dicts folder.
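The naming convention makes it easy to index the result folders from Python. The sketch below is a minimal illustration that assumes the four fields are separated by a double underscore, as in the example above, and that no field contains a double underscore itself. The names `RunFolder`, `parse_run_folder`, and `load_experiment_info` are hypothetical, and the datetime field is kept as a plain string because its format is not specified here.

```python
import json
from pathlib import Path
from typing import NamedTuple


class RunFolder(NamedTuple):
    dataset: str
    model: str
    additional_info: str
    timestamp: str  # kept as text; the datetime format is not assumed


def parse_run_folder(folder_name: str) -> RunFolder:
    """Split a result folder name of the form dataset__model__info__datetime."""
    parts = folder_name.split("__")
    if len(parts) != 4:
        raise ValueError(f"unexpected folder name: {folder_name!r}")
    return RunFolder(*parts)


def load_experiment_info(run_dir):
    """Return the parsed content of experiment.json inside a result folder."""
    with (Path(run_dir) / "experiment.json").open() as handle:
        return json.load(handle)
```

For instance, iterating over `Path("trained_models").iterdir()` and calling `parse_run_folder(path.name)` on each entry would group the runs by dataset and model.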

Replicating these experiments, particularly those involving Gaussian Processes, takes a long time. Running them locally is not recommended, especially for larger datasets with longer time series. It is possible, however: all the experiments reported in the paper were run on the Apple M3 CPU of a MacBook Pro laptop, taking approximately one week.

Plots and tables reproducibility

We also include a Jupyter notebook, src/plots_and_tables.ipynb, which can be used to generate the plots and tables included in the paper. Running the first cell is required to import all the packages.

Then, for each plot or table, the dedicated cell can be run (independently from the others) to reproduce the result. Generated figures will be saved in the figures folder.
