MWCD: Multi-Window Causal Discovery Framework for Rock Slope Stability Analysis.
To run the example, first install the required packages, then run the main script on the provided example dataset.
To customise the MWCD, use any valid combination of the options described under Options.
If you use your own dataset, the following requirements have to be met:
- data has to be provided in a CSV file with the first row containing the variable names. See the example dataset for reference.
- (in this version) augmentation has to be done before running the MWCD algorithm, i.e. automated computation of freeze-thaw cycles or negative-degree days is not currently implemented. All variables must be supplied as columns in the CSV file.
- no pre-scaling required, the MWCD normalises all inputs
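Since augmentation is left to the user, derived variables have to be computed before the CSV file is passed to the MWCD. The following is a minimal sketch of how two such columns could be derived from a temperature series. The function name, the column names ftc and ndd, and both definitions (a freeze-thaw cycle counted on each crossing from below 0 to at or above 0, negative-degree days as the running sum of sub-zero temperature magnitudes) are assumptions for illustration and are not part of the MWCD.

```python
def add_frost_columns(rows, temp_key="temperature"):
    """Return copies of the rows with two extra, hypothetical columns.

    rows: list of dicts as produced by csv.DictReader.
    ftc:  cumulative count of freeze-thaw cycles (assumed definition:
          the temperature crosses from below 0 to at or above 0).
    ndd:  cumulative negative-degree days (assumed definition:
          running sum of the magnitude of sub-zero temperatures).
    """
    augmented = []
    previous = None
    ftc = 0
    ndd = 0.0
    for row in rows:
        temp = float(row[temp_key])
        if previous is not None and previous < 0 <= temp:
            ftc += 1
        if temp < 0:
            ndd += -temp
        augmented.append({**row, "ftc": ftc, "ndd": ndd})
        previous = temp
    return augmented


rows = [{"temperature": t} for t in ("-2", "1", "-3", "0.5")]
augmented = add_frost_columns(rows)
```

The augmented rows can then be written back with csv.DictWriter so that every variable is present as a column.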
To install the required Python packages, run pip install -r requirements.txt. Furthermore, the gcastle package is set to use PyTorch as its backend; install PyTorch by following the PyTorch Get Started instructions.
To run the example, execute python main.py. The main script will load the example dataset from /data and use DAG-GNN for Causal Discovery. If you would like to tweak the algorithm or change the Causal Discovery algorithm, use any of the options specified under Options.
The MWCD offers a variety of options to adapt it to your dataset. Additionally, it is possible to pass a list of values for an option, in which case the MWCD is run with all possible combinations. The options are read from the config.toml file, which describes the configuration used to run the MWCD in TOML format.
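To illustrate what running all possible combinations of list-valued options means, here is a minimal sketch that expands such lists into individual run configurations. The function name expand_grid is hypothetical, and how the MWCD enumerates combinations internally is not specified here; the sketch only shows the Cartesian product such a configuration implies.

```python
from itertools import product


def expand_grid(options):
    """Expand a dict of list-valued options into one dict per combination."""
    keys = list(options)
    return [
        dict(zip(keys, combo))
        for combo in product(*(options[key] for key in keys))
    ]


grid = expand_grid({
    "algorithms": ["DAG-GNN", "NoTearsNonLinear"],
    "lags": [7, 14],
})
```

With two algorithms and two lags, this yields four run configurations, so the number of runs grows multiplicatively with every list that holds more than one value.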
All available options are listed below.
Options to configure the causal discovery are listed under [causal-discovery].
There are five different Causal Discovery algorithms available to run MWCD. They can be selected by adding them as strings to the algorithms list. Available algorithms:
- DAG-GNN
- NoTearsNonLinear
- DirectLiNGAM
- PCMCI
- LPCMCI
Example to run the MWCD with DAG-GNN and again with NoTearsNonLinear:
[causal-discovery]
algorithms = ["DAG-GNN", "NoTearsNonLinear"]
The number of lags for the shifting before temporal causal discovery can be set using the option for lags.
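One common way to picture such a shift is to pair every variable with a copy of itself delayed by the given number of samples. The sketch below does this for plain Python lists. The function name, the _lag suffix and the choice to drop the first rows are assumptions for illustration; the shifting performed by the MWCD may differ.

```python
def add_lagged_columns(columns, lag):
    """Pair every column with a copy shifted by `lag` samples.

    columns: dict mapping a column name to a list of equally long values.
    The first `lag` samples are dropped because they have no lagged partner.
    """
    n = len(next(iter(columns.values())))
    if not 0 < lag < n:
        raise ValueError("lag must be positive and shorter than the series")
    shifted = {}
    for name, values in columns.items():
        shifted[name] = values[lag:]                    # value at time t
        shifted[f"{name}_lag{lag}"] = values[:n - lag]  # value at time t - lag
    return shifted


shifted = add_lagged_columns({"x": [1, 2, 3, 4]}, lag=1)
```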
Example to run the MWCD with 7 sample shifts and 14 sample shifts:
[causal-discovery]
lags = [7, 14]
The MWCD uses linearly penalized segmentation (PELT) for change point detection; see the Ruptures documentation. Options for the change point detection are listed under [change-point-detection].
A minimum window size can be specified for the change point detection. The default value is 30 points. This should be adapted to the resolution of the dataset, as it only specifies the minimum distance between change points in number of samples. Multiple window sizes can be added to the window-sizes option to run them in one execution.
Example to run the MWCD with two different window sizes, one of 14 and one of 30 days:
[change-point-detection]
window-sizes = [14, 30]
To return the optimal breakpoints, the algorithm is fit to the signal, and the fitting requires a penalty. The default penalty used in the MWCD is 2. To specify a different penalty, use the option penalty.
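To make the roles of the minimum window size and the penalty concrete, the sketch below solves the same penalized objective with plain dynamic programming and an l2 cost. It is not the Ruptures implementation and not PELT itself: it omits PELT's pruning step, so it is slower but returns the same kind of result. The function name and the test signal are assumptions for illustration. As in Ruptures, the last returned index is the number of samples.

```python
def penalized_breakpoints(signal, penalty=2.0, min_size=30):
    """Exact penalized segmentation with an l2 cost (no pruning)."""
    n = len(signal)
    # Prefix sums allow the l2 cost of any segment to be computed in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Sum of squared deviations from the mean of signal[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    best = [inf] * (n + 1)
    best[0] = -penalty  # so that only actual change points are penalized
    last = [0] * (n + 1)
    for end in range(min_size, n + 1):
        for start in range(0, end - min_size + 1):
            if best[start] == inf:
                continue  # no valid segmentation ends at `start`
            candidate = best[start] + cost(start, end) + penalty
            if candidate < best[end]:
                best[end] = candidate
                last[end] = start

    # Walk back through the stored segment starts.
    breaks = []
    end = n
    while end > 0:
        breaks.append(end)
        end = last[end]
    return breaks[::-1]


step = [0.0] * 10 + [10.0] * 10
breaks = penalized_breakpoints(step, penalty=2.0, min_size=3)
```

A larger penalty makes additional change points more expensive and therefore yields fewer, longer windows; a larger minimum size forbids change points that lie closer together than that number of samples.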
Example to run the MWCD Change Point Detection with a penalty of 5:
[change-point-detection]
penalty = 5
PELT can be used with different segment models. Available models are "l1", "l2" and "rbf". The default model is "rbf". To use a different model, use the option model with the model name.
Example to run the MWCD Change Point Detection with the l2 model:
[change-point-detection]
model = "l2"
Options related to the dataset are listed under [data].
To use a different dataset, provide the path under the option path. The path should be relative to the main script, e.g. if the file is placed in the data folder: ../data/my_data.csv.
Example to run the MWCD with data from the file my_data.csv in the data folder:
[data]
path = "../data/my_data.csv"
To specify a target variable for the change point detection, use the option target. The target name must correspond to one of the column names in the dataset file.
For example, to use block4a with the example dataset:
[data]
path = "../data/stampa.csv"
target = "block4a"
To specify how to scale the data before passing it to the causal discovery algorithm, you may modify the option scaling. Available scalings are:
- "perc_range": scales all variables to a range of [0,1]
- "ngperc_range": scales all variables to a range of [-1,1]
- "full_range": scales all variables to a range of [0,100]
- "ngfull_range": scales all variables to a range of [-100,100]
We recommend the usage of the standard "perc_range" scaling, which is the default scaling.
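The four scalings listed above differ only in their target range. A minimal sketch of such a min-max scaling for a single column is shown below. The function name, the per-column treatment and the handling of constant columns are assumptions for illustration; the MWCD's own implementation may differ.

```python
RANGES = {
    "perc_range": (0.0, 1.0),
    "ngperc_range": (-1.0, 1.0),
    "full_range": (0.0, 100.0),
    "ngfull_range": (-100.0, 100.0),
}


def scale(values, scaling="perc_range"):
    """Min-max scale one column to the target range of the given scaling."""
    low, high = RANGES[scaling]
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Assumption: a constant column is mapped to the lower bound.
        return [low for _ in values]
    return [low + (v - vmin) / (vmax - vmin) * (high - low) for v in values]


scaled = scale([2, 4, 6], "ngperc_range")
```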
For example, to use "full_range" scaling with the example dataset:
[data]
scaling = "full_range"
We recommend using displacement, soil moisture and infiltration differences instead of the original time series. If included in the dataset, they should be labeled accordingly. If use-differences is set to true, the MWCD will automatically look for columns that contain the string "diff" and perform the causal discovery on these columns. To add columns to the causal discovery regardless of their naming, they have to be specified under always-include.
If use-differences is set to false, all columns that do not contain the keyword "diff" will be used for the causal discovery.
For example, to set use-differences to false:
[data]
use-differences = [false]
Since this option is also provided as a list, it is possible to add both options, true and false, to the list and run the MWCD once with differences and once without.
The list specified under always-include must contain the column names from the dataset that should be considered in the causal discovery regardless of the value in use-differences.
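The interplay of use-differences and always-include can be summarised as a simple column filter. The sketch below follows the rules as described above; the function name and the column names in the usage line are hypothetical, and the actual selection code of the MWCD may handle further cases.

```python
def select_columns(columns, use_differences, always_include=()):
    """Pick the columns that take part in the causal discovery.

    A column is kept if it is listed in always_include, or if the presence
    of the keyword "diff" in its name matches the use_differences setting.
    """
    selected = []
    for name in columns:
        is_diff = "diff" in name
        if name in always_include or is_diff == use_differences:
            selected.append(name)
    return selected


columns = ["displacement", "displacement_diff", "precipitation", "moisture_diff"]
with_diffs = select_columns(columns, True, ["precipitation"])
```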
For example, to always include the columns named precipitation, temperature and frost:
[data]
always-include = [
"precipitation",
"temperature",
"frost"
]
Under [misc], a collection of useful options is provided.
The option verbose can be set to true to provide output on the command line. Alternatively, the output can be suppressed by setting it to false.
For example, to set verbose to false:
[misc]
verbose = false
The option label enables you to change the naming scheme for the result files. These files are stored as .npy files with the following naming scheme:
<label>-cm-<MWCD part>-<causal discovery algorithm>.npy
where cm stands for Causal Matrix.
For example, to set the label to my_data_test:
[misc]
label = "my_data_test"
This example would generate 4 files in the results folder when used with only DAG-GNN:
my_data_test-cm-all-DAG-GNN.npy
my_data_test-cm-sections-DAG-GNN.npy
my_data_test-cm-temporal-all-DAG-GNN.npy
my_data_test-cm-temporal-sections-DAG-GNN.npy
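The naming scheme above can be reproduced with a short helper, which is handy when collecting result files from several runs. The function name is hypothetical, and the four part names are taken from the example file names listed above.

```python
# The four MWCD parts as they appear in the example file names.
PARTS = ("all", "sections", "temporal-all", "temporal-sections")


def result_filenames(label, algorithm):
    """Build the expected result file names for one label and algorithm."""
    return [f"{label}-cm-{part}-{algorithm}.npy" for part in PARTS]


names = result_filenames("my_data_test", "DAG-GNN")
```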
By default, the output of the MWCD is a collection of adjacency matrices. These matrices are stored in a folder named results, which is created automatically if it does not already exist. We provide utility functions to plot causal graphs from the Causal Discovery matrix output in the file src/utils.py.
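Independently of the plotting utilities, an adjacency matrix can be turned into a readable edge list. The sketch below works on a nested list; a stored matrix could be passed in after loading it with numpy.load and converting it with tolist(). The function name, the variable names and the reading direction (row variable causes column variable) are assumptions for illustration and should be checked against the convention of the algorithm that produced the matrix.

```python
def edges_from_matrix(matrix, names, threshold=0.0):
    """List directed edges (cause, effect) of a square adjacency matrix.

    Assumption: a non-zero entry in row i and column j means that
    variable i causes variable j. Self-loops are ignored.
    """
    edges = []
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if i != j and abs(weight) > threshold:
                edges.append((names[i], names[j]))
    return edges


matrix = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
edges = edges_from_matrix(matrix, ["precipitation", "moisture", "displacement"])
```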
