AUTOLOGY - Building IoT Timeseries Classification

Predict the class of unlabeled time-series data from IoT sources in different buildings, using minimal context.

Usage Sequence

Note: Most of the project was developed and run with the VS Code ipykernel (VS Code Interactive Mode).

Labelling Tool:

The labelling tool was created for consistent, reproducible and fast labelling of time-series columns in datasets. Explanation and examples are in the notebook labelling_usage.ipynb.

Plotting Class Distribution

A brief code example showing how to load data into the DataManager class and how to visualise the class/unit distributions of all time series across the loaded datasets.
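As a rough illustration of what such a class distribution looks like, the sketch below counts labels using only the standard library. It does not use the project's DataManager; the label list and the helpers `class_distribution` and `drop_rare_classes` are hypothetical names made up for this example.

```python
from collections import Counter


def class_distribution(labels):
    """Count how many time series carry each class label."""
    return Counter(labels)


def drop_rare_classes(labels, min_count):
    """Return the indices of series whose class occurs at least min_count times."""
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= min_count]


# Hypothetical labels, one per time-series column.
labels = ["temperature"] * 5 + ["power"] * 3 + ["co2"]

distribution = class_distribution(labels)
for name, count in distribution.most_common():
    # Simple text bar chart instead of a plot.
    print(f"{name:<12} {'#' * count} ({count})")

kept = drop_rare_classes(labels, min_count=2)
```

The same counts could drive the optional dropping of underrepresented classes described under Model Training & Evaluation.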

Preprocessing

The preprocessing step is built around an sklearn.pipeline.Pipeline, which keeps data transformation, feature extraction and debugging flexible. The DataManager class can be controlled with load_all_data to either load all datasets from a specified data folder or load only a selected subset of datasets or the test set. Each dataset is mapped to its timezone, ensuring correct alignment of time-based data. Once the data is loaded, a series of processing steps is applied and the data is split into weekly or daily segments. Features are then extracted, with numerous options available.
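The splitting into daily segments can be sketched as follows. This is a minimal standard-library illustration, not the project's implementation: the function `split_into_daily_segments`, the hourly sampling and the rule of discarding incomplete days are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_daily_segments(samples, samples_per_day=24):
    """Group (timestamp, value) pairs by calendar day.

    Only complete days (exactly samples_per_day samples) are kept,
    which is an assumption of this sketch.
    """
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append((timestamp, value))

    segments = {}
    for day, rows in by_day.items():
        if len(rows) == samples_per_day:
            rows.sort(key=lambda row: row[0])  # order by timestamp
            segments[day] = [value for _, value in rows]
    return segments


# 60 hourly samples: two complete days plus half a day.
start = datetime(2024, 1, 1, 0, 0)
samples = [(start + timedelta(hours=h), float(h)) for h in range(60)]

segments = split_into_daily_segments(samples)
print(sorted(segments))  # the two complete days
```

Weekly segments would follow the same pattern with a week key (for example from `timestamp.isocalendar()`) and 168 hourly samples per segment.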
The Debugger objects can be used to store and monitor variables at different stages of preprocessing, giving easy access to intermediate states of the data for further analysis, visualisation or understanding.
A PCA transformer can be uncommented to plot the most significant features.
For more information, see the source code and docstrings in utils/data/preprocessing.py.
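The idea of a debugging step inside a pipeline can be sketched with a pass-through transformer. The class `SnapshotDebugger` below is a hypothetical stand-in and is not the Debugger class from utils/data/preprocessing.py; the example also assumes numpy and scikit-learn are installed and uses StandardScaler only as a placeholder processing step.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class SnapshotDebugger(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that stores a copy of the data it sees."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Keep a copy so later steps cannot alter the snapshot.
        self.snapshot_ = np.array(X, copy=True)
        return X


pipe = Pipeline([
    ("debug_raw", SnapshotDebugger()),
    ("scale", StandardScaler()),
    ("debug_scaled", SnapshotDebugger()),
])

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_out = pipe.fit_transform(X)

# Intermediate states are available by step name after fitting.
raw = pipe.named_steps["debug_raw"].snapshot_
scaled = pipe.named_steps["debug_scaled"].snapshot_
```

Because the debugging steps return their input unchanged, they can be added or removed without affecting the pipeline's result.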

Model Training & Evaluation

(Optional) Dropping of underrepresented classes.
An sklearn.pipeline.Pipeline with the following steps:
- (optional) dropping of features
- TraintestSplit, which avoids data leakage per time-series column (not per whole dataset)
- (optional and unfinished) over-/undersampling
- model training
This is followed by plotting functions and a function that logs all training data to MLflow (remember to set run_name!).
For more information, see the source code and docstrings in utils/training/models_and_training.py.
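One reading of the leakage-avoiding split is that all segments cut from the same time-series column must land on the same side of the split. The sketch below shows that idea with the standard library only; `split_by_series` and its parameters are hypothetical and do not reproduce the project's TraintestSplit step.

```python
import random


def split_by_series(series_ids, test_fraction=0.25, seed=0):
    """Split segment indices so that no series appears in both train and test.

    series_ids[i] names the time-series column that segment i was cut from.
    """
    unique_ids = sorted(set(series_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)

    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])

    train_idx = [i for i, s in enumerate(series_ids) if s not in test_ids]
    test_idx = [i for i, s in enumerate(series_ids) if s in test_ids]
    return train_idx, test_idx


# Two daily segments per (hypothetical) time-series column.
series_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
train_idx, test_idx = split_by_series(series_ids, test_fraction=0.25, seed=0)

train_series = {series_ids[i] for i in train_idx}
test_series = {series_ids[i] for i in test_idx}
print(train_series, test_series)
```

scikit-learn's GroupShuffleSplit offers the same kind of group-aware splitting if a library implementation is preferred.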

ToDos

General

Restructure code to avoid copies/deepcopies (switch from running the code in ipykernel to terminal-based execution)

Labelling

Preprocessing

Load data
- Load classes and units from separate classification support file
Split data into chunks
- Daily 24-hour cycles
- Weekly 7-day cycles
Normalise to a common sampling rate (1 hour)
Extract Features
- Time periodicity values
- Add holiday detection for weekly data features
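A common way to express time periodicity as features is a sine/cosine encoding of the hour of day and the day of week, so that 23:00 and 00:00 end up close together. The sketch below illustrates that technique under the assumption that this is what "time periodicity values" refers to; `cyclic_time_features` is a hypothetical helper, and holiday detection is not covered.

```python
import math
from datetime import datetime


def cyclic_time_features(timestamp):
    """Encode hour of day and day of week as points on the unit circle."""
    hour_angle = 2 * math.pi * timestamp.hour / 24
    weekday_angle = 2 * math.pi * timestamp.weekday() / 7  # Monday = 0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "weekday_sin": math.sin(weekday_angle),
        "weekday_cos": math.cos(weekday_angle),
    }


# Monday, 1 January 2024, 06:00.
features = cyclic_time_features(datetime(2024, 1, 1, 6, 0))
print(features)
```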

Model Training

Implement basic models as a benchmark, using rule-of-thumb values
Add advanced models (ANN, RNN, LSTM, etc.)
- Research further possible model approaches

Model Evaluation

Test under-/oversampling against class imbalance.
Model optimisation?
Visualise (Plots/Metrics):
- Classification Report
- Confusion Matrix
- Feature Importance (XGBoost and RandomForest support it intrinsically)
- Mismatches per Dataset (Done but not very pretty)
Create various plots to visualise performance
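The numbers behind a confusion matrix and a classification report can be sketched in a few lines. The example below uses only the standard library and made-up labels; `confusion_counts` and `per_class_report` are hypothetical helpers, not functions from this repository.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_report(y_true, y_pred):
    """Compute precision, recall and support for every label."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        true_positive = counts[(label, label)]
        predicted = sum(n for (_, p), n in counts.items() if p == label)
        actual = sum(n for (t, _), n in counts.items() if t == label)
        report[label] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
            "support": actual,
        }
    return report


# Made-up labels for five time-series segments.
y_true = ["temperature", "temperature", "power", "power", "humidity"]
y_pred = ["temperature", "power", "power", "power", "temperature"]

report = per_class_report(y_true, y_pred)
for label, metrics in report.items():
    print(label, metrics)
```

Grouping the mismatched pairs by dataset name instead of by label would give the "mismatches per dataset" view in the same way.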
