
tug-cps/AUTOLOGY-TimeSeriesClassification


AUTOLOGY - Building IoT Timeseries Classification

Predict the class of unlabeled time-series data from various IoT sources in different buildings, with minimal context.

Usage Sequence

Note: We used the VS Code ipykernel (VS Code Interactive Mode) for most of the project.

  • The labelling tool enables consistent, reproducible and fast labelling of time-series columns in datasets. Explanations and examples are in the Notebook.
  • A brief code example showing how to load data into the DataManager class and how to visualise the class/unit distributions of all time-series in all loaded datasets.
  • The preprocessing step is built around an sklearn.Pipeline, which makes data transformation and feature extraction easy to alter and to debug. The DataManager class is controlled via load_all_data, either to load all datasets from a specified data folder or to load only a selected subset of datasets or the test set. Each dataset is mapped to its timezone, ensuring correct alignment of time-based data. Once the data is loaded, a series of processing steps is applied and the data is split into weekly or daily segments. Finally, features are extracted, with numerous configuration options.
  • The Debugger objects can be used to store and monitor the variables at different stages of preprocessing, allowing easy access to intermediate states of the data for further analysis, visualization or understanding.
  • A PCA transformer can be uncommented to plot the most significant features.
    For more information, see the source code and docstrings in utils/data/preprocessing.py.
  • (optional) dropping of underrepresented classes
  • An sklearn.Pipeline with the following steps:
    • (optional) dropping of features
    • TraintestSplit avoids data leakage at the level of individual time-series columns (not whole datasets)
    • (optional and unfinished) over/undersampling
    • Model Training
  • This is followed by plotting functions and a function that logs all training data to mlflow (remember to set run_name!).
    For more information, see the source code and docstrings in utils/training/models_and_training.py.
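The column-level split mentioned in the list above can be illustrated with a minimal, standard-library-only sketch. It assumes that every segment (one day or week of one column) carries the ID of the time-series column it was cut from; the helper `grouped_train_test_split` and its parameters are hypothetical and not the project's `TraintestSplit` API. All segments of a column land on the same side of the split, so the model is never tested on a column it has already seen during training. scikit-learn's `GroupShuffleSplit` offers the same behaviour through its `groups` argument.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def grouped_train_test_split(
    groups: Sequence[Hashable], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that every group ends up entirely in train or in test.

    Hypothetical helper: groups[i] identifies the time-series column that segment i
    was cut from, so no column contributes segments to both sides.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must lie strictly between 0 and 1")
    unique_groups = sorted(set(groups), key=str)  # fixed order, so the seed fully determines the split
    if len(unique_groups) < 2:
        raise ValueError("need at least two distinct columns to split")
    random.Random(seed).shuffle(unique_groups)
    # Hold out roughly test_size of the columns: at least one, never all of them.
    n_test = min(max(1, round(len(unique_groups) * test_size)), len(unique_groups) - 1)
    test_groups = set(unique_groups[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Example: six daily segments cut from three columns of two buildings (made-up IDs).
segment_columns = ["b1/temp", "b1/temp", "b1/co2", "b1/co2", "b2/power", "b2/power"]
train_idx, test_idx = grouped_train_test_split(segment_columns, test_size=0.3, seed=42)
train_cols = {segment_columns[i] for i in train_idx}
test_cols = {segment_columns[i] for i in test_idx}
print(sorted(train_cols), sorted(test_cols))
```

Splitting on column IDs rather than on individual segments matters because consecutive days of the same sensor look very similar; a segment-level split would let the model memorise a column and inflate test scores.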

ToDos

General

  • Restructure code to avoid copies/deepcopies (switch from running the code in ipykernel to terminal-based execution)

Labeling


Preprocessing

  • Load data
    • Load classes and units from separate classification support file
  • Split data into chunks
    • Daily 24-hour cycles
    • Weekly 7-day cycles
  • Resample to a common sampling rate (1 hr)
  • Extract Features
    • Time periodicity values
    • Add holiday detection for weekly data features
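The resampling and chunking items above could be sketched in pandas roughly as follows. This is not the project's implementation: the helper name `split_into_segments`, averaging when resampling to 1 hr, and discarding incomplete segments are all assumptions. Days are grouped by calendar date and weeks by their Monday, both in the index's own timezone, so a correct timezone per dataset determines where each segment starts.

```python
from datetime import timedelta

import numpy as np
import pandas as pd


def split_into_segments(series: pd.Series, freq: str = "D") -> pd.DataFrame:
    """Resample one column to 1 h and return one row per complete day ("D") or week ("W").

    Hypothetical helper; assumes `series` has a (preferably timezone-aware) DatetimeIndex.
    """
    hourly = series.resample("1h").mean()  # common 1-hour sampling rate (mean per hour)
    dates = hourly.index.date  # calendar dates in the index's own timezone
    if freq == "D":
        keys, expected = dates, 24
    elif freq == "W":
        # Key every hour by the Monday that starts its calendar week.
        keys, expected = np.array([d - timedelta(days=d.weekday()) for d in dates]), 7 * 24
    else:
        raise ValueError("freq must be 'D' or 'W'")

    rows = {}
    for start, chunk in hourly.groupby(keys):
        values = chunk.to_numpy(dtype=float)
        # Keep complete, gap-free segments only; segments spanning a DST switch
        # (23 h or 25 h days) fail the length check and are dropped as well.
        if len(values) == expected and not np.isnan(values).any():
            rows[start] = values
    return pd.DataFrame.from_dict(rows, orient="index")


# Example: three days of 15-minute readings (a made-up ramp signal).
idx = pd.date_range("2024-03-04", periods=3 * 96, freq="15min", tz="Europe/Berlin")
readings = pd.Series(np.arange(len(idx), dtype=float), index=idx)
daily = split_into_segments(readings, "D")
print(daily.shape)  # (3, 24)
```

Each row is one segment with a fixed number of hourly values, which is the shape a feature-extraction step (periodicity values, holiday flags) can then work on.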

Model Training

  • Implement basic benchmark models using rule-of-thumb parameter values
  • Add advanced models (ANN, RNN, LSTM, etc.)
    • Research further possible modelling approaches
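A first benchmark could compare a trivial majority-class baseline with a random forest at scikit-learn's default settings, treated here as rule-of-thumb values. The sketch assumes a feature matrix `X`, class labels `y` and a column ID per segment (`groups`); it uses synthetic data from `make_classification` instead of real building data, and the helper `benchmark` and its model choice are illustrative assumptions, not the project's benchmark. `GroupKFold` keeps each column's segments within a single fold, in the same spirit as the leakage-avoiding split used for training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score


def benchmark(X, y, groups, n_splits: int = 5) -> dict:
    """Mean macro-F1 of simple reference models under column-grouped cross-validation."""
    models = {
        # Lower bound: always predict the most frequent class seen in training.
        "majority_class": DummyClassifier(strategy="most_frequent"),
        # Rule-of-thumb setting: scikit-learn defaults with a fixed seed.
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    cv = GroupKFold(n_splits=n_splits)  # a column's segments never span train and test folds
    return {
        name: cross_val_score(model, X, y, groups=groups, cv=cv, scoring="f1_macro").mean()
        for name, model in models.items()
    }


# Synthetic stand-in for extracted segment features: 300 segments from 30 columns.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
groups = np.arange(len(y)) // 10  # made-up column ID per segment
for name, score in benchmark(X, y, groups).items():
    print(f"{name}: macro-F1 = {score:.3f}")
```

Macro-F1 weights every class equally, so under class imbalance a model cannot look good simply by predicting the dominant class; the majority-class score shows how low that floor is.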

Model Evaluating

  • Test under-/oversampling to counter class imbalance.
  • Model Optimisation?
  • Visualise (Plots/Metrics):
    • Classification Report
    • Confusion Matrix
    • Feature Importance (XGBoost and RandomForest support it intrinsically)
    • Mismatches per Dataset (Done but not very pretty)
  • Create various plots to visualise performance
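The classification report, confusion matrix and mismatches-per-dataset items could be computed along these lines with scikit-learn's metrics module. The function `evaluation_summary` and the toy labels and dataset IDs are hypothetical; for a plot, scikit-learn's `ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels).plot()` can render the returned matrix (matplotlib required).

```python
from collections import Counter

from sklearn.metrics import classification_report, confusion_matrix


def evaluation_summary(y_true, y_pred, dataset_ids):
    """Return the text report, confusion matrix, label order and mismatches per dataset."""
    labels = sorted(set(y_true) | set(y_pred))
    report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
    # Rows are true classes, columns are predicted classes, both in `labels` order.
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    # Count misclassified segments per source dataset.
    mismatches = Counter(d for t, p, d in zip(y_true, y_pred, dataset_ids) if t != p)
    return report, cm, labels, dict(mismatches)


# Toy example with made-up classes and two datasets.
y_true = ["temp", "temp", "co2", "power", "power"]
y_pred = ["temp", "co2", "co2", "power", "temp"]
datasets = ["b1", "b1", "b1", "b2", "b2"]
report, cm, labels, mismatches = evaluation_summary(y_true, y_pred, datasets)
print(report)
print(labels)      # ['co2', 'power', 'temp']
print(mismatches)  # {'b1': 1, 'b2': 1}
```

Passing an explicit `labels` list keeps the report and the matrix in the same class order, even when a class is missing from one of the two label lists.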

Notes
