Examples

A handful of example experiments can be found in the “examples” directory of the repository. These are listed below:

NASA Turbofan Degradation

This example can be found in the nasa_cmapss.py file. The chosen dataset is NASA's Turbofan engine degradation simulation dataset, which is widely used in multivariate time series anomaly detection and condition monitoring problems. The splitter used is the Sklearn Predefined Split; for more information see the Scikit-Learn documentation. Several preprocesses are applied. The Savitzky-Golay filter, RUL Imputation and Binarizer are already implemented in Conmo. The MinMaxScaler is a Scikit-Learn preprocess (see the Scikit-Learn documentation) that has been wrapped using SklearnPreprocess. Two custom preprocesses for data cleaning and label renaming have also been defined using the CustomPreprocess wrapper: to create such a preprocess, simply write a function that takes the data and labels Pandas DataFrames as parameters and returns them transformed. The algorithms used are dimensionality reduction with PCA together with Mahalanobis distance computation, and a One-Class Support Vector Machine. Finally, the metric used is Accuracy. A short illustration of how the predefined split works is given after the listing.

from typing import Tuple

import pandas as pd
from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo import Experiment, Pipeline
from conmo.algorithms import OneClassSVM, PCAMahalanobis
from conmo.datasets import NASATurbofanDegradation
from conmo.metrics import Accuracy
from conmo.preprocesses import (Binarizer, CustomPreprocess, RULImputation,
                                SavitzkyGolayFilter, SklearnPreprocess)
from conmo.splitters import SklearnSplitter

# First custom preprocess definition
def data_cleanup(data: pd.DataFrame, labels: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # Keep only the sensor columns of interest
    columns = ['T30', 'T50', 'P30']
    sub_data = data.loc[:, columns]

    # Rename columns
    sub_data = sub_data.rename(columns={'T50': 'TGT'})

    # Calculate FF as the product of Ps30 and phi
    sub_data.loc[:, 'FF'] = data.loc[:, 'Ps30'] * data.loc[:, 'phi']

    return sub_data, labels

# Second custom preprocess definition
def rename_labels(data: pd.DataFrame, labels: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # Rename labels from 'rul' to 'anomaly'
    labels.rename(columns={'rul': 'anomaly'}, inplace=True)

    return data, labels


# Select FD001 subdataset of NASA Turbofan Degradation dataset
dataset = NASATurbofanDegradation(subdataset="FD001")

# Split dataset using the predefined dataset split
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))

# Preprocesses definition
preprocesses = [
    CustomPreprocess(data_cleanup),
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
    SavitzkyGolayFilter(to_data=True, to_labels=False,
                        test_set=True, window_length=7, polyorder=2),
    RULImputation(threshold=125),
    Binarizer(to_data=False, to_labels=['rul'],
              test_set=True, threshold=50),
    CustomPreprocess(rename_labels)
]

# Algorithms definition with default parameters
algorithms = [
    PCAMahalanobis(),
    OneClassSVM()
]

# Metrics definition
metrics = [
    Accuracy()
]

# Pipeline with all steps
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)

# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
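
All three examples split the data with Scikit-Learn's PredefinedSplit, fed with the fold indices returned by each dataset's sklearn_predefined_split() method. The snippet below is a minimal illustration of the PredefinedSplit semantics themselves (the fold array shown is hypothetical, not what Conmo actually returns): entries equal to -1 are always kept in the training set, while any other value assigns the sample to the corresponding test fold.

import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical fold array: the first three samples always stay in the
# training set (-1), the last two form the single test fold (0)
test_fold = np.array([-1, -1, -1, 0, 0])
ps = PredefinedSplit(test_fold)
for train_idx, test_idx in ps.split():
    print(train_idx, test_idx)  # prints: [0 1 2] [3 4]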

Batteries Degradation

This experiment can be found in the batteries_degradation.py file and reproduces the results obtained in a paper on estimating the level of degradation of several types of lithium batteries. The dataset used is Batteries Degradation. It is not strictly a time series, although it is similar in spirit: it measures different kinds of degradation in three types of batteries as they are progressively used. It is a local dataset, so it is necessary to pass the path where it is located, as well as the battery chemistry to select (here LFP) and the test set, in this case 1. The splitter used is the Sklearn Predefined Split, and there is no preprocessing, since the data is already normalised when the local files are parsed into the Conmo format. The algorithms are the same as those used in the paper: Random Forest, Multilayer Perceptron and Convolutional Neural Network. In all cases pre-trained models are used, so the paths to the saved model files must be passed as parameters. The metric used is Root Mean Square Percentage Error (RMSPE); a reference formulation of this metric is given after the listing.

from sklearn.model_selection import PredefinedSplit

from conmo import Experiment, Pipeline
from conmo.algorithms import PretrainedCNN1D, PretrainedMultilayerPerceptron, PretrainedRandomForest
from conmo.datasets import BatteriesDataset
from conmo.metrics import RMSPE
from conmo.splitters import SklearnSplitter

# Pipeline definition
# Change the path to your local dataset files, specify the chemistry of the batteries (LFP, NCA, NMC) and the test set
dataset = BatteriesDataset('/path/to/batteries/dataset/', 'LFP', 1)
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = None
# Change the paths to the files where the pre-trained models are stored (usually h5, h5py or joblib formats)
algorithms = [
    PretrainedRandomForest(pretrained=True, path='/path/to/saved/model-RF.joblib'),
    PretrainedMultilayerPerceptron(pretrained=True, input_len=128, path='/path/to/saved/model-MLP.h5'),
    PretrainedCNN1D(pretrained=True, input_len=128, path='/path/to/saved/model-CNN.h5')
]
metrics = [
    RMSPE()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)

# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
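
For reference, Root Mean Square Percentage Error is the square root of the mean squared relative error. The function below is a minimal sketch of that usual definition (with the result expressed as a percentage), not Conmo's RMSPE implementation, which may differ in details such as scaling.

import numpy as np

def rmspe(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # sqrt(mean(((y_true - y_pred) / y_true)^2)), as a percentage
    return float(np.sqrt(np.mean(np.square((y_true - y_pred) / y_true))) * 100)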

Server Machine Dataset with PCAMahalanobis

This experiment can be found in the omni_anomaly_smd.py file. The Server Machine Dataset used in this experiment was obtained from the OmniAnomaly repository. In their GitHub repository you can find more information about the dataset, as well as implementations of other anomaly detection and time series data mining algorithms. The splitter used is the Sklearn Predefined Split, and the preprocessing is the MinMaxScaler from Scikit-Learn. The algorithm is PCA with Mahalanobis distance; a sketch of the idea behind it is given after the listing. Finally, the metric is Accuracy.

from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo import Experiment, Pipeline
from conmo.algorithms import PCAMahalanobis
from conmo.datasets import ServerMachineDataset
from conmo.metrics import Accuracy
from conmo.preprocesses import SklearnPreprocess
from conmo.splitters import SklearnSplitter

# Pipeline definition
dataset = ServerMachineDataset('1-01')
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = [
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
]
algorithms = [
    PCAMahalanobis()
]
metrics = [
    Accuracy()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)

# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
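
As an intuition for what PCAMahalanobis does, the sketch below fits PCA on the training set, projects both sets onto a few principal components, and scores each test point by its squared Mahalanobis distance to the training distribution in that reduced space; unusually large distances suggest anomalies. This is an illustrative sketch of the general technique with assumed parameter names, not Conmo's actual implementation.

import numpy as np
from sklearn.decomposition import PCA

def pca_mahalanobis_scores(train: np.ndarray, test: np.ndarray,
                           n_components: int = 2) -> np.ndarray:
    # Fit PCA on the training data and project both sets
    pca = PCA(n_components=n_components).fit(train)
    z_train = pca.transform(train)
    z_test = pca.transform(test)

    # Squared Mahalanobis distance of each test point to the
    # training distribution in the reduced space
    mean = z_train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(z_train, rowvar=False))
    diff = z_test - mean
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

Thresholding these scores (for example at a high quantile of the training distances) yields the binary anomaly labels that a metric such as Accuracy can then evaluate.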