Welcome to Conmo documentation!
Release v1.0.1.
What is Conmo?
Conmo is a framework developed in Python whose main objective is to facilitate the execution and comparison of different experiments, mainly related to anomaly detection and condition monitoring. These experiments consist of a series of concatenated stages forming a pipeline architecture, i.e. the output of one stage is the input of the next. The framework aims to provide a way to standardize machine learning experiments, making it possible to reconstruct the result tables of scientific papers.
User Guide
Quickstart Guide
Requirements
Conmo was developed under Python version 3.7.11, so it should work with similar or more recent versions; however, we cannot guarantee this, so we recommend using the same version. To use Conmo you need a Python interpreter and the following libraries installed on your computer:
If you want to make a contribution by modifying code and documentation you need to include these libraries as well:
We suggest creating a new virtual environment using the Conda package manager and installing all dependencies there.
Installation
The fastest way to get started with Conmo is to install it via the pip command.
pip install conmo
Then you will be able to open a Python interpreter and try running:
import conmo
Some TensorFlow warnings might come up if your computer doesn’t have a GPU installed, but that is not a problem for running Conmo.
You can also install Conmo manually by downloading the source code from the Github repository:
git clone https://github.com/MyM-Uniovi/conmo.git
cd conmo
Then, if you haven’t prepared a Conda environment manually, you can execute the shell script install_conmo_conda.sh to install all the dependencies and create a Conda environment with Python 3.7:
cd scripts
./install_conmo_conda.sh conda_env_name
If your operating system is macOS, please check the Known Issues & Limitations section for more information about the compatibility of Conmo with Apple M1 and M2 CPUs.
If your operating system is not Unix-like and you are using Windows 10/11, you can create the Conda environment manually or use the Windows Subsystem for Linux (WSL) tool. For more information about its installation, please refer to Microsoft’s official documentation.
To check that the Conda environment is activated you should see a (conda_env_name) prefix in your command line. If it is not activated, you can activate it using:
conda activate conda_env_name
Overview
The experiments in Conmo have a pipeline-based architecture. A pipeline consists of a chain of processes connected in such a way that the output of each element of the chain is the input of the next, thus creating a data flow. Each of these processes represents one of the typical generic steps in Machine Learning experiments:
- Datasets
Defines the dataset used in the experiment which will be the starting data of the chain. Here the dataset will be loaded and parsed to a standard format.
- Splitters
Typically in machine learning problems the data has to be split into train data and test data. Cross-validation techniques can also be applied here.
- Preprocesses
Defines the sequence of preprocesses to be applied over the dataset to manipulate the data before any algorithm is executed.
- Algorithms
Defines the different algorithms which will be executed over the same input data stream (as a result of the previous stage). It can be one or several.
- Metrics
Defines the different metrics that can be used to evaluate the results obtained from the algorithms.

Further details and documentation about modules, functions and parameters are provided in the API Reference.
Running an experiment
Here is a brief example of how to use the different Conmo modules to reproduce an experiment, in this case with the predefined splitter of the Server Machine Dataset, Sklearn’s MinMaxScaler as preprocessing, PCAMahalanobis as the algorithm and Accuracy as the metric.
Import the modules, if they haven’t been imported yet, and the other dependencies:
from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo import Experiment, Pipeline
from conmo.algorithms import PCAMahalanobis
from conmo.datasets import ServerMachineDataset
from conmo.metrics import Accuracy
from conmo.preprocesses import SklearnPreprocess
from conmo.splitters import SklearnSplitter
Configure the different stages of the pipeline:
dataset = ServerMachineDataset('1-01')
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = [
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
]
algorithms = [
    PCAMahalanobis()
]
metrics = [
    Accuracy()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)
Create an experiment with the configured pipeline. The first parameter is a list of the pipelines that will be included in the experiment; it can contain one or more. The second parameter is for statistical testing between results, but this part is still under development and therefore cannot be used yet:
experiment = Experiment([pipeline], [])
Start running the experiment by calling the launch() method:

experiment.launch()
As a result of the execution of the experiment, a specific folder structure will be created in ~/conmo:
/data
This directory contains the various datasets that have already been imported (downloaded and parsed) and are therefore already available for use. They are stored in parquet format for better compression. For each of the subdatasets included in each dataset, there will be a data file and a labels file.
/experiments
This directory contains all the executions of experiments in Conmo in chronological order. Each directory corresponds to an experiment and has in its name a timestamp with the time and day when the experiment was run. Within each experiment directory there is another directory for each pipeline, and within this one there are as many directories as steps the pipeline contains. These folders contain the input and output data used by each step of the pipeline. They are also stored in parquet format, in the same way as the datasets in the /data folder.
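As an orientation, the resulting layout looks roughly like the sketch below; the concrete directory and file names depend on the dataset, the pipeline and the timestamp of the run, so the placeholders here are illustrative only:

~/conmo
├── data
│   └── <dataset>
│       ├── <subdataset data file>      (parquet)
│       └── <subdataset labels file>    (parquet)
└── experiments
    └── <timestamp of the run>
        └── <pipeline>
            ├── <step 1>
            ├── <step 2>
            └── ...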
Examples
A handful of example experiments can be found in the “examples” directory of the repository. These are listed below:
NASA TurboFan Degradation
This example can be found in the nasa_cmapss.py file. The chosen dataset is NASA’s Turbofan engine degradation simulation data set. It is a dataset widely used in multivariate time series anomaly detection and condition monitoring problems. The splitter used is the Sklearn Predefined Split. For more information see the Scikit-Learn documentation. Regarding preprocessing, several preprocesses have been used. The Savitzky-Golay filter, RUL imputation and Binarizer are already implemented in Conmo. The MinMaxScaler is a Sklearn preprocess (more information here) that has been wrapped using SklearnPreprocess. Finally, two custom preprocesses for data cleaning and label renaming have been defined using the CustomPreprocess wrapper. To create these preprocesses, just create a function that takes the Pandas dataframes for data and labels as parameters. The algorithms used were dimensionality reduction with PCA together with Mahalanobis distance calculation, and One Class Support Vector Machine. Finally, the metric used was Accuracy.
import pandas as pd
from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo import Experiment, Pipeline
from conmo.algorithms import OneClassSVM, PCAMahalanobis
from conmo.datasets import NASATurbofanDegradation
from conmo.metrics import Accuracy
from conmo.preprocesses import (Binarizer, CustomPreprocess, RULImputation,
                                SavitzkyGolayFilter, SklearnPreprocess)
from conmo.splitters import SklearnSplitter

# First custom preprocess definition
def data_cleanup(data: pd.DataFrame, labels: pd.DataFrame) -> (pd.DataFrame, pd.DataFrame):
    # Reduce columns
    columns = ['T30', 'T50', 'P30']
    sub_data = data.loc[:, columns]

    # Rename columns
    sub_data = sub_data.rename(columns={'T50': 'TGT'})

    # Calculate FF
    sub_data.loc[:, 'FF'] = data.loc[:, 'Ps30'] * data.loc[:, 'phi']

    return sub_data, labels

# Second custom preprocess definition
def rename_labels(data: pd.DataFrame, labels: pd.DataFrame) -> (pd.DataFrame, pd.DataFrame):
    # Rename labels from 'rul' to 'anomaly'
    labels.rename(columns={'rul': 'anomaly'}, inplace=True)

    return data, labels


# Select FD001 subdataset of NASA Turbofan Degradation dataset
dataset = NASATurbofanDegradation(subdataset="FD001")

# Split dataset using predefined dataset split
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))

# Preprocesses definition
preprocesses = [
    CustomPreprocess(data_cleanup),
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
    SavitzkyGolayFilter(to_data=True, to_labels=False,
                        test_set=True, window_length=7, polyorder=2),
    RULImputation(threshold=125),
    Binarizer(to_data=False, to_labels=[
              'rul'], test_set=True, threshold=50),
    CustomPreprocess(rename_labels)
]

# Algorithms definition with default parameters
algorithms = [
    PCAMahalanobis(),
    OneClassSVM()
]

metrics = [
    Accuracy()
]
# Pipeline with all steps
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)

# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
Batteries Degradation
This experiment can be found in the file batteries_degradation.py and reproduces the results obtained in a paper on estimating the level of degradation of some types of lithium batteries. The dataset used is Batteries Degradation. This is not a time series, although it is somewhat similar, since it measures different types of degradation in three types of batteries as they are gradually used. It is a local dataset, so it is necessary to pass the path where it is located, as well as the type of battery to be selected (LFP) and the test set, in this case 1. The splitter used is the Sklearn Predefined Split, and there is no preprocessing, since the data is already normalised while the local files are parsed to Conmo’s format. The algorithms used are the same as those used in the paper: Random Forest, Multilayer Perceptron and Convolutional Neural Network. In all cases the pre-trained models are used, so it is necessary to pass the path to the files as a parameter. The metric used is Root Mean Square Percentage Error.
from conmo import Experiment, Pipeline
from conmo.algorithms import PretrainedRandomForest, PretrainedCNN1D, PretrainedMultilayerPerceptron
from conmo.datasets import BatteriesDataset
from conmo.metrics import RMSPE
from conmo.splitters import SklearnSplitter
from sklearn.model_selection import PredefinedSplit

# Pipeline definition
# Change the path to your local dataset files, specify the chemistry of the batteries (LFP, NCA, NMC) and the test set
dataset = BatteriesDataset('/path/to/batteries/dataset/', 'LFP', 1)
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = None
# Change the paths to the files where the pre-trained models are stored (usually h5, h5py or joblib formats)
algorithms = [
    PretrainedRandomForest(pretrained=True, path='/path/to/saved/model-RF.joblib'),
    PretrainedMultilayerPerceptron(pretrained=True, input_len=128, path='/path/to/saved/model-MLP.h5'),
    PretrainedCNN1D(pretrained=True, input_len=128, path='/path/to/saved/model-CNN.h5')
]
metrics = [
    RMSPE()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)


# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
Server Machine Dataset with PCAMahalanobis
This experiment can be found in the file omni_anomaly_smd.py. The Server Machine Dataset used in this experiment was obtained from the OmniAnomaly repository. In their Github repository you can find more information about the dataset as well as implementations of other anomaly detection and time series data mining algorithms. The splitter used is the Sklearn Predefined Split and the preprocessing is the MinMaxScaler from Sklearn. The algorithm is PCA with Mahalanobis distance. Finally, the metric is Accuracy.
from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo import Experiment, Pipeline
from conmo.algorithms import PCAMahalanobis
from conmo.datasets import ServerMachineDataset
from conmo.metrics import Accuracy
from conmo.preprocesses import SklearnPreprocess
from conmo.splitters import SklearnSplitter

# Pipeline definition
dataset = ServerMachineDataset('1-01')
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = [
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
]
algorithms = [
    PCAMahalanobis()
]
metrics = [
    Accuracy()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)


# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
API Reference
This is the API Reference documentation of the package, including modules, classes and functions.
conmo.experiments
This is the main submodule of the package. It is responsible for creating the intermediate directories of the experiment and for creating and executing the configured pipeline.
conmo.datasets
The conmo.datasets submodule takes care of downloading the datasets and parsing them to Conmo’s format.
- Dataset: Abstract base class for a Dataset.
- RemoteDataset: Abstract base class for a RemoteDataset (downloadable).
- LocalDataset: Abstract base class for a LocalDataset (loadable).
This is a dataset obtained from measurements of certain types of degradation of three types of batteries. Since it belongs to the local datasets, to launch any experiment with it, it must be stored on disk with the following directory structure:
- DTW-Li-ion-Diagnosis
  - data: data and labels for the three types of batteries are stored here.
    - mat:
      - LFP:
        - diagnosis:
          - V.mat
        - test:
          - V_references.mat
          - x_test_0.mat
          - x_test_1.mat
          - x_test_2.mat
          - x_test_3.mat
          - y_test.mat
      - NCA:
        - diagnosis
        - test
      - NMC: the same as NCA and LFP
      - Q.mat
conmo.splitters
Once the dataset has been loaded, it is necessary to separate the training and test parts. The conmo.splitters submodule permits generating new splitters or using predefined ones from the Scikit-Learn library.
conmo.preprocesses
The aim of the conmo.preprocesses submodule is to apply a series of transformations to the dataset before it is used as input to the algorithms. Several of the implemented preprocesses are commonly used in time series anomaly detection problems.
- Preprocess: Abstract base class for a Preprocess.
- ExtendedPreprocess: Specific class to implement preprocessing which consists of applying certain transformations on some columns of the dataset.
- CustomPreprocess: Core class used to implement a self-created preprocess.
- SklearnPreprocess: Class used to wrap an existing preprocess from the Scikit-Learn library.
conmo.algorithms
The conmo.algorithms submodule contains everything related to algorithms in Conmo, from abstract classes for introducing new algorithms to implementations of some of the algorithms used in the example experiments.
conmo.metrics
The conmo.metrics submodule contains everything necessary to add new ways of measuring the effectiveness of the implemented algorithms. Accuracy and RMSPE are currently implemented.
Development Guide
Possibilities of Conmo
The Conmo framework has been designed to be user-friendly for recreating and evaluating experiments, but also for adding new algorithms, datasets, preprocesses, etc. This section explains the possibilities offered by the framework when implementing new submodules. We believe that using and contributing to Conmo can benefit all types of users, as well as help to standardise comparisons between the results of different scientific articles.
If you still have doubts about the implementation of new components to the framework, you can take a look at the API reference, examples or contact the developers.
Add a new dataset
Dataset is the core abstract class for every dataset in Conmo and contains basic methods and attributes that are common to all datasets.
At the same time, two classes derive from it, differing according to where the original data is stored:
LocalDataset
:The abstract class in charge of handling datasets that are stored locally on the computer where Conmo will be running. The main method of this class is LocalDataset.load(), which is in charge of parsing the original dataset files to Conmo’s format and moving them to the data folder. It is an abstract method which needs to be implemented in every local dataset. There is also an abstract method feed_pipeline() to copy the selected data to the pipeline step folder.
RemoteDataset
:In case the dataset to be implemented is originally located on a web server, a Git repository or other remote hosting, the RemoteDataset class is available in Conmo. Among its methods, the most notable is RemoteDataset.download(), which downloads the dataset from a remote URL.
For adding a new local dataset to the framework you need to create a new class that inherits from LocalDataset
and override the following methods:
LocalDataset.__init__()
:This is the constructor of the class. Here you can call the constructor of the parent class to assign the path to the original dataset. You can also define some attributes of the class, such as the label column names or the feature names, and assign the subdataset that you want to instantiate.
LocalDataset.dataset_files()
:This method must return a list with all the files (data and labels) that compose the dataset.
LocalDataset.load()
:This method must convert all raw dataset files to the appropriate format for Conmo’s pipeline. For each of the datasets, first read and load the data and labels into Pandas dataframes, then concatenate them (e.g. train data and test data are concatenated into one dataframe, and the same for the labels) and finally save them in parquet format. Some considerations to take into account:
Data and labels dataframes will have at least a multi-index for sequence and time. You can consult more information in the Pandas documentation.
The columns index must start at 1.
If the dataset is only split into train and test, then there will be 2 sequences, one per set.
In case the dataset is a time series with sequences, the train sequences go after the test sequences.
LocalDataset.feed_pipeline()
:This method is used to copy the dataset from the data directory to the dataset directory of the experiment.
LocalDataset.sklearn_predefined_split()
:If you plan to use the Predefined Split from the Sklearn library, your class must implement this method. It must generate an array of indices of the same length as the number of sequences, to be used with PredefinedSplit. The indices must start at 0.
For adding a new remote dataset to the framework the procedure is almost identical to a local dataset. You need to create a new class that inherits from RemoteDataset
and override the following methods:
RemoteDataset.__init__()
:This is the constructor of the class. Here you can call the constructor of the parent class to assign the path to the original dataset. You can also define some attributes of the class, such as the label column names, the feature names, the file format, the URL and the checksum, and assign the subdataset that you want to instantiate.
RemoteDataset.dataset_files()
:This method must return a list with all the files (data and labels) that compose the dataset.
RemoteDataset.parse_to_package()
:Almost identical to LocalDataset.load().
RemoteDataset.feed_pipeline()
:This method is used to copy the dataset from the data directory to the dataset directory of the experiment.
RemoteDataset.sklearn_predefined_split()
:If you plan to use the Predefined Split from the Sklearn library, your class must implement this method. It must generate an array of indices of the same length as the number of sequences, to be used with PredefinedSplit. The indices must start at 0.
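To make the list above more concrete, here is a minimal, hypothetical sketch of a remote dataset. The method names come from this guide, but the parent-class import path, the arguments passed to the parent constructor, the URL/checksum constants and the file-naming pattern are assumptions made for illustration only; check an existing remote dataset such as ServerMachineDataset for the exact signatures.

from os import path
from typing import Iterable

from conmo.datasets.dataset import RemoteDataset  # NOTE: import path assumed, mirrors LocalDataset


class ExampleRemoteDataset(RemoteDataset):
    # Hypothetical values: replace with the real URL and checksum of your dataset
    URL = 'https://example.com/datasets/example.zip'
    CHECKSUM = '0123456789abcdef'

    def __init__(self, subdataset: str) -> None:
        # NOTE: the arguments expected by the parent constructor are an assumption here
        super().__init__(self.URL)
        self.subdataset = subdataset

    def dataset_files(self) -> Iterable:
        # One data file and one labels file per subdataset (naming pattern is illustrative)
        return [
            path.join(self.dataset_dir, '{}_data'.format(self.subdataset)),
            path.join(self.dataset_dir, '{}_labels'.format(self.subdataset)),
        ]

    def parse_to_package(self) -> None:
        # Almost identical to LocalDataset.load(): read the downloaded raw files, build the
        # data and labels dataframes with a (sequence, time) multi-index starting at 1,
        # and save them to the dataset directory in parquet format.
        raise NotImplementedError

    def feed_pipeline(self, out_dir: str) -> None:
        # Copy the selected subdataset files from the data directory to the pipeline step folder
        raise NotImplementedError

    def sklearn_predefined_split(self) -> Iterable[int]:
        # Only needed if you plan to use Scikit-Learn's PredefinedSplit:
        # -1 excludes the (train) sequence from the test folds, 0 marks the test sequence
        return [-1, 0]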
Add a new algorithm
Conmo provides a core abstract class named Algorithm that contains the basic methods needed by any algorithm: mainly training with a training set, performing a prediction on the test set, and loading and saving input and output data.
Depending on the type of anomaly detection algorithm to be implemented, there are two classes, according to how the method operates; there is also a class for algorithms that have already been trained:
AnomalyDetectionThresholdBasedAlgorithm
:If your algorithm needs to calculate a threshold to determine which samples are anomalous it must inherit from this class. For example: PCA Mahalanobis.
AnomalyDetectionClassBasedAlgorithm
:If your algorithm identifies by classes the normal sequences from the anomalous ones, it must inherit from this class. For example: One Class SVM.
PretrainedAlgorithm
:Check out this class if your algorithm was pre-trained prior to running an experiment, i.e. it does not need to be trained during the experiment. It requires you to define the path where the pre-trained model is stored on disk.
For adding a new algorithm to the framework you need to create a new class that inherits from one of these classes, depending on the type of algorithm, and override the following methods:
__init__()
:Constructor of the class. Here you can initialize all the hyperparameters needed for the algorithm. You can also fix the random seeds of Tensorflow, Numpy, etc. here for reproducibility purposes.
fit_predict()
:Method responsible for building and training the model with the training data and testing it with the test set. In case your algorithm is threshold-based, it will be necessary to verify whether each output in the test set exceeds that threshold to determine that it is anomalous. In the case of a class-based algorithm, depending on the output, it will be necessary to identify whether it is normal or anomalous. Finally, the output dataframe has to be generated with the labels by sequence or by time.
find_anomaly_threshold()
:In case the algorithm is threshold-based, the threshold selection can be customised by overriding this method.
You can add auxiliary methods for model construction, weights loading, etc. in case the model structure is very complex.
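As a minimal, hypothetical sketch of what a threshold-based algorithm could look like, consider the toy detector below. The class and method names (AnomalyDetectionThresholdBasedAlgorithm, __init__(), fit_predict(), find_anomaly_threshold()) come from this guide, but the import path, the fit_predict() parameters and the shape of the output dataframe are assumptions; check PCAMahalanobis in conmo.algorithms for a real reference.

import numpy as np
import pandas as pd

# NOTE: import path assumed for the abstract class
from conmo.algorithms import AnomalyDetectionThresholdBasedAlgorithm


class MeanDistanceDetector(AnomalyDetectionThresholdBasedAlgorithm):
    """Toy detector: flags a sample as anomalous when its distance to the
    training mean exceeds the selected threshold."""

    def __init__(self, percentile: int = 95) -> None:
        # Hyperparameters (and random seeds, if any) are fixed here for reproducibility
        self.percentile = percentile

    def fit_predict(self, data_train: pd.DataFrame, data_test: pd.DataFrame) -> pd.DataFrame:
        # NOTE: the parameters of fit_predict() are an assumption for this sketch
        center = data_train.mean(axis=0)
        train_scores = np.linalg.norm(data_train - center, axis=1)
        test_scores = np.linalg.norm(data_test - center, axis=1)

        # Threshold computed on the training scores
        threshold = self.find_anomaly_threshold(train_scores)

        # Output dataframe with one anomaly label per time step
        return pd.DataFrame({'anomaly': test_scores > threshold}, index=data_test.index)

    def find_anomaly_threshold(self, scores: np.ndarray) -> float:
        # Custom threshold selection: a percentile of the training scores
        return float(np.percentile(scores, self.percentile))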
Add a new splitter
The core abstract class is Splitter, which provides some methods to load inputs, save outputs and check if the input was already split.
For adding a new splitter you must create a new class that inherits from Splitter and implements the Transform() method.
If the splitter you want to implement is available in the Scikit-Learn library, we provide the class SklearnSplitter; indicating the splitter to be used will allow you to use it in your experiment.
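The SklearnSplitter(splitter=...) usage is the same as in the examples above; the sketch below simply wraps a different Scikit-Learn splitter. Whether a particular splitter fits your dataset's sequence structure must be checked case by case (see also the Known Issues & Limitations section regarding the Time Series Splitter):

from sklearn.model_selection import KFold

from conmo.splitters import SklearnSplitter

# Hypothetical usage: wrap a Scikit-Learn splitter other than PredefinedSplit
splitter = SklearnSplitter(splitter=KFold(n_splits=5))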
Add a new preprocess
The ExtendedPreprocess class is used for the implementation of new preprocesses in the pipeline. ExtendedPreprocess inherits from the core abstract class Preprocess and provides a constructor to define which parts of the dataset will be modified by the preprocessing: labels, data, test or train. It also permits applying the preprocess to a specific set of columns.
To define a new preprocess you only need to create a new class that inherits from ExtendedPreprocess and implements the Transform() method, where the preprocessing will be applied to the dataset.
If the preprocess you want to implement is available in the Sklearn library, we provide the class SklearnPreprocess; indicating the preprocessing to be used will allow you to use it in your experiment.
In order to make things easier, the CustomPreprocess class is available to implement a preprocess from a function, which is passed as an argument to the constructor. For additional information you can have a look at the nasa_cmapss.py example.
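As a quick illustration of the CustomPreprocess route, the sketch below wraps a simple, hypothetical function that drops constant columns from the data. The function signature follows the data_cleanup and rename_labels examples from nasa_cmapss.py:

import pandas as pd

from conmo.preprocesses import CustomPreprocess


def drop_constant_columns(data: pd.DataFrame, labels: pd.DataFrame) -> (pd.DataFrame, pd.DataFrame):
    # Keep only the columns that show some variation
    data = data.loc[:, data.nunique() > 1]
    return data, labels


# The wrapped function can then be used as one more step in the preprocesses list
preprocess = CustomPreprocess(drop_constant_columns)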
Add a new metric
You can add a new metric by creating a new class that inherits from the abstract class Metric.
The only method you have to take care of is:
calculate()
:Based on the outputs of the algorithms and the number of folds, the results are computed and the metrics dataframe is created and stored.
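For orientation only, here is a minimal, hypothetical sketch of a metric. The abstract class name and the calculate() method come from this guide, but the import path and the parameters of calculate() are assumptions; check Accuracy or RMSPE in conmo.metrics for the exact signature and for how the metrics dataframe is stored.

import pandas as pd

from conmo.metrics import Metric  # NOTE: import path of the abstract class is an assumption


class MeanAbsoluteError(Metric):
    def calculate(self, labels: pd.DataFrame, outputs: pd.DataFrame) -> pd.DataFrame:
        # NOTE: the parameters of calculate() are an assumption for this sketch; the real
        # abstract method works from the algorithm outputs and the number of folds
        error = (outputs - labels).abs().mean()
        # Build the metrics dataframe (one value per column here, for illustration)
        return pd.DataFrame({'mae': error})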
CSV dataset import example
A very common use case that Conmo users may encounter is adding a new dataset that is stored in CSV format. For this case we have developed this small guide, which includes a template as an example. The dataset is stored locally, so it will inherit from LocalDataset. It contains three subdatasets stored in different directories, and each of them contains CSV files for data and labels, both for train and test:

The template:
import os
import shutil
from os import path
from typing import Iterable

import pandas as pd

from conmo.conf import File, Index, Label
from conmo.datasets.dataset import LocalDataset


class CSV_Dataset(LocalDataset):
    # ------------------------------------------------------------------------------------ #
    # Define constants here ...                                                             #
    # ------------------------------------------------------------------------------------ #
    EX_CONST = 22
    EX_SUBDATASETS = ['01', '02', '03']
    EX_COL_NAMES = ['A', 'B', 'C']

    # ------------------------------------------------------------------------------------ #
    # Constructor of the class                                                              #
    # Call super class constructor to pass the path where the raw dataset is stored         #
    # Here you can initialize attributes with passed values                                 #
    # and the specific subdataset to be used when instantiating                             #
    # ------------------------------------------------------------------------------------ #
    def __init__(self, path: str, subdataset: str) -> None:
        super().__init__(path)
        self.path = path
        self.subdataset = subdataset

    # ------------------------------------------------------------------------------------ #
    # Loads the original CSV files to Pandas dataframes,                                    #
    # gives them the appropriate format and finally saves them to disk.                     #
    # ------------------------------------------------------------------------------------ #
    def load(self) -> None:
        # SOME CONSIDERATIONS:
        # - You can use Pandas utility read_csv()
        # - Index must start at 1, not 0
        # - Generate only 1 file for data and another for labels
        # - A multi-index with two levels is necessary: an outer level of sequences and an inner level of time
        # - If there is both train and test data, each of them shall form a sequence

        # Iterate over files in the directory where the local original data is stored
        for subdataset in os.listdir(self.path):
            # ------------------------------------------------------------------------------ #
            # Read data CSV and generate dataframe
            train_data = pd.read_csv(path.join(
                self.path, subdataset, 'train_data.csv'), sep=',', header=None, names=self.EX_COL_NAMES)
            test_data = pd.read_csv(path.join(
                self.path, subdataset, 'test_data.csv'), sep=',', header=None, names=self.EX_COL_NAMES)

            # Reset index for starting from 1 (Conmo's format)
            train_data.index += 1
            test_data.index += 1

            # Concatenate train and test data into 1 dataframe. (Always train data first)
            # Time is an old name and needs to be upgraded, but the purpose is the same as a normal Index
            data = pd.concat([train_data, test_data], keys=[
                             1, 2], names=[Index.SEQUENCE, Index.TIME])

            # Sort index after concatenation
            data.sort_index(inplace=True)

            # ------------------------------------------------------------------------------ #
            # Read labels CSV and generate dataframe
            train_labels = pd.read_csv(path.join(
                self.path, subdataset, 'train_labels.csv'), sep=',', header=None, names=[Label.ANOMALY])
            test_labels = pd.read_csv(path.join(
                self.path, subdataset, 'test_labels.csv'), sep=',', header=None, names=[Label.ANOMALY])

            # Reset index for starting from 1 (Conmo's format)
            train_labels.index += 1
            test_labels.index += 1

            # Concatenate train and test labels into 1 dataframe. (Always train labels first)
            labels = pd.concat([train_labels, test_labels], keys=[
                               1, 2], names=[Index.SEQUENCE, Index.TIME])

            # Sort index after concatenation
            labels.sort_index(inplace=True)

            # ------------------------------------------------------------------------------ #
            # Finally save dataframes to disk in /home/{username}/conmo/data/... in parquet format
            data.to_parquet(path.join(self.dataset_dir, '{}_{}'.format(
                subdataset, File.DATA)), compression='gzip', index=True)
            labels.to_parquet(path.join(self.dataset_dir, '{}_{}'.format(
                subdataset, File.LABELS)), compression='gzip', index=True)

    # ------------------------------------------------------------------------------------ #
    # Method for adding to a list the different files that                                  #
    # belong to the dataset                                                                 #
    # Usually iterates over the subdatasets                                                 #
    # ------------------------------------------------------------------------------------ #
    def dataset_files(self) -> Iterable:
        files = []
        for key in self.EX_SUBDATASETS:
            # Data
            files.append(path.join(self.dataset_dir,
                                   "{}_{}".format(key, File.DATA)))
            # Labels
            files.append(path.join(self.dataset_dir,
                                   "{}_{}".format(key, File.LABELS)))
        return files

    # ------------------------------------------------------------------------------------ #
    # Method for adding to the pipeline step folder                                         #
    # Move data and labels from dataset_dir to out_dir                                      #
    # ------------------------------------------------------------------------------------ #
    def feed_pipeline(self, out_dir: str) -> None:
        # Data
        shutil.copy(path.join(self.dataset_dir, "{}_{}".format(
            self.subdataset, File.DATA)), path.join(out_dir, File.DATA))
        # Labels
        shutil.copy(path.join(self.dataset_dir, "{}_{}".format(
            self.subdataset, File.LABELS)), path.join(out_dir, File.LABELS))

    # ------------------------------------------------------------------------------------ #
    # OPTIONAL: Only implement if you plan to use the                                       #
    # PredefinedSplit method of the Scikit-Learn library.                                   #
    # Returns indexes of sequences:                                                         #
    #   -1 -> the sequence will be excluded from the test set                               #
    #    0 -> test set                                                                      #
    # ------------------------------------------------------------------------------------ #
    def sklearn_predefined_split(self) -> Iterable[int]:
        return [-1, 0]
Once the class is ready, the respective import has to be added to the __init__ file and the class name to the __all__ list as follows:
from conmo.datasets.mars_science_laboratory_mission import MarsScienceLaboratoryMission
from conmo.datasets.nasa_turbofan_degradation import NASATurbofanDegradation
from conmo.datasets.server_machine_dataset import ServerMachineDataset
from conmo.datasets.soil_moisture_active_passive_satellite import SoilMoistureActivePassiveSatellite
from conmo.datasets.batteries_degradation import BatteriesDataset
# ----------------------------------------
# Add import to __init__ file of the module
from conmo.datasets.csv_dataset import CSV_Dataset
# ----------------------------------------

__all__ = [
    'NASATurbofanDegradation',
    'ServerMachineDataset',
    'SoilMoistureActivePassiveSatellite',
    'MarsScienceLaboratory',
    'BatteriesDataset',
    # ----------------------------------------
    # Add class name here
    'CSV_Dataset',
    # ----------------------------------------
]
Finally the dataset is ready to be used in an experiment:
import os
import sys

# Add package to path (only needed in case you have downloaded Conmo from the Github repository)
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from sklearn.model_selection import PredefinedSplit
from sklearn.preprocessing import MinMaxScaler

from conmo.experiment import Experiment, Pipeline
from conmo.algorithms import OneClassSVM
from conmo.datasets import CSV_Dataset
from conmo.metrics import Accuracy
from conmo.preprocesses import SklearnPreprocess
from conmo.splitters import SklearnSplitter

# Pipeline definition
dataset = CSV_Dataset('/home/lucas/conmo_test_csv', '01')
splitter = SklearnSplitter(splitter=PredefinedSplit(dataset.sklearn_predefined_split()))
preprocesses = [
    SklearnPreprocess(to_data=True, to_labels=False,
                      test_set=True, preprocess=MinMaxScaler()),
]
algorithms = [
    OneClassSVM()
]
metrics = [
    Accuracy()
]
pipeline = Pipeline(dataset, splitter, preprocesses, algorithms, metrics)


# Experiment definition and launch
experiment = Experiment([pipeline], [])
experiment.launch()
Coding conventions
The following tools are used to ensure that new software being added to Conmo meets minimum quality and format requirements:
Autopep8: We use this tool to automatically format our Python code to conform to the PEP 8 style guide. It uses the pycodestyle utility to determine which parts of the code need to be formatted.
Isort: We use this library to sort imports alphabetically and automatically separate them into sections and by type.
Pytest: To ensure that the output format of a newly designed step (algorithm, dataset, etc.) is correct, we use the Pytest framework to test the new code. This testing framework is easy to use and supports complex testing at the same time. At the moment we are finishing the implementation of tests on the existing code, so there could be parts that may be modified in future updates.
Known Issues & Limitations
We are aware that Conmo is still at a very early stage of development, so it is likely that various bugs will appear as its use increases. Bugs that are detected will be published on this page to make it easier for users to avoid them. However, the Conmo development team is actively looking for and fixing any detected bugs. Please, if you find a bug or issue that does not appear on this list, we would be grateful if you could email us at mym.inv.uniovi@gmail.com or post an issue on our Github. Thanks in advance.
Issue ID | Severity | Description
---|---|---
001_split | Low | There are some problems with the use of Scikit-Learn’s Time Series Splitter in the experiments. We are working on resolving them.
002_rul | Medium | The rul_rve.py example seems to be failing during the metric calculation step.
003_tf | Medium | If your computer has one of the new Apple processors (M1 or M2) with an ARM-based architecture, it is likely that the TensorFlow dependency will fail when you try to use Conmo. To fix this temporarily you can install Conmo without dependencies (pip install --no-deps conmo) and then manually install tensorflow-macos, the build provided by Google for ARM architectures.
Frequently Asked Questions
How can I contribute to Conmo?
Depending on your profile and your intended use, you can contribute in different ways. The simplest way to contribute to Conmo is to use it to reproduce some experiments and then cite it. However, you can also contribute by implementing new algorithms, datasets, etc., which can then be used by everyone to perform experiments. Finally, reporting bugs in the functioning of Conmo can also be considered a way to collaborate with the project.
I don’t have a great knowledge of programming, can I still use Conmo?
Conmo intends to focus on all types of scientists, regardless of their specialisation. Generally speaking, we can distinguish two types of people who will use Conmo:
People who only want to reproduce experiments that are already integrated only need basic programming knowledge, since Python is a simple programming language and the complexity has been largely encapsulated.
People who want to collaborate by adding new algorithms, datasets, etc. need more in-depth programming knowledge, in particular of the Python language and object-oriented programming. However, the Conmo development team is actively looking for ways to simplify this kind of action.