conmo.datasets.ServerMachineDataset

class conmo.datasets.ServerMachineDataset(subdataset: str)
__init__(subdataset: str) → None

Main constructor of the class.

Parameters

subdataset (str) – Name of the subdataset to load (see SUBDATASETS).

check_checksum(response: object) → bool

Checks whether the checksum of the downloaded file matches the one stored in the class. This is done for security and integrity reasons. Currently only the MD5 algorithm is supported.

Parameters

response (Object) – Response object returned by the get method of the Requests library.

Returns

True if the MD5 hash of the downloaded file matches the checksum stored in the class, False otherwise.

Return type

bool
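The comparison that check_checksum describes can be sketched with Python's standard hashlib module. This is a minimal sketch, not conmo's implementation. The helper name md5_matches is hypothetical. It assumes the downloaded bytes are available as response.content and that the expected value is an MD5 hex digest, such as the one presumably held in CHECKSUM.

```python
import hashlib


def md5_matches(content: bytes, expected_md5: str) -> bool:
    """Hypothetical helper: True if the MD5 digest of `content` equals `expected_md5`.

    In the class, `content` would presumably be `response.content` and
    `expected_md5` the CHECKSUM attribute (both assumptions).
    """
    digest = hashlib.md5(content).hexdigest()  # lowercase hex string
    # Normalize case, since published checksums may use uppercase hex.
    return digest == expected_md5.lower()


print(md5_matches(b"abc", "900150983cd24fb0d6963f7d28e17f72"))  # True
```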

dataset_files() → Iterable

Iterable of files included in the dataset.

download(out_dir: str) → None

Download the dataset from a remote URL.

extract_data(response: object, out_dir: str) → None

Extracts the contents of a compressed file in zip format.

Parameters
  • response (Object) – Response object returned by the get method of the Requests library.

  • out_dir (str) – Directory where the zip file will be unzipped.
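Extracting a zip archive that is held in memory, which is what extract_data does with a Requests response, can be sketched with the standard zipfile module. The helper extract_zip_bytes is hypothetical. It takes raw bytes (for example response.content) rather than the response object itself, and it is not conmo's code.

```python
import io
import zipfile
from typing import List


def extract_zip_bytes(content: bytes, out_dir: str) -> List[str]:
    """Hypothetical helper: unzip an in-memory archive into out_dir.

    Returns the names of the archive members that were extracted.
    """
    # Wrap the bytes in a file-like object so ZipFile can read them directly.
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```

With Requests, a call could look like extract_zip_bytes(requests.get(url).content, out_dir), where url is whatever the URL attribute holds. Reading from io.BytesIO lets the sketch unpack the archive without first writing it to disk.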

feed_pipeline(out_dir: str) → None

Copy the selected data file to the pipeline step folder.

fetch(out_dir: str) → None

Fetch data to feed the pipeline.

Parameters

out_dir (str) – Directory where the dataset will be stored.
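The method descriptions on this page suggest how fetch might chain the other steps. The following hypothetical sketch shows only that control flow. SketchDataset, the already_parsed flag and the "raw" directory are invented for illustration. conmo's actual fetch may order or parameterize the calls differently.

```python
from typing import List


class SketchDataset:
    """Hypothetical stand-in mirroring the documented method names.

    Each method only records that it was called, so the assumed control
    flow of fetch() can be inspected. This is NOT conmo's implementation.
    """

    def __init__(self, already_parsed: bool) -> None:
        self._already_parsed = already_parsed
        self.calls: List[str] = []

    def show_start_message(self) -> None:
        self.calls.append("show_start_message")

    def is_dataset_ready(self) -> bool:
        self.calls.append("is_dataset_ready")
        return self._already_parsed

    def download(self, out_dir: str) -> None:
        self.calls.append("download")

    def parse_to_package(self, raw_dir: str) -> None:
        self.calls.append("parse_to_package")

    def feed_pipeline(self, out_dir: str) -> None:
        self.calls.append("feed_pipeline")

    def fetch(self, out_dir: str) -> None:
        # Assumed ordering: announce the step, download and parse only when
        # the packaged data is missing, then copy the data into the pipeline.
        self.show_start_message()
        if not self.is_dataset_ready():
            raw_dir = "raw"  # hypothetical location of the downloaded files
            self.download(raw_dir)
            self.parse_to_package(raw_dir)
        self.feed_pipeline(out_dir)
```

Under this assumption, a second call to fetch skips download and parse_to_package once the packaged data exists. The is_dataset_ready check appears to exist to enable exactly that.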

is_dataset_ready() → bool

Check whether the dataset has already been downloaded (or loaded) and parsed into the package format.

parse_to_package(raw_dir: str) → None

Parse the raw dataset into the package format. Data and labels must be saved in Parquet format. More information about the Parquet format: https://parquet.apache.org/

Parameters

raw_dir (str) – Directory where the raw dataset was downloaded from its source.

show_start_message() → None

Show the informational message for the start of the step.

sklearn_predefined_split() → Iterable[int]

Generate an array of indexes, with one entry per sequence, to be used with scikit-learn's PredefinedSplit. The SMD dataset has only two sequences: one for training and one for testing.

Returns

List with the index for each sequence of the dataset.

Return type

array
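The array returned by sklearn_predefined_split is meant to be passed to scikit-learn's PredefinedSplit. The sketch below assumes the values [-1, 0] for SMD's two sequences (train first, test second). These values are an assumption and are not taken from conmo. Under PredefinedSplit's convention, -1 keeps an entry out of every test fold, so this yields a single split with sequence 0 for training and sequence 1 for testing. The sketch requires numpy and scikit-learn.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical per-sequence array (index 0 = train sequence, index 1 = test
# sequence). -1 means "never in a test fold"; 0 means "test set of fold 0".
test_fold = np.array([-1, 0])

splitter = PredefinedSplit(test_fold=test_fold)
n_splits = splitter.get_n_splits()
train_idx, test_idx = next(splitter.split())

print(n_splits)             # 1
print(train_idx, test_idx)  # [0] [1]
```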

Methods

__init__(subdataset)

Main constructor of the class.

check_checksum(response)

Checks if the checksum of the downloaded file corresponds to the one provided in the class.

dataset_files()

Iterable of files included in the dataset.

download(out_dir)

Download the dataset from a remote URL.

extract_data(response, out_dir)

Extracts the contents of a compressed file in zip format.

feed_pipeline(out_dir)

Copy the selected data file to the pipeline step folder.

fetch(out_dir)

Fetch data to feed the pipeline.

is_dataset_ready()

Check whether the dataset has already been downloaded (or loaded) and parsed into the package format.

parse_to_package(raw_dir)

Parse the raw dataset into the package format.

show_start_message()

Show the informational message for the start of the step.

sklearn_predefined_split()

Generate an array of indexes, with one entry per sequence, to be used with scikit-learn's PredefinedSplit. The SMD dataset has only two sequences: one for training and one for testing.

Attributes

CHECKSUM

CHECKSUM_FORMAT

FILE_FORMAT

LABEL

SEQUENCE_COLUMN

SUBDATASETS

TIME_COLUMN

URL

VARIABLES