conmo.datasets.SoilMoistureActivePassiveSatellite

class conmo.datasets.SoilMoistureActivePassiveSatellite(channel: str)[source]

__init__(channel: str) → None[source]

Main constructor of the class.

Parameters

channel (str) – Channel identifier (subdataset) of the SMAP dataset.

check_checksum(response: object) → bool

Checks if the checksum of the downloaded file corresponds to the one provided in the class, for security and integrity reasons. Currently only the MD5 algorithm is supported.

Parameters

response (Object) – Response object returned by the get method of the Requests library.

Returns

Boolean variable indicating whether the comparison of the hash with the checksum was successful or not.

Return type

bool

check_checksum_lbl(response: object, checksum: str) → bool[source]

Checks if the checksum of the downloaded file corresponds to the one passed as an argument, for security and integrity reasons. Currently only the MD5 algorithm is supported. Since in the SMAP dataset the labels are obtained from a different file, it’s necessary to use another method to pass the checksum of that file.

Parameters

response (object) – Response object returned by the get method of the Requests library.

checksum (str) – String containing the labels’ checksum.

Returns

Boolean variable indicating whether the comparison of the hash with the checksum was successful or not.

Return type

bool
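The comparison described above can be sketched with the standard library. This is a minimal illustration, not the library's implementation: `md5_matches` is a hypothetical helper, and it assumes the downloaded bytes are available (for a Requests response, typically `response.content`).

```python
import hashlib


def md5_matches(content: bytes, expected_checksum: str) -> bool:
    """Return True if the MD5 hex digest of `content` equals `expected_checksum`."""
    digest = hashlib.md5(content).hexdigest()
    # Compare case-insensitively, since hex digests may be published in either case.
    return digest == expected_checksum.strip().lower()


# With a Requests response this would be called as md5_matches(response.content, checksum).
payload = b"example payload"
print(md5_matches(payload, hashlib.md5(payload).hexdigest()))  # True
print(md5_matches(payload, "0" * 32))  # False
```

MD5 is adequate for detecting corrupted or truncated downloads, but it is not collision resistant, so it should not be relied on as the only defence against deliberate tampering.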

dataset_files() → Iterable[source]

Iterable of files included in the dataset.

download(out_dir: str) → None

Download a Dataset from a remote URL.

download_anomalies_file(raw_dir: str) → Iterable[DataFrame][source]

Downloads and parses the SMAP dataset label files. A separate method is needed because the labels are located at a different URL than the data.

Parameters

raw_dir (str) – Directory where the unparsed data of the SMAP dataset is stored until it’s processed.

Returns

labeled_anomalies – Anomalous intervals in the SMAP dataset.

Return type

pandas DataFrame

extract_data(response: object, out_dir: str) → None

Extracts the contents of a compressed file in zip format.

Parameters

response (Object) – Response object returned by the get method of the Requests library.

out_dir (str) – Directory where the zip file will be unzipped.
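As a rough sketch of this kind of extraction (not the library's actual code), the archive bytes can be opened in memory and unpacked with `zipfile`. The helper name `extract_zip_bytes` is hypothetical, and the example builds a tiny archive to stand in for `response.content`.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(content: bytes, out_dir: str) -> list:
    """Unzip an in-memory archive into `out_dir` and return the member names."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Build a small archive in memory to stand in for response.content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/train.csv", "a,b\n1,2\n")

with tempfile.TemporaryDirectory() as tmp:
    names = extract_zip_bytes(buffer.getvalue(), tmp)
    print(names)  # ['data/train.csv']
    print(os.path.exists(os.path.join(tmp, "data", "train.csv")))  # True
```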

feed_pipeline(out_dir: str) → None[source]

Copy selected data file to pipeline step folder.

fetch(out_dir: str) → None

Fetch data to feed the pipeline.

Parameters

out_dir (str) – Directory where the dataset will be stored.

is_dataset_ready() → bool

Check if the dataset has already been loaded/downloaded and parsed to package format.

parse_to_package(raw_dir: str) → None[source]

Parse raw dataset to package format. Data and labels must be saved in parquet format. More information about parquet format: https://parquet.apache.org/

Parameters

raw_dir (str) – Directory where the dataset was downloaded from its source.

represent_anomalies(labels: Iterable[DataFrame], channel: str, labeled_anomalies: Iterable[DataFrame]) → Iterable[DataFrame][source]

Marks anomalies in the labels dataset according to the anomalous intervals in ‘labeled_anomalies.csv’.

Parameters

labels (pandas DataFrame) – DataFrame with the shape of the labels but filled with zeros.

channel (str) – Channel identifier (subdataset)

labeled_anomalies (pandas DataFrame) – Anomalous intervals in the SMAP dataset.

Returns

labels – Labels dataset correctly filled.

Return type

pandas DataFrame
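The idea of filling a zero-initialised label vector from a list of anomalous intervals can be illustrated as follows. This is a simplified sketch using plain Python lists rather than DataFrames; `mark_intervals` is a hypothetical helper, and treating both interval endpoints as inclusive is an assumption, since the source does not state the convention.

```python
from typing import Iterable, List, Tuple


def mark_intervals(length: int, intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Return a 0/1 label list with 1 inside every (start, end) interval."""
    labels = [0] * length
    for start, end in intervals:
        # Assumption: both endpoints are inclusive; clip to the valid index range.
        for i in range(max(start, 0), min(end, length - 1) + 1):
            labels[i] = 1
    return labels


print(mark_intervals(8, [(1, 2), (5, 6)]))  # [0, 1, 1, 0, 0, 1, 1, 0]
```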

show_start_message() → None

Show starting step info message.

sklearn_predefined_split() → Iterable[int][source]

Generates an array of indexes of the same length as the sequences, to be used with ‘PredefinedSplit’. The SMAP dataset has only 2 sequences: one for train and another for test.

Returns

List with the index for each sequence of the dataset.

Return type

array
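The source does not state which values this method returns. Under the usual scikit-learn convention for `PredefinedSplit`, an entry of -1 keeps an item in the training set for every split, and a non-negative entry assigns it to that test fold. A minimal sketch under that assumption, with the hypothetical helper `predefined_test_fold`:

```python
from typing import List


def predefined_test_fold(n_train: int, n_test: int) -> List[int]:
    """Build a `test_fold` list: -1 keeps an item in training, 0 puts it in test fold 0."""
    return [-1] * n_train + [0] * n_test


# One training sequence followed by one test sequence.
print(predefined_test_fold(1, 1))  # [-1, 0]
```

A list like this could then be passed as `test_fold` to `sklearn.model_selection.PredefinedSplit`, which would yield a single train/test split.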

Methods

`__init__`(channel)	Main constructor of the class.
`check_checksum`(response)	Checks if the checksum of the downloaded file corresponds to the one provided in the class.
`check_checksum_lbl`(response, checksum)	Checks if the checksum of the downloaded file corresponds to the one provided in the class.
`dataset_files`()	Iterable of files included in the dataset.
`download`(out_dir)	Download a Dataset from a remote URL.
`download_anomalies_file`(raw_dir)	Downloads and parses the SMAP dataset label files.
`extract_data`(response, out_dir)	Extracts the contents of a compressed file in zip format.
`feed_pipeline`(out_dir)	Copy selected data file to pipeline step folder.
`fetch`(out_dir)	Fetch data to feed the pipeline.
`is_dataset_ready`()	Check if the dataset has already been loaded/downloaded and parsed to package format.
`parse_to_package`(raw_dir)	Parse raw dataset to package format.
`represent_anomalies`(labels, channel, ...)	Marks anomalies in the labels dataset according to the anomalous intervals in 'labeled_anomalies.csv'.
`show_start_message`()	Show starting step info message.
`sklearn_predefined_split`()	Generates an array of indexes of the same length as the sequences, to be used with 'PredefinedSplit'.

Attributes

`CHANNELS`
`CHECKSUM`
`CHECKSUM_FORMAT`
`FILE_FORMAT`
`LABEL`
`SEQUENCE_COLUMN`
`TIME_COLUMN`
`URL`
`VARIABLES`