conmo.datasets.SoilMoistureActivePassiveSatellite
- class conmo.datasets.SoilMoistureActivePassiveSatellite(channel: str)[source]
- __init__(channel: str) None [source]
Main constructor of the class.
- Parameters
channel (str) – Channel identifier (subdataset) to use.
- check_checksum(response: object) bool
Checks whether the checksum of the downloaded file matches the one provided in the class, for security and integrity reasons. Currently only the MD5 algorithm is supported.
- Parameters
response (Object) – Response object returned by the get method of the Requests library.
- Returns
Boolean variable indicating whether the comparison of the hash with the checksum was successful or not.
- Return type
bool
- check_checksum_lbl(response: object, checksum: str) bool [source]
Checks whether the checksum of the downloaded file matches the one passed as an argument, for security and integrity reasons. Currently only the MD5 algorithm is supported. Because the labels of the SMAP dataset come from a separate file, this additional method is needed to pass the checksum of that file.
- Parameters
response (object) – Response object returned by the get method of the Requests library.
checksum (str) – String containing the labels’ checksum.
- Returns
Boolean variable indicating whether the comparison of the hash with the checksum was successful or not.
- Return type
bool
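The checksum comparison described for check_checksum and check_checksum_lbl can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name md5_matches is hypothetical, and it assumes the downloaded bytes are available, for example through the content attribute of a Requests response.

```python
import hashlib


def md5_matches(content: bytes, expected_checksum: str) -> bool:
    """Return True when the MD5 hex digest of `content` equals `expected_checksum`.

    Hypothetical helper, not part of conmo.
    """
    digest = hashlib.md5(content).hexdigest()
    # hexdigest() is lower case, so normalise the expected value before comparing.
    return digest == expected_checksum.strip().lower()


# With the Requests library, the bytes would typically come from `response.content`.
payload = b"abc"
print(md5_matches(payload, "900150983cd24fb0d6963f7d28e17f72"))
print(md5_matches(b"abd", "900150983cd24fb0d6963f7d28e17f72"))
```

Passing the expected checksum as an argument, as check_checksum_lbl does, lets the same comparison be reused for the data archive and for the labels file.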
- download(out_dir: str) None
Download a Dataset from a remote URL.
- download_anomalies_file(raw_dir: str) Iterable[DataFrame] [source]
Downloads and parses the labels file of the SMAP dataset. A separate method is needed because the labels are hosted at a different URL than the data.
- Parameters
raw_dir (str) – Directory where the unparsed data of the SMAP dataset is stored until it is processed.
- Returns
labeled_anomalies – Anomalous intervals in the SMAP dataset.
- Return type
pandas DataFrame
- extract_data(response: object, out_dir: str) None
Extracts the contents of a compressed file in zip format.
- Parameters
response (Object) – Response object returned by the get method of the Requests library.
out_dir (str) – Directory where the zip file will be unzipped.
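Extracting a zip archive that is held in memory, as extract_data does with a downloaded response, might look like the sketch below. The function name extract_zip_bytes and the archive contents are made up for illustration; the library's own code may differ.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(content: bytes, out_dir: str) -> list:
    """Unzip an in-memory archive into `out_dir` and return the member names.

    Hypothetical helper, not part of conmo.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Wrap the raw bytes in a file-like object so zipfile can read them.
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Build a tiny dummy archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/example.txt", b"dummy contents")

with tempfile.TemporaryDirectory() as tmp_dir:
    names = extract_zip_bytes(buffer.getvalue(), tmp_dir)
    extracted_ok = os.path.isfile(os.path.join(tmp_dir, "data", "example.txt"))

print(names, extracted_ok)
```

With a Requests response, the bytes passed to the helper would be response.content.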
- fetch(out_dir: str) None
Fetch data to feed the pipeline.
- Parameters
out_dir (str) – Directory where the dataset will be stored.
- is_dataset_ready() bool
Check whether the dataset has already been loaded/downloaded and parsed to package format.
- parse_to_package(raw_dir: str) None [source]
Parse raw dataset to package format. Data and labels must be saved in parquet format. More information about parquet format: https://parquet.apache.org/
- Parameters
raw_dir (str) – Directory where the dataset was downloaded from its source.
- represent_anomalies(labels: Iterable[DataFrame], channel: str, labeled_anomalies: Iterable[DataFrame]) Iterable[DataFrame] [source]
Marks the anomalies in the labels dataset according to the anomalous intervals listed in ‘labeled_anomalies.csv’.
- Parameters
labels (pandas DataFrame) – DataFrame with the shape of the labels but filled with zeros.
channel (str) – Channel identifier (subdataset).
labeled_anomalies (pandas DataFrame) – Anomalous intervals in the SMAP dataset.
- Returns
labels – Labels dataset correctly filled.
- Return type
pandas DataFrame
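The idea behind represent_anomalies, starting from labels filled with zeros and setting the positions inside each anomalous interval to one, can be sketched with plain Python lists. The helper name mark_anomalies is hypothetical. The sketch also assumes that the intervals for a channel are stored as text of the form "[[start, end], ...]" with inclusive bounds, which is not confirmed by this page. The library itself works on pandas DataFrames.

```python
import ast


def mark_anomalies(length: int, intervals_text: str) -> list:
    """Return `length` labels, 1 inside the given intervals and 0 elsewhere.

    Hypothetical helper; assumes inclusive [start, end] pairs stored as text.
    """
    labels = [0] * length
    # literal_eval safely parses the textual list of [start, end] pairs.
    for start, end in ast.literal_eval(intervals_text):
        # Clip each interval to the valid index range before marking it.
        for index in range(max(start, 0), min(end, length - 1) + 1):
            labels[index] = 1
    return labels


labels = mark_anomalies(10, "[[2, 4], [7, 8]]")
print(labels)  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
```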
- show_start_message() None
Show starting step info message.
Methods
__init__(channel) – Main constructor of the class.
check_checksum(response) – Checks if the checksum of the downloaded file corresponds to the one provided in the class.
check_checksum_lbl(response, checksum) – Checks if the checksum of the downloaded file corresponds to the one provided in the class.
Iterable of files included in the dataset.
download(out_dir) – Download a Dataset from a remote URL.
download_anomalies_file(raw_dir) – Method in charge of downloading and parsing the SMAP dataset labels files.
extract_data(response, out_dir) – Extracts the contents of a compressed file in zip format.
feed_pipeline(out_dir) – Copy selected data file to pipeline step folder.
fetch(out_dir) – Fetch data to feed the pipeline.
is_dataset_ready() – Check if dataset has been already loaded/downloaded and parsed to package format.
parse_to_package(raw_dir) – Parse raw dataset to package format.
represent_anomalies(labels, channel, ...) – Represent anomalies in the label's dataset following the anomalous intervals of 'labeled_anomalies.csv'
show_start_message() – Show starting step info message.
Generates an array of indexes of the same length as the sequences, to be used with 'PredefinedSplit'. The SMAP dataset has only 2 sequences: one for train and another for test.
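The index array mentioned in the last entry above follows the convention of scikit-learn's PredefinedSplit, where -1 marks an entry that always stays in the training set and a non-negative value is the index of the test fold it belongs to. The sketch below builds such an array with one entry per sequence. The helper name and the "train"/"test" role strings are assumptions made for illustration, not the library's code.

```python
def predefined_split_indexes(roles: list) -> list:
    """Map each sequence role to a PredefinedSplit index.

    Hypothetical helper: "train" becomes -1 (never used for testing),
    "test" becomes 0 (member of the single test fold).
    """
    mapping = {"train": -1, "test": 0}
    return [mapping[role] for role in roles]


# SMAP has two sequences: one for training and one for testing.
test_fold = predefined_split_indexes(["train", "test"])
print(test_fold)  # [-1, 0]

# The result could then be passed to scikit-learn, for example:
#   from sklearn.model_selection import PredefinedSplit
#   splitter = PredefinedSplit(test_fold=test_fold)
```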
Attributes
CHANNELS
CHECKSUM
CHECKSUM_FORMAT
FILE_FORMAT
LABEL
SEQUENCE_COLUMN
TIME_COLUMN
URL
VARIABLES