conmo.preprocesses.SavitzkyGolayFilter

class conmo.preprocesses.SavitzkyGolayFilter(to_data: Union[bool, Iterable[str]], to_labels: Union[bool, Iterable[str]], test_set: bool, window_length: int, polyorder: int, deriv: Optional[int] = 0, delta: Optional[float] = 1.0, mode: Optional[str] = 'interp', cval: Optional[float] = 0.0)
__init__(to_data: Union[bool, Iterable[str]], to_labels: Union[bool, Iterable[str]], test_set: bool, window_length: int, polyorder: int, deriv: Optional[int] = 0, delta: Optional[float] = 1.0, mode: Optional[str] = 'interp', cval: Optional[float] = 0.0)
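
The filter parameters in this signature (window_length, polyorder, deriv, delta, mode, cval) have the same names and defaults as those of scipy.signal.savgol_filter. Whether this class delegates to that function is an assumption here, not something this page states. Under that assumption, the following sketch shows what the parameters do, calling SciPy directly on a synthetic signal:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic signal: a sine wave with additive noise (illustrative data only).
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(t.size)

# window_length is the number of samples in each local fit;
# polyorder is the degree of the fitted polynomial and must be smaller.
smoothed = savgol_filter(noisy, window_length=11, polyorder=3)

# deriv=1 returns the first derivative of the local fit;
# delta is the sample spacing used to scale that derivative.
line = 2.0 * t + 1.0
slope = savgol_filter(line, window_length=11, polyorder=3, deriv=1, delta=dt)

# mode controls how the edges are handled; with mode="constant",
# the signal is padded with the value given by cval.
padded = savgol_filter(noisy, window_length=11, polyorder=3, mode="constant", cval=0.0)
```

Away from the edges, the mode setting does not change the result, so smoothed and padded differ only in the first and last few samples.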
apply(in_dir: str, out_dir: str) -> None

Applies the preprocess to the given dataset.

Parameters
  • in_dir (str) – Input directory where the files are located. Usually, this is the output directory of the splitter step.

  • out_dir (str) – Output directory where the files will be saved.

extract_columns(df: DataFrame, columns: Union[bool, Iterable[str]]) -> Iterable[str]

Returns a list containing the names of all the columns of the data.

Parameters
  • df (Pandas Dataframe) – Dataframe containing the data.

  • columns (Union[bool, Iterable[str]]) – Either a bool stating whether the dataframe has columns, or the list of column names.

Returns

columns – List containing the names of the dataframe’s columns.

Return type

Iterable[str]
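
One way to read the Union[bool, Iterable[str]] argument is sketched below. The helper name resolve_columns is hypothetical, and the semantics (True selects every column, False selects none, an iterable selects the named columns) are an assumption made for illustration, not conmo's confirmed behaviour:

```python
from typing import Iterable, List, Union

import pandas as pd


def resolve_columns(df: pd.DataFrame, columns: Union[bool, Iterable[str]]) -> List[str]:
    """Hypothetical helper: turn a bool-or-names argument into a list of names."""
    if isinstance(columns, bool):
        # Assumed meaning: True -> all columns, False -> no columns.
        return list(df.columns) if columns else []
    # Otherwise treat the argument as an explicit list of column names.
    return list(columns)


df = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [3.0]})
all_cols = resolve_columns(df, True)
no_cols = resolve_columns(df, False)
some_cols = resolve_columns(df, ["a", "c"])
```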

load_input(in_dir: str) -> Tuple[DataFrame, DataFrame]

Read parquet data and labels files of the chosen dataset before it’s split.

Parameters

in_dir (str) – Input directory where the files are located.

Returns

  • data (Pandas Dataframe) – Loaded data file.

  • labels (Pandas Dataframe) – Loaded labels file.

save_output(out_dir: str, data: DataFrame, labels: DataFrame) -> None

Save preprocessed dataset to parquet format.

Parameters
  • out_dir (str) – Output directory where the results will be saved.

  • data (Pandas Dataframe) – Preprocessed data.

  • labels (Pandas Dataframe) – Preprocessed labels.

show_start_message() -> None

Prints the name of the selected preprocess to the terminal.

transform(df: DataFrame, columns: Iterable[str]) -> DataFrame

Performs the preprocess over the dataframe with the given columns.

Parameters
  • df (Pandas Dataframe) – Dataframe containing the data or the labels of the dataset.

  • columns (Iterable[str]) – List of columns to use in the preprocess; these are also the columns of the final dataframe.

Returns

Preprocessed dataframe.

Return type

Pandas Dataframe
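
As a rough illustration of a column-wise transform like the one described above, the sketch below filters the selected columns of a dataframe and returns only those columns. The function name smooth_columns is hypothetical, and the use of scipy.signal.savgol_filter along the row axis is an assumption; the actual implementation may differ:

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def smooth_columns(df: pd.DataFrame, columns, window_length: int, polyorder: int) -> pd.DataFrame:
    """Hypothetical sketch: Savitzky-Golay filter each selected column."""
    columns = list(columns)
    values = df[columns].to_numpy(dtype=float)
    # axis=0 filters down the rows, so each column is treated as one signal.
    filtered = savgol_filter(values, window_length, polyorder, axis=0)
    # The result keeps the original index and only the selected columns.
    return pd.DataFrame(filtered, index=df.index, columns=columns)


t = np.arange(20.0)
df = pd.DataFrame({"x": 2.0 * t + 1.0, "y": t ** 2, "z": np.zeros_like(t)})
out = smooth_columns(df, ["x", "y"], window_length=5, polyorder=2)
```

Because the example columns are polynomials of degree at most polyorder, the filter leaves them unchanged up to rounding, which makes the sketch easy to check.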

Methods

__init__(to_data, to_labels, test_set, ...)

apply(in_dir, out_dir)

Applies the preprocess to the given dataset.

extract_columns(df, columns)

Returns a list containing the names of all the columns of the data.

load_input(in_dir)

Read parquet data and labels files of the chosen dataset before it's split.

save_output(out_dir, data, labels)

Save preprocessed dataset to parquet format.

show_start_message()

Prints the name of the selected preprocess to the terminal.

transform(df, columns)

Performs the preprocess over the dataframe with the given columns.