conmo.preprocesses.CustomPreprocess

class conmo.preprocesses.CustomPreprocess(fn: Callable[[DataFrame, DataFrame], Tuple[DataFrame, DataFrame]])

Core class for implementing a user-defined preprocess. The preprocess is written as a function that takes the data and labels DataFrames and returns the processed data and labels as a tuple; this function is passed as the fn argument to the constructor of this class.
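
As an illustration of the fn type in the class signature, a custom preprocess is just a function that receives (data, labels) and returns the possibly modified pair. The hypothetical minmax_scale below is a sketch, not part of conmo: it scales each numeric column of the data to [0, 1] and passes the labels through unchanged.

```python
from typing import Tuple

import pandas as pd


def minmax_scale(data: pd.DataFrame, labels: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical custom preprocess: min-max scale numeric columns of `data`."""
    scaled = data.copy()
    numeric = scaled.select_dtypes(include="number").columns
    col_min = scaled[numeric].min()
    col_range = scaled[numeric].max() - col_min
    # Constant columns have a range of 0; divide by 1 instead so they become 0.0.
    col_range = col_range.replace(0, 1)
    scaled[numeric] = (scaled[numeric] - col_min) / col_range
    # Labels are returned untouched; non-numeric data columns are left as-is.
    return scaled, labels
```

Under these assumptions it would be passed to the constructor as CustomPreprocess(minmax_scale). Note that the sketch computes its statistics over every row it receives, so whether training and test rows are mixed depends on how the splitter stored them.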

__init__(fn: Callable[[DataFrame, DataFrame], Tuple[DataFrame, DataFrame]]) -> None

apply(in_dir: str, out_dir: str) -> None

Applies the custom preprocess to labels and data.

Parameters
  • in_dir (str) – Input directory where the files are located. Usually, this is the output directory of the splitter step.

  • out_dir (str) – Output directory where the files will be saved.
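
The method names suggest that apply chains three steps: load the inputs, run fn, save the outputs. The following sketch shows that flow with the loading and saving steps injected as plain callables, so it runs without touching any files. apply_sketch and its parameters are hypothetical; this is not conmo's actual implementation.

```python
from typing import Callable, Tuple

import pandas as pd

Frames = Tuple[pd.DataFrame, pd.DataFrame]


def apply_sketch(
    load: Callable[[str], Frames],
    fn: Callable[[pd.DataFrame, pd.DataFrame], Frames],
    save: Callable[[str, pd.DataFrame, pd.DataFrame], None],
    in_dir: str,
    out_dir: str,
) -> None:
    """Hypothetical load -> transform -> save flow (not conmo's actual code)."""
    # 1. Read data and labels from the input directory (e.g. the splitter output).
    data, labels = load(in_dir)
    # 2. Run the user-defined preprocess on both frames.
    data, labels = fn(data, labels)
    # 3. Persist the processed frames to the output directory.
    save(out_dir, data, labels)
```

In CustomPreprocess, the load and save roles would presumably be played by load_input and save_output, and fn by the function given to the constructor.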

load_input(in_dir: str) -> (DataFrame, DataFrame)

Reads the data and labels Parquet files of the chosen dataset from the input directory.

Parameters
  • in_dir (str) – Input directory where the files are located.

Returns
  • data (pandas DataFrame) – Loaded data file.

  • labels (pandas DataFrame) – Loaded labels file.
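
A minimal sketch of reading the two files with pandas.read_parquet. The file names data.parquet and labels.parquet are assumptions, since this page does not state the names conmo actually uses, and load_input_sketch is a hypothetical helper. Reading Parquet requires a Parquet engine such as pyarrow or fastparquet.

```python
import os
from typing import Tuple

import pandas as pd

# Hypothetical file names; the real ones are not documented on this page.
DATA_FILE = "data.parquet"
LABELS_FILE = "labels.parquet"


def load_input_sketch(in_dir: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Read the data and labels Parquet files stored in `in_dir`."""
    data = pd.read_parquet(os.path.join(in_dir, DATA_FILE))
    labels = pd.read_parquet(os.path.join(in_dir, LABELS_FILE))
    return data, labels
```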

save_output(out_dir: str, data: DataFrame, labels: DataFrame) -> None

Saves the preprocessed data and labels to Parquet files in the output directory.

Parameters
  • out_dir (str) – Output directory where the results will be saved.

  • data (pandas DataFrame) – Preprocessed data.

  • labels (pandas DataFrame) – Preprocessed labels.
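
The saving step can be sketched with DataFrame.to_parquet. As with loading, the file names and the save_output_sketch helper are hypothetical, and creating the output directory when it is missing is an assumption of this sketch rather than documented behavior. A Parquet engine (pyarrow or fastparquet) must be installed.

```python
import os

import pandas as pd

# Hypothetical file names; the real ones are not documented on this page.
DATA_FILE = "data.parquet"
LABELS_FILE = "labels.parquet"


def save_output_sketch(out_dir: str, data: pd.DataFrame, labels: pd.DataFrame) -> None:
    """Write the preprocessed data and labels as Parquet files in `out_dir`."""
    # Assumption: create the output directory if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    data.to_parquet(os.path.join(out_dir, DATA_FILE))
    labels.to_parquet(os.path.join(out_dir, LABELS_FILE))
```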

show_start_message() -> None

Prints the name of the selected preprocess to the terminal.

Methods

  __init__(fn)
  apply(in_dir, out_dir)               Applies the custom preprocess to labels and data.
  load_input(in_dir)                   Reads the data and labels Parquet files of the chosen dataset.
  save_output(out_dir, data, labels)   Saves the preprocessed data and labels to Parquet files.
  show_start_message()                 Prints the name of the selected preprocess to the terminal.