conmo.preprocesses.CustomPreprocess
- class conmo.preprocesses.CustomPreprocess(fn: Callable[[DataFrame, DataFrame], Tuple[DataFrame, DataFrame]])
Core class for implementing a user-defined preprocess. The preprocess is written as a function, and that function is passed to the constructor of this class as the fn argument.
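As a minimal sketch of what such a function might look like, the example below defines a hypothetical min-max scaling preprocess that matches the Callable[[DataFrame, DataFrame], Tuple[DataFrame, DataFrame]] signature. The function name, the column names, and the toy data are illustrative assumptions. The (data, labels) argument order is also an assumption, chosen to match the order used by load_input and save_output.

```python
from typing import Tuple

import pandas as pd


def scale_to_unit_range(
    data: pd.DataFrame, labels: pd.DataFrame
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical preprocess: min-max scale every data column to [0, 1]."""
    span = data.max() - data.min()
    # Constant columns have a span of 0; use 1 instead to avoid dividing by zero.
    span = span.replace(0, 1)
    scaled = (data - data.min()) / span
    # Labels are returned unchanged (assumed argument order: data, then labels).
    return scaled, labels


# Toy frames standing in for the data and labels that would be loaded from disk.
data = pd.DataFrame({"sensor_a": [0.0, 5.0, 10.0], "sensor_b": [2.0, 2.0, 2.0]})
labels = pd.DataFrame({"anomaly": [0, 0, 1]})

new_data, new_labels = scale_to_unit_range(data, labels)

# With Conmo installed, the function would then be wrapped as:
# from conmo.preprocesses import CustomPreprocess
# preprocess = CustomPreprocess(scale_to_unit_range)
```

Keeping the transformation in a plain function means it can be checked on small in-memory frames like these before it is wired into a pipeline.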
- apply(in_dir: str, out_dir: str) -> None
Applies the custom preprocess to labels and data.
- Parameters
in_dir (str) – Input directory where the files are located. Usually, this is the output directory of the splitter step.
out_dir (str) – Output directory where the files will be saved.
- load_input(in_dir: str) -> Tuple[DataFrame, DataFrame]
Reads the parquet data and labels files of the chosen dataset from the input directory.
- Parameters
in_dir (str) – Input directory where the files are located.
- Returns
data (pandas DataFrame) – Loaded data file.
labels (pandas DataFrame) – Loaded labels file.
- save_output(out_dir: str, data: DataFrame, labels: DataFrame) -> None
Saves the preprocessed data and labels in parquet format.
- Parameters
out_dir (str) – Output directory where the results will be saved.
data (pandas DataFrame) – Preprocessed data.
labels (pandas DataFrame) – Preprocessed labels.
- show_start_message() -> None
Prints the name of the selected preprocess to the terminal.
Methods
__init__(fn)
apply(in_dir, out_dir) – Applies the custom preprocess to labels and data.
load_input(in_dir) – Reads the parquet data and labels files of the chosen dataset from the input directory.
save_output(out_dir, data, labels) – Saves the preprocessed data and labels in parquet format.
show_start_message() – Prints the name of the selected preprocess to the terminal.