conmo.splitters.splitter.Splitter

class conmo.splitters.splitter.Splitter
__init__()
already_splitted(df: DataFrame) -> bool

Checks whether the dataset has already been split.

Parameters

df (pandas DataFrame) – Input dataset.

Returns

True if the dataset has already been split, False otherwise.

Return type

bool

Raises

RuntimeError – If the dataset has not been split and does not follow Conmo’s format.

load_input(in_dir: str) -> (DataFrame, DataFrame)

Reads the Parquet data and labels files of the chosen dataset.

Parameters

in_dir (str) – Input directory where the files are located.

Returns

  • data (pandas DataFrame) – Loaded data file.

  • labels (pandas DataFrame) – Loaded labels file.

Raises

If data and labels have different sequence values.
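The consistency check described under Raises can be pictured with a small sketch. This is not Conmo's implementation: the helper name check_same_sequences is hypothetical, the index level name "sequence" is an assumption about how the frames are laid out, and ValueError is an assumption because the exception type is not stated above. In-memory frames stand in for the Parquet files.

```python
import pandas as pd


def check_same_sequences(data: pd.DataFrame, labels: pd.DataFrame) -> None:
    """Raise if data and labels do not cover the same sequence values.

    Assumption: both frames carry an index level named "sequence".
    """
    data_seqs = set(data.index.get_level_values("sequence"))
    label_seqs = set(labels.index.get_level_values("sequence"))
    if data_seqs != label_seqs:
        # ValueError is an assumption; the documentation does not name the type.
        missing = sorted(data_seqs ^ label_seqs)
        raise ValueError(f"Sequence mismatch between data and labels: {missing}")


# Tiny in-memory frames standing in for the loaded files.
data = pd.DataFrame(
    {"value": [0.1, 0.2, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [(1, 0), (1, 1), (2, 0)], names=["sequence", "time"]
    ),
)
labels = pd.DataFrame({"anomaly": [0, 1]}, index=pd.Index([1, 2], name="sequence"))

# Both frames cover sequences 1 and 2, so no exception is raised.
check_same_sequences(data, labels)
```

Comparing sets rather than lists keeps the check independent of row order and of how many rows each sequence has in the data file.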

save_output(out_dir: str, data: DataFrame, labels: DataFrame) -> None

Saves the split dataset in Parquet format.

Parameters
  • out_dir (str) – Output directory where the results will be saved.

  • data (pandas DataFrame) – Split data.

  • labels (pandas DataFrame) – Split labels.

show_start_message()

Prints the name of the selected splitter to the terminal.

abstract split(in_dir: str, out_dir: str) -> None

Splits both the data and the labels of the dataset.

Parameters
  • in_dir (str) – Input directory produced by the previous step.

  • out_dir (str) – Output directory where the split data will be stored.
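Because split is abstract, a concrete splitter has to implement it. The following is a minimal sketch of that pattern, not code from Conmo. To stay runnable on its own it defines a stand-in base class, SplitterSketch, that mirrors the method names documented here; in real use one would subclass conmo.splitters.splitter.Splitter instead. The in-memory STORE dictionary (replacing directories of Parquet files), the HoldoutSketch class, the "sequence" column, and the "set" column are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple

import pandas as pd

# In-memory stand-in for directories of Parquet files (assumption).
STORE: Dict[str, Tuple[pd.DataFrame, pd.DataFrame]] = {}


class SplitterSketch(ABC):
    """Stand-in that mirrors the documented method names; not Conmo's class."""

    def load_input(self, in_dir: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
        return STORE[in_dir]

    def save_output(self, out_dir: str, data: pd.DataFrame, labels: pd.DataFrame) -> None:
        STORE[out_dir] = (data, labels)

    def show_start_message(self) -> None:
        print(f"Running splitter: {type(self).__name__}")

    @abstractmethod
    def split(self, in_dir: str, out_dir: str) -> None:
        """Subclasses decide how data and labels are divided."""


class HoldoutSketch(SplitterSketch):
    """Hypothetical splitter: the first sequences go to train, the rest to test."""

    def __init__(self, train_fraction: float = 0.5) -> None:
        self.train_fraction = train_fraction

    def split(self, in_dir: str, out_dir: str) -> None:
        self.show_start_message()
        data, labels = self.load_input(in_dir)
        sequences = sorted(labels["sequence"].unique())
        n_train = int(len(sequences) * self.train_fraction)
        train = set(sequences[:n_train])
        # Tag every row with its subset; the "set" column name is an assumption.
        data = data.copy()
        labels = labels.copy()
        data["set"] = ["train" if s in train else "test" for s in data["sequence"]]
        labels["set"] = ["train" if s in train else "test" for s in labels["sequence"]]
        self.save_output(out_dir, data, labels)


# Usage: place a toy dataset under "in", split it, and read the result from "out".
STORE["in"] = (
    pd.DataFrame({"sequence": [1, 1, 2, 2], "value": [0.1, 0.2, 0.3, 0.4]}),
    pd.DataFrame({"sequence": [1, 2], "anomaly": [0, 1]}),
)
HoldoutSketch(train_fraction=0.5).split("in", "out")
split_data, split_labels = STORE["out"]
```

Applying the same train/test assignment to data and labels in one place keeps the two outputs aligned, which is the point of having split handle both.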

Methods

__init__()

already_splitted(df)

Checks whether the dataset has already been split.

load_input(in_dir)

Reads the Parquet data and labels files of the chosen dataset.

save_output(out_dir, data, labels)

Saves the split dataset in Parquet format.

show_start_message()

Prints the name of the selected splitter to the terminal.

split(in_dir, out_dir)

Splits both the data and the labels of the dataset.