conmo.splitters.SklearnSplitter
- class conmo.splitters.SklearnSplitter(splitter: Union[GroupKFold, GroupShuffleSplit, KFold, LeaveOneGroupOut, LeavePGroupsOut, LeaveOneOut, LeavePOut, PredefinedSplit, RepeatedKFold, RepeatedStratifiedKFold, ShuffleSplit, StratifiedKFold, StratifiedShuffleSplit, TimeSeriesSplit], groups: Optional[Iterable[int]] = None)
- __init__(splitter: Union[GroupKFold, GroupShuffleSplit, KFold, LeaveOneGroupOut, LeavePGroupsOut, LeaveOneOut, LeavePOut, PredefinedSplit, RepeatedKFold, RepeatedStratifiedKFold, ShuffleSplit, StratifiedKFold, StratifiedShuffleSplit, TimeSeriesSplit], groups: Optional[Iterable[int]] = None) -> None
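The wrapped object is one of scikit-learn's cross-validation splitters, whose split(X, y=None, groups=None) method yields pairs of train and test index arrays. The groups argument only matters for group-aware splitters such as GroupKFold, LeaveOneGroupOut or LeavePGroupsOut; presumably SklearnSplitter forwards it there, although this page does not say so. The sketch below is a hypothetical pure-Python stand-in (group_kfold is not part of Conmo or scikit-learn) that shows the guarantee those splitters give: no group ever appears in both train and test. Unlike scikit-learn's GroupKFold, which balances fold sizes, it simply deals groups out round-robin.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Hypothetical helper: yield (train_idx, test_idx) so that no group is split.

    Groups are assigned to test folds round-robin in first-seen order; this is
    NOT scikit-learn's size-balancing GroupKFold algorithm.
    """
    # Collect the sample positions that belong to each group.
    members = defaultdict(list)
    for pos, g in enumerate(groups):
        members[g].append(pos)
    unique_groups = list(members)
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of groups")
    for k in range(n_splits):
        # Every n_splits-th group, starting at offset k, forms the test fold.
        test_groups = set(unique_groups[k::n_splits])
        test_idx = [p for g in unique_groups if g in test_groups for p in members[g]]
        train_idx = [p for g in unique_groups if g not in test_groups for p in members[g]]
        yield sorted(train_idx), sorted(test_idx)


groups = [1, 1, 1, 2, 2, 3, 3, 3, 4]
for train_idx, test_idx in group_kfold(groups, n_splits=2):
    print("train:", train_idx, "test:", test_idx)
```

Each group's samples stay together: group 1 (positions 0 to 2) is entirely in the test fold of the first split and entirely in the training fold of the second.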
- already_splitted(df: DataFrame) -> bool
Checks whether the dataset has already been split.
- Parameters
df (pandas DataFrame) – Input dataset.
- Returns
True if the dataset has already been split, False otherwise.
- Return type
bool
- Raises
RuntimeError – If the dataset is not split and does not follow Conmo's format either.
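A minimal sketch of the kind of check already_splitted performs, assuming (for illustration only) that a raw Conmo dataset is indexed by the levels sequence and time, and that a split dataset carries extra leading fold and set levels. These level names are assumptions, not taken from this page; with pandas one would pass df.index.names to the hypothetical helper already_split.

```python
# Hypothetical index layouts, assumed for illustration only.
RAW_LEVELS = ("sequence", "time")
SPLIT_LEVELS = ("fold", "set", "sequence", "time")


def already_split(index_names):
    """Return True for the (assumed) split layout, False for the raw layout.

    Raises RuntimeError when the index matches neither layout, mirroring the
    documented behaviour for datasets outside Conmo's format.
    """
    names = tuple(index_names)
    if names == SPLIT_LEVELS:
        return True
    if names == RAW_LEVELS:
        return False
    raise RuntimeError(f"Unexpected index levels {names}: not in the expected format")


print(already_split(["sequence", "time"]))
print(already_split(["fold", "set", "sequence", "time"]))
```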
- extract_fold(df: DataFrame, sequences: ndarray, fold: int, train_idx: ndarray, test_idx: ndarray) -> (ndarray, ndarray)
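This page gives no description for extract_fold. One plausible reading of its signature: sequences holds the unique sequence ids the splitter ran over, train_idx and test_idx are positions into that array, and the rows of df that belong to those sequences are separated for the given fold. The hypothetical helper extract_fold_rows below sketches that reading on plain (sequence, time, value) tuples instead of a DataFrame; it is not Conmo's implementation.

```python
def extract_fold_rows(rows, sequences, fold, train_idx, test_idx):
    """Hypothetical helper: partition (sequence, time, value) rows for one fold.

    train_idx and test_idx are positions into `sequences`, as produced by a
    scikit-learn splitter run over the unique sequence ids.
    """
    train_seqs = {sequences[i] for i in train_idx}
    test_seqs = {sequences[i] for i in test_idx}
    # Prefix each row with the fold number and the set it belongs to.
    train = [(fold, "train", *row) for row in rows if row[0] in train_seqs]
    test = [(fold, "test", *row) for row in rows if row[0] in test_seqs]
    return train, test


rows = [(10, 0, 0.1), (10, 1, 0.2), (20, 0, 0.3), (30, 0, 0.4)]
train, test = extract_fold_rows(rows, sequences=[10, 20, 30], fold=0,
                                train_idx=[0, 2], test_idx=[1])
print(train)
print(test)
```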
- load_input(in_dir: str) -> (DataFrame, DataFrame)
Reads the parquet data and labels files of the chosen dataset.
- Parameters
in_dir (str) – Input directory where the files are located.
- Returns
data (pandas DataFrame) – Loaded data file.
labels (pandas DataFrame) – Loaded labels file.
- Raises
Raised if the data and labels contain different sequence values.
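With pandas, the files would typically be read with pandas.read_parquet, and the sequence ids taken from the index with df.index.get_level_values("sequence") (the level name is an assumption). The consistency check itself can be sketched in pure Python as below; the helper name check_same_sequences and the choice of ValueError are hypothetical, since this page does not name the exception type.

```python
def check_same_sequences(data_sequences, label_sequences):
    """Hypothetical check: raise if data and labels cover different sequence ids."""
    data_set, label_set = set(data_sequences), set(label_sequences)
    if data_set != label_set:
        # The symmetric difference lists ids present on only one side.
        mismatched = sorted(data_set ^ label_set)
        raise ValueError(f"Data and labels disagree on sequences: {mismatched}")


# Repeated ids are fine: only the set of sequences is compared.
check_same_sequences([1, 1, 2, 2, 3], [1, 2, 3])
```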
- save_output(out_dir: str, data: DataFrame, labels: DataFrame) -> None
Saves the split dataset in parquet format.
- Parameters
out_dir (str) – Output directory where the results will be saved.
data (pandas DataFrame) – Split data.
labels (pandas DataFrame) – Split labels.
- show_start_message()
Prints the name of the selected splitter to the terminal.
Methods
- __init__(splitter[, groups])
- already_splitted(df) – Checks whether the dataset has already been split.
- extract_fold(df, sequences, fold, train_idx, ...)
- load_input(in_dir) – Reads the parquet data and labels files of the chosen dataset.
- save_output(out_dir, data, labels) – Saves the split dataset in parquet format.
- show_start_message() – Prints the name of the selected splitter to the terminal.
- split(in_dir, out_dir) – Performs the split on both the data and the labels of the dataset.
- to_dataframe(df, data, index)
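Putting the pieces together, split() presumably reads the input, runs the wrapped splitter and writes one train/test partition per fold. The sketch below illustrates only the middle step, under two assumptions: the split happens at the sequence level, so a whole sequence goes to either train or test, and the wrapped splitter is an unshuffled KFold. kfold_indices reproduces the contiguous folds of KFold's default behaviour (the first n_samples % n_splits folds get one extra element); both function names are hypothetical, and parquet I/O is omitted.

```python
def kfold_indices(n_samples, n_splits):
    """Contiguous, unshuffled k-fold indices, sized like scikit-learn's KFold."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def split_sequences(sequence_ids, n_splits):
    """Hypothetical flow: assign whole sequences to train/test for each fold."""
    unique_ids = sorted(set(sequence_ids))
    folds = {}
    for fold, (train_idx, test_idx) in enumerate(kfold_indices(len(unique_ids), n_splits)):
        folds[fold] = {
            "train": [unique_ids[i] for i in train_idx],
            "test": [unique_ids[i] for i in test_idx],
        }
    return folds


print(split_sequences([3, 1, 1, 2, 5, 4, 4], n_splits=2))
```

With these ids and two splits, the unique ids 1 to 5 give test folds [1, 2, 3] and [4, 5], and each sequence appears in exactly one test fold.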