conmo.algorithms.PretrainedRandomForest

class conmo.algorithms.PretrainedRandomForest(pretrained: bool, max_depth: Optional[int] = None, random_state: Optional[int] = None, n_estimators: Optional[int] = None, path: Optional[str] = None)
__init__(pretrained: bool, max_depth: Optional[int] = None, random_state: Optional[int] = None, n_estimators: Optional[int] = None, path: Optional[str] = None) None
execute(idx: int, in_dir: str, out_dir: str) str

Performs a complete execution of the algorithm, loading input data, performing a run through the folds and saving the results.

Parameters
  • idx (int) – Index of the algorithm in the Experiment. Useful when you want to experiment with several algorithms.

  • in_dir (str) – Intermediate directory where the input data to the algorithm is stored.

  • out_dir (str) – Intermediate directory where the output data (predictions of the algorithm) will be stored.

Returns

Name of the output directory.

Return type

str

fit_predict(data_train: DataFrame, data_test: DataFrame, labels_train: DataFrame, labels_test: DataFrame) DataFrame

Trains the model with train data and then performs predictions with the trained algorithm over the test data.

Parameters
  • data_train (Pandas Dataframe) – Train data.

  • data_test (Pandas Dataframe) – Test data.

  • labels_train (Pandas Dataframe) – Train labels.

  • labels_test (Pandas Dataframe) – Test labels.

Returns

Results of the predictions made on the test set.

Return type

Pandas Dataframe
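
The train-then-predict flow described above can be sketched outside conmo as follows. This is a minimal illustration, not the library's implementation: the function name fit_predict_sketch, the use of scikit-learn's RandomForestClassifier, the single label column, and the choice to use labels_test only for the index and column names of the result are all assumptions.

```python
from typing import Optional

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def fit_predict_sketch(
    data_train: pd.DataFrame,
    data_test: pd.DataFrame,
    labels_train: pd.DataFrame,
    labels_test: pd.DataFrame,
    n_estimators: int = 10,
    max_depth: Optional[int] = None,
    random_state: Optional[int] = 0,
) -> pd.DataFrame:
    """Hypothetical stand-in for fit_predict: fit on train, predict on test."""
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=random_state
    )
    # Assumption: labels are a one-column DataFrame; flatten it for scikit-learn.
    model.fit(data_train, labels_train.iloc[:, 0])
    predictions = model.predict(data_test)
    # Assumption: the result reuses the index and column names of the test labels.
    return pd.DataFrame(
        predictions, index=labels_test.index, columns=labels_test.columns
    )


# Synthetic data, invented purely for this example.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(40, 3)), columns=["f0", "f1", "f2"])
labels = pd.DataFrame({"anomaly": (features["f0"] > 0).astype(int)})

predicted = fit_predict_sketch(
    features.iloc[:30], features.iloc[30:], labels.iloc[:30], labels.iloc[30:]
)
```

The returned DataFrame has one row per test sample, which makes it straightforward to compare against labels_test afterwards.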

labels_per_sequence(labels: DataFrame) bool

Use only with time series datasets. Checks whether the index of the chosen dataset's labels file contains sequences only or both sequences and time. In a future update, this method will be moved to a class specific to time series.

Parameters

labels (Pandas Dataframe) – Labels file of the dataset.

Returns

True if the labels file has one index level (sequence), or False if it has two index levels (sequence and time).

Return type

bool

Raises

RuntimeError – If the number of index levels is invalid.
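
The check described above can be approximated with pandas alone. This is a sketch under assumptions: the function name labels_per_sequence_sketch, the error message, and the index names "sequence" and "time" are hypothetical, and conmo's own implementation may inspect the index differently.

```python
import pandas as pd


def labels_per_sequence_sketch(labels: pd.DataFrame) -> bool:
    """Hypothetical stand-in: True for a 1-level index, False for a 2-level one."""
    levels = labels.index.nlevels
    if levels == 1:
        return True
    if levels == 2:
        return False
    # Any other number of levels is treated as invalid, as documented above.
    raise RuntimeError(f"Invalid number of index levels: {levels}")


# One label per sequence: a single-level index.
per_sequence = pd.DataFrame(
    {"anomaly": [0, 1]}, index=pd.Index([1, 2], name="sequence")
)

# One label per time step within each sequence: a two-level index.
per_time = pd.DataFrame(
    {"anomaly": [0, 0, 1, 1]},
    index=pd.MultiIndex.from_product([[1, 2], [0, 1]], names=["sequence", "time"]),
)
```

With these frames, the sketch returns True for per_sequence and False for per_time.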

load_input(in_dir: str) -> (DataFrame, DataFrame)

Read parquet data and labels files of the chosen dataset.

Parameters

in_dir (str) – Input directory where the files are located.

Returns

  • data (Pandas Dataframe) – Loaded data file.

  • labels (Pandas Dataframe) – Loaded labels file.

load_weights()

Load the pretrained model/weights from the algorithm’s path.
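
The page does not say how the pretrained model is serialized. As an illustration only, the following sketch assumes a scikit-learn forest stored with joblib; the file name forest.joblib and the toy data are hypothetical, and conmo may use a different format.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier

# Tiny toy dataset, invented for this example.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
trained = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Assumption: the algorithm's path points at a joblib file like this one.
    path = os.path.join(tmp_dir, "forest.joblib")
    joblib.dump(trained, path)
    # Loading restores a fitted estimator that can predict without retraining.
    restored = joblib.load(path)

same_predictions = list(restored.predict(X)) == list(trained.predict(X))
```

A model restored this way can skip the fitting step entirely, which is the usual motivation for a pretrained variant of an algorithm.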

save_output(results: DataFrame, out_dir: str, idx: int) str

Save the algorithm's output in parquet format.

Parameters
  • results (Pandas Dataframe) – Dataframe with the results of the execution.

  • out_dir (str) – Output directory where the results will be saved.

  • idx (int) – Index of the algorithm in the Experiment. Useful when you want to experiment with several algorithms.

show_start_message()

Simple method to print on the terminal the name of the algorithm to be executed.

Methods

__init__(pretrained[, max_depth, ...])

execute(idx, in_dir, out_dir)

Performs a complete execution of the algorithm, loading input data, performing a run through the folds and saving the results.

fit_predict(data_train, data_test, ...)

Trains the model with train data and then performs predictions with the trained algorithm over the test data.

labels_per_sequence(labels)

Use only with time series datasets.

load_input(in_dir)

Read parquet data and labels files of the chosen dataset.

load_weights()

Load the pretrained model/weights from the algorithm's path.

save_output(results, out_dir, idx)

Save the algorithm's output in parquet format.

show_start_message()

Simple method to print on the terminal the name of the algorithm to be executed.