helix-ml/openml_workflows

OpenML Scikit-Learn Workflows

The .csv file contains data for 475,297 machine learning runs, derived directly from https://www.openml.org/.

In the terminology of our HILDA paper, a run corresponds to a single row in the .csv file, and a sequence is the group of runs that share the same user_id and task_id (that is, a groupby on those two columns).
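As a minimal sketch of how runs map to sequences, the snippet below groups rows by (user_id, task_id) using only the Python standard library. The inline sample and its values are made up for illustration; only the column names come from this README, and `group_sequences` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the column names are taken from this README.
SAMPLE = """rid,user_id,task_id,auc,iter
101,1,10,0.80,1
102,1,10,0.85,2
103,2,10,0.70,1
104,1,11,0.90,1
"""


def group_sequences(rows):
    """Group runs (rows) into sequences keyed by (user_id, task_id)."""
    sequences = defaultdict(list)
    for row in rows:
        sequences[(row["user_id"], row["task_id"])].append(row)
    # Order the runs inside each sequence by their iteration number.
    for runs in sequences.values():
        runs.sort(key=lambda r: int(r["iter"]))
    return dict(sequences)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
sequences = group_sequences(rows)
for key, runs in sequences.items():
    print(key, [r["rid"] for r in runs])
```

The same grouping can be expressed with a pandas `groupby` on the two columns; the standard-library version is used here only to keep the sketch dependency-free.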

The columns in the .csv are as follows:

  • rid: Run id on OpenML
  • user_id: User id on OpenML
  • task_id: Task id on OpenML
  • auc: Area under ROC curve of the run
  • dist_from_mean_auc: Relative performance, i.e., the difference between the AUC of the run and the mean AUC of all the runs for the same task_id
  • model: Set of Scikit-Learn classifiers/estimators and model wrappers used in the run
  • model_params: Set of model hyperparameters represented as (parameter name, parameter value) tuples
  • ppr: Set of preprocessing operators (note that "set()" means that no preprocessing was used)
  • ppr_params: Set of preprocessing hyperparameters represented as (parameter name, parameter value) tuples
  • iter: Iteration of the run in the sequence (starting from 1)
  • change_type: Type of change from the previous iteration:
    • 'S': Starting iteration
    • 'M': Model operator change
    • 'P': Preprocessing operator change
    • 'H': Model hyperparameter change
    • 'R': Preprocessing hyperparameter change
    • 'C': Combination of model and preprocessing changes (operator or hyperparameter)
    • 'N': No change
  • delta_auc: Change in AUC from the previous iteration
  • start_time: Start time of the run
  • time_delta_in_mins: Difference, in minutes, between the start time of the current iteration and that of the previous iteration
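To make the derived columns concrete, here is a hedged sketch of how dist_from_mean_auc and delta_auc could be recomputed from the raw columns. It rests on assumptions that the README does not confirm: the difference is taken as run AUC minus task mean, delta_auc is left as `None` for a starting iteration, and the sample values are invented. `add_derived_columns` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up runs; only the column names are taken from this README.
runs = [
    {"rid": 101, "user_id": 1, "task_id": 10, "auc": 0.80, "iter": 1},
    {"rid": 102, "user_id": 1, "task_id": 10, "auc": 0.90, "iter": 2},
    {"rid": 103, "user_id": 2, "task_id": 10, "auc": 0.70, "iter": 1},
]


def add_derived_columns(runs):
    """Return copies of the runs with dist_from_mean_auc and delta_auc added."""
    # Mean AUC over all runs of each task, regardless of user.
    aucs_by_task = defaultdict(list)
    for run in runs:
        aucs_by_task[run["task_id"]].append(run["auc"])
    task_mean = {task: mean(aucs) for task, aucs in aucs_by_task.items()}

    previous_auc = {}  # last AUC seen per (user_id, task_id) sequence
    result = []
    ordered = sorted(runs, key=lambda r: (r["user_id"], r["task_id"], r["iter"]))
    for run in ordered:
        key = (run["user_id"], run["task_id"])
        derived = dict(run)
        # Assumed sign convention: run AUC minus the task's mean AUC.
        derived["dist_from_mean_auc"] = run["auc"] - task_mean[run["task_id"]]
        # Assumption: no previous iteration means no delta (None).
        prev = previous_auc.get(key)
        derived["delta_auc"] = None if prev is None else run["auc"] - prev
        previous_auc[key] = run["auc"]
        result.append(derived)
    return result


derived_runs = add_derived_columns(runs)
for run in derived_runs:
    print(run["rid"], run["dist_from_mean_auc"], run["delta_auc"])
```

How the published .csv encodes delta_auc for starting iterations is not stated here, so check the actual values before relying on this convention.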

About

Execution traces and metadata for workflows from OpenML using Scikit-Learn
