
Train task classifyapp on same data as for training the embedding #21

@dinhi


Hello everyone,
we want to use the train_task_classifyapp.py script to train a Keras model for a simple binary classification:

  1. Class 1: applications that perform a stencil operation
  2. Class 2: applications that do not perform a stencil operation

For this purpose, we created a dataset based on your synthetic datasets.

The dataset uses the following directory structure so that the Python script can process it:

.
├── ncc
│   ├── train
│   │   ├── classifyapp
│   │   │   ├── ir_train
│   │   │   │   ├── 1
│   │   │   │   ├── 2
│   │   │   ├── ir_val
│   │   │   │   ├── 1
│   │   │   │   ├── 2
│   │   │   ├── ir_test
│   │   │   │   ├── 1
│   │   │   │   ├── 2

Folder 1 contains only applications from the Stencil-synthetic dataset; folder 2 contains a mixture of applications from the Eigen- and GEMM-synthetic datasets.
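A minimal sketch of how such a layout could be generated by random sampling. The helper names (`build_binary_split`, `count_files`), the source folder names, and the assumption that each synthetic dataset is a flat folder of LLVM IR files with a `.ll` suffix are all hypothetical; the same number of applications is drawn for every split.

```python
import random
import shutil
from pathlib import Path

SPLITS = ("ir_train", "ir_val", "ir_test")


def build_binary_split(stencil_dir, other_dirs, out_root, n_per_class, seed=0):
    """Copy n_per_class randomly chosen .ll files per class into every split.

    Hypothetical helper: assumes each synthetic dataset is a flat folder of
    LLVM IR files with a .ll suffix.
    """
    rng = random.Random(seed)
    # Class "1": stencil applications; class "2": Eigen and GEMM applications pooled.
    pools = {
        "1": sorted(Path(stencil_dir).glob("*.ll")),
        "2": sorted(p for d in other_dirs for p in Path(d).glob("*.ll")),
    }
    for label, files in pools.items():
        needed = n_per_class * len(SPLITS)
        if len(files) < needed:
            raise ValueError(f"class {label}: need {needed} files, found {len(files)}")
        # Sample without replacement so no application lands in two splits.
        chosen = rng.sample(files, needed)
        for i, split in enumerate(SPLITS):
            dest = Path(out_root) / "classifyapp" / split / label
            dest.mkdir(parents=True, exist_ok=True)
            for src in chosen[i * n_per_class:(i + 1) * n_per_class]:
                # Prefix with the source folder name so equal file names from
                # different datasets (e.g. Eigen vs. GEMM) do not overwrite each other.
                shutil.copy2(src, dest / f"{src.parent.name}_{src.name}")


def count_files(out_root):
    """Return {(split, label): file count} as a quick sanity check of the layout."""
    root = Path(out_root) / "classifyapp"
    return {
        (d.parent.name, d.name): sum(1 for _ in d.iterdir())
        for d in sorted(root.glob("ir_*/*"))
        if d.is_dir()
    }
```

For example, `build_binary_split("stencil", ["eigen", "gemm"], "ncc/train", 80)` would place 80 applications in each class folder of each split, and `count_files("ncc/train")` confirms the counts before training.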

My questions are the following:

  • Since the Eigen-, GEMM- and Stencil-synthetic datasets were used to train the inst2vec embedding, will this affect the training of the classifyapp Keras model, either positively or negatively?
  • What was your setup for training, and how long did it take? In our current setup, each class folder (1 and 2) contains 80 randomly picked applications, the batch size is 4, the number of epochs is 20, and the number of training samples per class is 20. We are running the training on an Nvidia 1080 Ti. Only with this setup could we train the network in an affordable time (45 minutes per epoch). We are aware that this may result in poor accuracy.
    In another setup, we had 2000 sample applications per class in each set (train, val and test), a batch size of 4, 30 training samples per class, and 20 epochs. With this setup, the training time per epoch rose to 55 hours (according to the Keras ETA)! Larger batch sizes cause a CUDA error because not enough GPU memory can be allocated.
    Do you have any experience with these training parameters? What could cause such long training times? Your script uses a batch size of 64 and 1500 training samples per class. Did training on all 104 classes also take this long?
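One way to compare the setups is to convert each reported epoch time into time per optimizer step. This sketch assumes that one epoch processes (number of classes × training samples per class) samples; that is an assumption about train_task_classifyapp.py, not a confirmed behavior. If the script instead iterates over every application in ir_train, pass 160 and 4000 as `num_samples`. The helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps needed to see num_samples once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


def seconds_per_step(epoch_seconds: float, num_samples: int, batch_size: int) -> float:
    """Average wall-clock seconds per step implied by an observed epoch time."""
    return epoch_seconds / steps_per_epoch(num_samples, batch_size)


if __name__ == "__main__":
    # Assumption: samples per epoch = number of classes x training samples per class.
    # Setup 1: 2 classes x 20 samples, batch size 4, 45 minutes per epoch.
    print(steps_per_epoch(2 * 20, 4), seconds_per_step(45 * 60, 2 * 20, 4))     # 10 270.0
    # Setup 2: 2 classes x 30 samples, batch size 4, 55 hours per epoch.
    print(steps_per_epoch(2 * 30, 4), seconds_per_step(55 * 3600, 2 * 30, 4))  # 15 13200.0
    # Script defaults: 104 classes x 1500 samples, batch size 64.
    print(steps_per_epoch(104 * 1500, 64))                                       # 2438
```

Comparing the implied per-step times of the two setups shows whether the slowdown is explained by the number of steps alone.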
