
Model building: visualisation & modelling

Phong Nguyen edited this page Dec 4, 2018 · 10 revisions

This page summarises the steps to train the model, and the interaction and data transfer between the Visualisation and the Modelling parts.

1. Create an initial model

1.1 Visualisation

  • Create new classes.
  • Assign classes to a small set of threads.
    • Similar threads can be found using the Thread Features and Thread Projection views, for example as a strong cluster of threads.
    • Other views can be used to gain a deeper understanding, in order to verify classes and find appropriate labels for them.
  • Send labelled data to the Modelling part to train a new model
    • API endpoint: 127.0.0.1:5000/model?data=&rec=
    • Data format: data is a string representation of an array of labelled threads, [ { threadId, classId } ], and rec is a flag (true/false) indicating whether recommended samples should be returned.
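
The request described above can be sketched as follows. This is a minimal illustration rather than the project's actual client code: the http scheme, the use of a plain query string, the helper names build_model_url and decode_model_query, and the thread and class IDs are all assumptions made for the example.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and port come from the endpoint above; the http scheme is an assumption.
BASE_URL = "http://127.0.0.1:5000"


def build_model_url(labelled_threads, want_recommendations):
    """Return the /model URL carrying the labelled threads and the rec flag."""
    query = urlencode({
        # data: string representation of [ { threadId, classId } ]
        "data": json.dumps(labelled_threads),
        # rec: "true" or "false"
        "rec": "true" if want_recommendations else "false",
    })
    return f"{BASE_URL}/model?{query}"


def decode_model_query(url):
    """Recover (labelled_threads, want_recommendations) on the receiving side."""
    params = parse_qs(urlsplit(url).query)
    labelled_threads = json.loads(params["data"][0])
    want_recommendations = params["rec"][0] == "true"
    return labelled_threads, want_recommendations


# Hypothetical labels: thread 12 in class 0, thread 57 in class 1.
url = build_model_url(
    [{"threadId": 12, "classId": 0}, {"threadId": 57, "classId": 1}],
    want_recommendations=True,
)
```

URL-encoding the JSON string keeps brackets, quotes and spaces safe inside the query string, and the receiving side gets the original array back after decoding.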

1.2 Modelling

  • Receive labelled data and train a new model.
  • Then send the classes for the entire dataset back to the Visualisation (plus any additional modelling information, such as confidence?), along with recommended samples if requested.
    • Data format: a string representation of the following JSON, so that JSON.parse() returns an object.
{
  classLookup: { threadId: classId },
  samples: [threadId]
}
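
A reply in this shape could be read as sketched below. The helper name parse_model_response and the sample IDs are hypothetical, and treating a missing samples field as an empty list is an assumption about what happens when rec is false. Note that JSON object keys are always strings, so thread IDs used as classLookup keys arrive as strings after parsing.

```python
import json


def parse_model_response(text):
    """Parse the Modelling reply into (class_lookup, samples)."""
    payload = json.loads(text)
    class_lookup = payload["classLookup"]  # { threadId: classId }
    # Assumption: samples may be absent when no recommendations were requested.
    samples = payload.get("samples", [])   # [threadId]
    return class_lookup, samples


# Hypothetical reply covering three threads, one of them recommended.
reply = json.dumps({"classLookup": {"12": 0, "57": 1, "80": 1}, "samples": ["80"]})
class_lookup, samples = parse_model_response(reply)
```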

1.3 Visualisation

  • Display classes of the entire dataset together with any additional modelling information
  • Display recommended samples if any

2. Update model with new user-assigned labels

2.1 Visualisation

  • Relabel the recommended samples or user-chosen threads
  • Send the updated labelled data (only the labels that changed?) to the Modelling part to update the model
    • Data format: the same as in Phase 1.1.
  • Is it possible to create new classes at this stage?
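
If only the changed labels are sent, the Visualisation could compute them by comparing the labels from the previous request with the current ones. The sketch below is one possible approach, not a decided design: changed_labels is a hypothetical helper, the IDs are made up, and labels that were removed entirely are not handled.

```python
def changed_labels(previous, current):
    """Return [ { threadId, classId } ] for threads whose class is new or different."""
    return [
        {"threadId": thread_id, "classId": class_id}
        for thread_id, class_id in current.items()
        if previous.get(thread_id) != class_id
    ]


previous = {12: 0, 57: 1}        # labels sent in the last request
current = {12: 0, 57: 2, 80: 1}  # thread 57 relabelled, thread 80 newly labelled
delta = changed_labels(previous, current)
```

The result has the same [ { threadId, classId } ] format as in Phase 1.1, so it can be passed as data unchanged.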

2.2 Modelling

  • Use the updated labels to update the model
  • As in Phase 1.2, return classes for the entire dataset (or, better, only those that changed?) and new recommended samples if requested
    • Data format: the same as in Phase 1.2.
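
If the Modelling returns only the changed classes, the Visualisation would need to merge them into the lookup it already holds. A minimal sketch of that merge, assuming the partial reply uses the same { threadId: classId } shape (apply_class_update and the IDs are hypothetical):

```python
def apply_class_update(full_lookup, update):
    """Return a new lookup in which entries from update override full_lookup."""
    merged = dict(full_lookup)  # copy, so the previous state stays intact
    merged.update(update)
    return merged


full_lookup = {"12": 0, "57": 1, "80": 1}
update = {"57": 2}  # only thread 57 changed class after retraining
merged = apply_class_update(full_lookup, update)
```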

2.3 Visualisation

This is exactly the same as in Phase 1.3.

3. Model new/save/load workflow

  • How do the Visualisation and the Modelling know which data file to work on?
    • Both use a hard-coded data file, for example always the 1000-thread file.
  • Initially
    • No model is loaded by default, in either the Visualisation or the Modelling.
    • The user needs to click New to create a new model (or new project; this has nothing to do with the active learning model yet) and is prompted to enter a name. This name is used to save the model on both the Visualisation and the Modelling sides.
  • When to save?
    • A Save button on the Visualisation side triggers saving.
    • A request to 127.0.0.1:5000/save?name= is sent with the entered name, asking the Modelling to save its model under the same name.
  • The Visualisation saves:
    • [{ classId, classLabel }]
    • { threadId: classId } for the entire dataset
    • [threadId] of recommended samples
  • The Modelling saves:
    • the serialisation of the model
  • When to load?
    • A Load button on the Visualisation side triggers loading.
    • A request to 127.0.0.1:5000/load?name= is sent, asking the Modelling to load the corresponding model.
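
The save/load workflow on the Visualisation side can be sketched as below. This is an illustration under several assumptions: the state is stored as one JSON file named after the model, the field names classes, classLookup and samples are chosen for the example, the http scheme is assumed, and all helper names and sample values are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urlencode

# Host and port come from the requests above; the http scheme is an assumption.
BASE_URL = "http://127.0.0.1:5000"


def modelling_url(action, name):
    """Build the save or load request URL for the given model name."""
    if action not in ("save", "load"):
        raise ValueError(f"unsupported action: {action}")
    return f"{BASE_URL}/{action}?{urlencode({'name': name})}"


def save_visualisation_state(directory, name, state):
    """Write the Visualisation state to <directory>/<name>.json."""
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps(state), encoding="utf-8")
    return path


def load_visualisation_state(directory, name):
    """Read the state previously saved under the same name."""
    path = Path(directory) / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))


state = {
    "classes": [{"classId": 0, "classLabel": "A"}, {"classId": 1, "classLabel": "B"}],
    "classLookup": {"12": 0, "57": 1, "80": 1},  # entire dataset
    "samples": ["80"],                           # recommended samples
}

with tempfile.TemporaryDirectory() as tmp:
    save_visualisation_state(tmp, "demo", state)
    restored = load_visualisation_state(tmp, "demo")

save_request = modelling_url("save", "demo")
load_request = modelling_url("load", "demo")
```

Using the same name for the local file and for the name parameter keeps the two sides in step, which is the point of prompting the user for a single name.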
