Model building: visualisation & modelling
Phong Nguyen edited this page Dec 4, 2018 · 10 revisions
This page summarises the steps to train the model, as well as the interaction and data transfer between the Visualisation and the Modelling parts.
Phase 1: train a new model
Phase 1.1: Visualisation
- Create new classes.
- Assign classes to a small set of threads.
- Similar threads, such as a strong cluster of threads, can be found using the Thread Features and Thread Projection views.
- Other views can be used to gain a deeper understanding, in order to verify the assignments and find appropriate labels for the classes.
- Send labelled data to the Modelling part to train a new model
- API end point: `127.0.0.1:5000/model?data=&rec=`
- Data format: `data` is (a String representation of) an array of labelled threads, `[ { threadId, classId } ]`, and `rec` is a flag (`true`/`false`) indicating whether to receive recommended samples.
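A minimal sketch of how the Visualisation might build the Phase 1.1 request. The `http://` scheme, the `LabelledThread` type and the `buildModelUrl` helper are assumptions for illustration; the page does not fix the types of `threadId` and `classId`, so string thread IDs and numeric class IDs are assumed. The JSON for `data` is URL-encoded because it contains characters such as `{` and `"` that are not safe in a query string.

```typescript
// Hypothetical types: the page does not say whether IDs are strings or numbers.
interface LabelledThread {
  threadId: string;
  classId: number;
}

// Assumed base address of the Modelling part (scheme added for illustration).
const MODELLING_BASE = "http://127.0.0.1:5000";

// Builds the /model URL: `data` carries the labelled threads as a JSON string,
// `rec` asks the Modelling part whether to return recommended samples.
function buildModelUrl(labels: LabelledThread[], rec: boolean): string {
  const data = encodeURIComponent(JSON.stringify(labels));
  return `${MODELLING_BASE}/model?data=${data}&rec=${rec}`;
}

const url = buildModelUrl(
  [
    { threadId: "t1", classId: 0 },
    { threadId: "t2", classId: 1 }
  ],
  true
);
console.log(url);
```

Because the whole label set travels in the query string, a large number of labelled threads could run into URL length limits; sending the same JSON in a POST body would be one way around that, if the Modelling part supports it.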
Phase 1.2: Modelling
- Receive the labelled data and train a new model.
- Then send back to the Visualisation the classes for the entire dataset (and any additional modelling information, such as confidence?), plus the recommended samples if requested.
- Data format: a String representation of the following JSON, so that `JSON.parse()` returns a JSON object, where `classLookup` gives the class of every thread and `samples` lists the IDs of the recommended threads:
{
  classLookup: { threadId: classId },
  samples: [threadId]
}
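A possible way for the Visualisation to read the Phase 1.2 response, assuming the field names above and the same ID types as before (string thread IDs, numeric class IDs). `ModelResponse` and `parseModelResponse` are hypothetical names, and treating a missing `samples` field as "no samples" is an assumption; the shape check is optional but makes a malformed reply fail at the boundary rather than somewhere in the views.

```typescript
// Hypothetical shape of the /model response described above.
interface ModelResponse {
  classLookup: { [threadId: string]: number }; // class of every thread
  samples: string[]; // IDs of recommended threads
}

// Parses the String returned by the Modelling part and checks its shape.
function parseModelResponse(text: string): ModelResponse {
  const obj = JSON.parse(text);
  if (obj === null || typeof obj !== "object") {
    throw new Error("response is not a JSON object");
  }
  const classLookup = obj.classLookup;
  if (classLookup === null || typeof classLookup !== "object") {
    throw new Error("response has no classLookup");
  }
  // Samples are only returned when rec=true; assume absence means none.
  const samples: string[] = Array.isArray(obj.samples) ? obj.samples : [];
  return { classLookup, samples };
}

const res = parseModelResponse('{"classLookup":{"t1":0,"t2":1},"samples":["t2"]}');
console.log(res.classLookup["t2"], res.samples);
```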
Phase 1.3: Visualisation
- Display the classes of the entire dataset, together with any additional modelling information
- Display the recommended samples, if any
Phase 2: update the model
Phase 2.1: Visualisation
- Relabel the recommended samples or user-chosen threads
- Send the updated labelled data (only the ones that changed?) to the Modelling part to update the model
- Data format: the same as in Phase 1.1.
- Is it possible to create new classes at this stage?
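If only the changed labels are sent, as the open question above suggests, the Visualisation needs a diff between the labels it last sent and the current ones. A minimal sketch, assuming labels are kept as a `{ threadId: classId }` object; `Labels` and `changedLabels` are hypothetical names, and the result uses the Phase 1.1 `[ { threadId, classId } ]` format.

```typescript
// Hypothetical: labels kept as a plain object mapping threadId -> classId.
type Labels = { [threadId: string]: number };

interface LabelledThread {
  threadId: string;
  classId: number;
}

// Returns the threads whose label was added or changed since `previous`.
function changedLabels(previous: Labels, current: Labels): LabelledThread[] {
  const changed: LabelledThread[] = [];
  for (const threadId of Object.keys(current)) {
    if (previous[threadId] !== current[threadId]) {
      changed.push({ threadId, classId: current[threadId] });
    }
  }
  return changed;
}

const diff = changedLabels({ t1: 0, t2: 1 }, { t1: 0, t2: 2, t3: 1 });
console.log(JSON.stringify(diff));
```

This only detects added or relabelled threads; if a label can be removed, that case would need its own representation in the request, which the page does not define.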
Phase 2.2: Modelling
- Use the updated labels to update the model
- As in Phase 1, return the classes for the entire dataset (or would it be better to return only those that changed?) and new recommended samples if requested
- Data format: the same as in Phase 1.2.
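Conversely, if the Modelling part returns only the classes that changed, the Visualisation has to merge them into the lookup it already holds before redisplaying. A sketch with a hypothetical `mergeClassLookup` helper; it also works if the full lookup is returned, since every entry is then simply overwritten.

```typescript
type ClassLookup = { [threadId: string]: number };

// Returns a new lookup in which entries from `update` override those in `full`;
// threads absent from `update` keep their previous class. Inputs are not mutated.
function mergeClassLookup(full: ClassLookup, update: ClassLookup): ClassLookup {
  const merged: ClassLookup = {};
  for (const id of Object.keys(full)) merged[id] = full[id];
  for (const id of Object.keys(update)) merged[id] = update[id];
  return merged;
}

const merged = mergeClassLookup({ t1: 0, t2: 1 }, { t2: 3 });
console.log(merged["t1"], merged["t2"]);
```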
Phase 2.3: Visualisation
This is exactly the same as in Phase 1.3.
- How do the Visualisation and the Modelling know which data file to work on?
- Both use a hard-coded data file, e.g. always the 1000-thread file.
- Initially
- No model is loaded by default (in neither the Visualisation nor the Modelling)
- The user needs to click New to create a new model (or a new project; this has nothing to do with the active learning model yet) and is prompted to enter a name. This name will be used to save the model on both the Visualisation and the Modelling sides.
- When to save?
- A Save button on the Visualisation side.
- A request `127.0.0.1:5000/save?name=` is sent to the Modelling with the entered name, asking it to save the model under the same name.
- The Visualisation saves:
- `[{ classId, classLabel }]`
- `{ threadId: classId }` for the entire dataset
- `[threadId]` of the recommended samples
- The Modelling saves:
- the serialisation of the model
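One way to model the save step, assuming the Visualisation keeps the three saved items in a single object. `VisualisationState`, its field name `classes`, the `http://` scheme and the helpers `buildSaveUrl` and `serialiseState` are hypothetical; where the Visualisation actually writes the string (file, browser storage, etc.) is not specified on this page. The name is URL-encoded because the user types it freely.

```typescript
// Hypothetical shape of what the Visualisation saves, following the list above.
interface ClassInfo {
  classId: number;
  classLabel: string;
}

interface VisualisationState {
  classes: ClassInfo[]; // [{ classId, classLabel }]
  classLookup: { [threadId: string]: number }; // { threadId: classId }
  samples: string[]; // [threadId] of recommended samples
}

const MODELLING_BASE = "http://127.0.0.1:5000";

// URL asking the Modelling part to save its model under the same name.
function buildSaveUrl(name: string): string {
  return `${MODELLING_BASE}/save?name=${encodeURIComponent(name)}`;
}

// String the Visualisation could store under the same name.
function serialiseState(state: VisualisationState): string {
  return JSON.stringify(state);
}
```

For example, `buildSaveUrl("my project")` returns `"http://127.0.0.1:5000/save?name=my%20project"`.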
- When to load?
- A Load button on the Visualisation side.
- A request `127.0.0.1:5000/load?name=` is sent to the Modelling, asking it to load the corresponding model.
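The load step mirrors saving: the same name goes to `/load` on the Modelling side, while the Visualisation restores its own saved state. Again a sketch with hypothetical names (`buildLoadUrl`, `deserialiseState`), an assumed `http://` scheme and the same assumed state shape; the check rejects saved data that lacks any of the three saved parts.

```typescript
interface ClassInfo {
  classId: number;
  classLabel: string;
}

interface VisualisationState {
  classes: ClassInfo[];
  classLookup: { [threadId: string]: number };
  samples: string[];
}

const MODELLING_BASE = "http://127.0.0.1:5000";

// URL asking the Modelling part to load the model saved under `name`.
function buildLoadUrl(name: string): string {
  return `${MODELLING_BASE}/load?name=${encodeURIComponent(name)}`;
}

// Restores a saved Visualisation state, failing if any saved part is missing.
function deserialiseState(text: string): VisualisationState {
  const obj = JSON.parse(text);
  const complete =
    obj !== null &&
    typeof obj === "object" &&
    Array.isArray(obj.classes) &&
    obj.classLookup !== null &&
    typeof obj.classLookup === "object" &&
    Array.isArray(obj.samples);
  if (!complete) {
    throw new Error("saved state is incomplete");
  }
  return { classes: obj.classes, classLookup: obj.classLookup, samples: obj.samples };
}
```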