vinkami/stat3612-project

STAT3612 Group 23

Instructions to user

  • The recommended Python version is 3.10.12.
  • Please install the required packages before running the code.
    The provided requirements.txt file lists them and can be installed with:
    pip install -r requirements.txt
  • The Python and package versions do not have to match exactly, but we highly recommend using the same versions, since results may differ slightly otherwise.
  • The results on Kaggle and in the report were generated using the HKU CS GPU Farm for Teaching. Please refer to section 3.1 of the report for more details.
  • The gp_utils.py file contains functions shared by the two track notebooks. Please make sure to put it in the same directory as the notebooks.
  • Please put the data files in a data/ folder in the same directory as the notebooks.
    • The notebooks will search for:
      • data/ehr_preprocessed_seq_by_cat_embedding.pkl
      • data/notes.csv
      • data/train.csv
      • data/valid.csv
      • data/test.csv
    • Since image files are not used in training for the Track 2 submission, there is no need to include them to replicate the results shown on Kaggle.
  • The notebooks are:
    • Group23_Track1.ipynb : for Track 1
    • Group23_Track2.ipynb : for Track 2
      • If the notebook has been modified, set USE_IMAGE = False and USE_TEXT = True to reproduce the result on the Kaggle leaderboard.
        This should already be the default configuration.
  • The notebooks save their results to the output/ folder in the same directory as the notebooks, with names of the form track{1|2}_{timestamp}.csv.
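
As a quick way to confirm the layout described above before opening the notebooks, a small pre-flight script along the following lines could be used. This is only a sketch and is not part of the repository: the helper names missing_data_files and output_path are hypothetical, and the timestamp format is an assumption, since the exact format used by the notebooks is not stated here.

```python
from datetime import datetime
from pathlib import Path

# File names taken from the data/ list in the instructions above.
REQUIRED_DATA_FILES = [
    "ehr_preprocessed_seq_by_cat_embedding.pkl",
    "notes.csv",
    "train.csv",
    "valid.csv",
    "test.csv",
]


def missing_data_files(base_dir="."):
    """Return the required files that are absent from <base_dir>/data."""
    data_dir = Path(base_dir) / "data"
    return [name for name in REQUIRED_DATA_FILES if not (data_dir / name).is_file()]


def output_path(track, base_dir=".", now=None):
    """Build output/track{1|2}_{timestamp}.csv (hypothetical helper)."""
    if track not in (1, 2):
        raise ValueError("track must be 1 or 2")
    now = now or datetime.now()
    # Assumed timestamp format; the notebooks may use a different one.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(base_dir) / "output" / f"track{track}_{stamp}.csv"


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All data files found.")
    print("Example output path:", output_path(2))
```

Run it from the directory that contains the notebooks. Image files are deliberately left out of the check because they are not needed for the Track 2 submission.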

About

Hospital readmission prediction based on Electronic Health Record data
