Fixes to improve pipeline preparation script #474

Open

ymahlich wants to merge 4 commits into main from fixes-ym

@ymahlich
Collaborator

This PR contains several fixes to:

`prepare_data_for_improve.py`:

  • casting study names to lower case for consistency
  • "type casting" the result of the mRECIST category conversion to `str` to guarantee compatibility with other value types
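The two changes above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the column names `study` and `mRECIST`, the helper `normalize_responses`, and the category-to-value mapping are all hypothetical.

```python
import pandas as pd

# Hypothetical mapping: both the category labels and the numeric values are
# placeholders, not the ones used by prepare_data_for_improve.py.
MRECIST_TO_VALUE = {"mCR": 1.0, "mPR": 0.75, "mSD": 0.5, "mPD": 0.0}


def normalize_responses(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case study names and str-typed mRECIST values.

    The column names 'study' and 'mRECIST' are assumptions for this sketch.
    """
    out = df.copy()
    # Lower-case study identifiers so they compare equal to lower-case file names.
    out["study"] = out["study"].str.lower()
    # Map categories to values, then cast to str so the column is replaced
    # with string values instead of having floats written into it.
    out["mRECIST"] = out["mRECIST"].map(MRECIST_TO_VALUE).astype(str)
    return out


if __name__ == "__main__":
    raw = pd.DataFrame({"study": ["PDX_Study", "pdx_study"], "mRECIST": ["mCR", "mPD"]})
    print(normalize_responses(raw))
```

Reassigning the whole mapped-and-cast column keeps every value a string, so the column never ends up holding floats.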

`dataset.yml`:

  • explicit conversion of the citations field to a string to prevent faulty parsing by Python's YAML parser
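The parsing problem can be reproduced with PyYAML, assumed here to be the "Python YAML parser" in question. The citation strings are invented examples, not entries from `dataset.yml`, and `load_citation` is a hypothetical helper. An unquoted `": "` inside a value makes the scanner reject the line, and quoting the value fixes it. The standard `!!str` tag is shown as one YAML-side way to force a value such as `2020` to load as a string instead of an int; this PR may handle that case differently.

```python
import yaml  # PyYAML (assumed parser)

# Invented one-key documents; not taken from dataset.yml.
UNQUOTED = "citation: Doe J. et al. Drug response: a study. 2020\n"
QUOTED = 'citation: "Doe J. et al. Drug response: a study. 2020"\n'
BARE_YEAR = "citation: 2020\n"
TAGGED_YEAR = "citation: !!str 2020\n"


def load_citation(text: str):
    """Parse a one-key YAML document and return its 'citation' value."""
    return yaml.safe_load(text)["citation"]


if __name__ == "__main__":
    try:
        load_citation(UNQUOTED)
    except yaml.YAMLError as err:  # "mapping values are not allowed here"
        print("unquoted:", type(err).__name__)
    print("quoted:", load_citation(QUOTED))
    print("bare year:", type(load_citation(BARE_YEAR)).__name__)
    print("tagged year:", type(load_citation(TAGGED_YEAR)).__name__)
```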

fixed indentation of docstring in `dataset.Dataset.split_train_other(...)`
added quotation marks to all dataset citations, since colons (":") interfered with proper parsing in some cases
conversion of mRECIST values, needed for some PDX datasets: coderdata's formatting function returns a DataFrame whose mRECIST column has dtype 'str'. Previously, floats were assigned to these values, causing a TypeError.

Note: the datasets still contain the typo "mRESCIST".
For the improve framework to work properly, study identifiers in the split file names and in the responses must use the same case. This fix guarantees that.
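A small, hypothetical check of that requirement is sketched below. It assumes split files follow the illustrative pattern `<study>_split_<n>_<part>.txt`, which is not necessarily the improve framework's actual naming scheme. The helpers `study_from_split_file` and `mismatched_studies` are made up for this sketch; the check reports split-file study identifiers that have no case-sensitive match among the response study identifiers.

```python
from pathlib import PurePath


def study_from_split_file(name: str) -> str:
    """Return the study identifier from a split file name.

    Assumes the hypothetical pattern '<study>_split_<n>_<part>.txt'
    (study names containing '_split_' would be cut short).
    """
    stem = PurePath(name).stem  # 'PDX_Study_split_0_train.txt' -> 'PDX_Study_split_0_train'
    return stem.split("_split_")[0]


def mismatched_studies(split_files, response_studies):
    """List split-file studies with no exact (case-sensitive) match in the responses."""
    known = set(response_studies)
    return sorted({study_from_split_file(f) for f in split_files} - known)


if __name__ == "__main__":
    files = ["PDX_Study_split_0_train.txt", "PDX_Study_split_0_test.txt"]
    print(mismatched_studies(files, ["pdx_study"]))  # ['PDX_Study']
    print(mismatched_studies([f.lower() for f in files], ["pdx_study"]))  # []
```

Mixed-case split names are reported as mismatches until both sides use the same (here lower) case.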
@ymahlich ymahlich requested a review from sgosline May 26, 2026 23:25