Updates for issue 90 #105
Change filtering criteria: DO NOT filter out pathogenic germline variants. Always report TERT. Genes meeting one of these 2 conditions will be reported with highest priority in the PRONTO report. These "rescued" variants should have a separate column (at the far right of the table) to verify this: a column named "Filter rescued" with values "Yes" or empty.
… should not appear in the tables on the right of the slides in the report; this column is printed only in the summary table on slide 8.
@marrip I just got a further request for this issue and have updated issue 90 accordingly. Some more new code will be coming soon.
ok, then I will wait with the review until you tell me to start
… rescued variants which are not included in Filter0-3. (Last table in the report)
…to develop_issue90
I will start tomorrow at the latest 🙂
marrip left a comment
hey Xiaoli! I had a couple of questions and a suggestion. I am also working on a refactoring of some of the parts but need your input first
    output_table_file_config = output_file_preMTB_table_path + "_" + output_table + ".txt"
    if(',' in filter_column):
        for column in filter_column.split(','):
            all_data = read_tsv(data_file_small_variant_table,column,key_word)
here it looks like you are always overwriting all_data by using the last item in filter_column in read_tsv. Is that desired behavior?
Hmm, it comes from the filter conditions in the Filter sections of the config file. Each of them can have multiple filters.
I understand, but it seems you are looping through those and all_data will always be filtered according to the last item in that list - you don't seem to be saving the others or am I missing something?
In case of this:
    [FILTER0-1]
    ;Specify the column name need to be filtered:
    filter_column = CPSR_ACMG_class,CPSR_ClinVar_class
you will first apply CPSR_ACMG_class and then CPSR_ClinVar_class, but only the results from CPSR_ClinVar_class are saved in all_data. The filtering with CPSR_ACMG_class does not seem to be considered.
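To make the overwrite concern concrete, here is a minimal sketch. It does not use the real `read_tsv`; `filter_rows` is a hypothetical stand-in that filters in-memory rows, and the gene names and values are made up. It also assumes that a row matching any of the listed columns should be kept, which the author has not confirmed.

```python
# Hypothetical stand-in for read_tsv: returns the rows whose `column`
# value contains `key_word`. The real helper reads from a file instead.
def filter_rows(rows, column, key_word):
    return [row for row in rows if key_word in row.get(column, "")]


# Made-up example rows, not real variant data.
rows = [
    {"gene": "GENE_A", "CPSR_ACMG_class": "Pathogenic", "CPSR_ClinVar_class": "Benign"},
    {"gene": "GENE_B", "CPSR_ACMG_class": "Benign", "CPSR_ClinVar_class": "Pathogenic"},
    {"gene": "GENE_C", "CPSR_ACMG_class": "Benign", "CPSR_ClinVar_class": "Benign"},
]
filter_column = "CPSR_ACMG_class,CPSR_ClinVar_class"
key_word = "Pathogenic"

# Pattern from the diff: every iteration replaces the previous result,
# so only the matches for the last column survive the loop.
for column in filter_column.split(","):
    overwritten = filter_rows(rows, column, key_word)

# Accumulating alternative (assumption: keep a row that matches ANY column).
accumulated = []
for column in filter_column.split(","):
    for row in filter_rows(rows, column, key_word):
        if row not in accumulated:
            accumulated.append(row)

overwritten_genes = [row["gene"] for row in overwritten]
accumulated_genes = [row["gene"] for row in accumulated]
print(overwritten_genes)
print(accumulated_genes)
```

With the overwriting loop, the row that matches only `CPSR_ACMG_class` is lost; the accumulating loop keeps it. If the filters are instead meant to be combined with AND, each pass would have to filter the output of the previous one.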
Thanks Martin! I will check this further. I currently have trouble logging in to the development server and need to resolve that first.
    if(filter_section == "0"):
        all_data_filter = []
        top_filter = int(cfg.get("INPUT", "top_filter")) + 1
        for top_filter_num in range(1,top_filter):
I would like to refactor this section to make it easier to read
    clear_blank_line(output_table_file_config_pre,output_table_file_config)
    all_data_filter.append(all_data)

    all_data_filter = sum(all_data_filter, [])

    if(len(all_data_filter[i]) < header_length):
        count = header_length - len(all_data_filter[i])
        all_data_filter[i] = [[item.replace('\n', '') for item in cell] for cell in all_data_filter[i]]
        all_data_filter[i].pop()
        for j in range(1, count):
            all_data_filter[i].append(' \t')
        all_data_filter[i].append('\n')
could you explain to me what this section does? It replaces any \n, removes the last item and places empty fields in the table and finishes off with \n. Why is this necessary?
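One possible reading of the hunk, not confirmed by the author, is that it pads rows that are shorter than the header so every line in the combined table has the same number of fields. A minimal sketch of that idea, with a hypothetical `pad_row` helper and made-up cell values:

```python
def pad_row(cells, header_length):
    """Strip newlines from each cell and pad with empty strings up to header_length."""
    cleaned = [cell.replace("\n", "") for cell in cells]
    # A non-positive difference yields an empty list, so long rows are left as-is.
    cleaned += [""] * (header_length - len(cleaned))
    return cleaned


# Hypothetical header and a row that is two fields short.
header = ["gene", "variant", "class", "Filter rescued"]
short_row = ["GENE_A", "variant_1\n"]

padded = pad_row(short_row, len(header))
line = "\t".join(padded) + "\n"
print(repr(line))
```

Joining with `"\t"` only at write time avoids storing tab and newline characters inside the cells, which is what makes the original hunk hard to follow.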
Looking at the remaining changes it seems that a lot of the fixes are to handle different column numbers of the combined tables, replacing tabs with linebreaks or vice versa and making data unique. I would suggest we rework this and use pandas instead, which would make reading, filtering, combining and writing to file a lot easier. What do you think?
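As a rough illustration of what the pandas suggestion could look like, the sketch below reads two small tables with different columns, filters them, combines them, removes duplicates and writes TSV. The table contents, the `filter_any` helper and the use of in-memory buffers instead of files are all assumptions for the example, not the project's actual code.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two filter result tables with different columns.
table_a = pd.read_csv(
    io.StringIO("gene\tCPSR_ACMG_class\nGENE_A\tPathogenic\nGENE_B\tBenign\n"),
    sep="\t",
)
table_b = pd.read_csv(
    io.StringIO("gene\tCPSR_ClinVar_class\textra\nGENE_B\tPathogenic\tx\n"),
    sep="\t",
)


def filter_any(df, columns, key_word):
    """Keep rows where any of the given columns contains key_word."""
    mask = pd.Series(False, index=df.index)
    for column in columns:
        if column in df.columns:
            mask |= df[column].astype(str).str.contains(key_word, regex=False)
    return df[mask]


columns = ["CPSR_ACMG_class", "CPSR_ClinVar_class"]

# concat aligns on column names and fills missing columns with NaN,
# so tables with different widths need no manual padding.
combined = pd.concat(
    [filter_any(table_a, columns, "Pathogenic"), filter_any(table_b, columns, "Pathogenic")],
    ignore_index=True,
).drop_duplicates()

# Write tab-separated output; NaN values become empty fields by default.
buffer = io.StringIO()
combined.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```

Column alignment, de-duplication and tab/newline handling are then done by the library rather than by hand-written loops.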
This was removed in the main branch; not sure why it still exists here.
Co-authored-by: Martin Rippin <74295098+marrip@users.noreply.github.com>
…to develop_issue90
Change filtering criteria:
A gene meeting one of these 2 conditions will be reported with highest priority in the PRONTO report.
These "rescued" variants should have a separate column in the tables to verify this: a column named "Filter rescued" with values "Yes" or empty.
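The rescue rule can be sketched as follows. This is only an illustration under assumptions: the column names (`gene`, `origin`, `class`, `depth`), the depth-based filter and the example rows are hypothetical, and the two conditions are taken from the issue description (pathogenic germline variants, and TERT).

```python
# Hypothetical column names and rules; the real table layout may differ.
def is_rescued(row):
    """True if the row meets one of the two rescue conditions from the issue."""
    return row.get("gene") == "TERT" or (
        row.get("origin") == "germline" and row.get("class") == "Pathogenic"
    )


def apply_filter(rows, passes_filter):
    """Keep rows that pass the filter or are rescued, and mark the rescued ones."""
    kept = []
    for row in rows:
        if passes_filter(row):
            # Passed the normal filter: the extra column stays empty.
            kept.append({**row, "Filter rescued": ""})
        elif is_rescued(row):
            # Would have been filtered out, but is kept by the rescue rule.
            kept.append({**row, "Filter rescued": "Yes"})
    return kept


# Made-up rows; the filter here is an arbitrary depth threshold.
rows = [
    {"gene": "TERT", "origin": "somatic", "class": "VUS", "depth": 5},
    {"gene": "GENE_A", "origin": "germline", "class": "Pathogenic", "depth": 5},
    {"gene": "GENE_B", "origin": "somatic", "class": "VUS", "depth": 5},
    {"gene": "GENE_C", "origin": "somatic", "class": "VUS", "depth": 50},
]
result = apply_filter(rows, lambda row: row["depth"] >= 20)
for row in result:
    print(row["gene"], repr(row["Filter rescued"]))
```

Because the new key is added last, it ends up as the rightmost column when the rows are written out in insertion order.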