
Updates for issue 90 #105

Open
xiaoliz0 wants to merge 11 commits into main from develop_issue90


@xiaoliz0
Contributor

Change filtering criteria:

  1. DO NOT filter out pathogenic germline variants.
  2. Always report TERT.

Genes meeting either of these two conditions will be reported with the highest priority in the PRONTO report.
These "rescued" variants should be flagged in a separate column in the tables so that this can be verified. The column is named "Filter rescued", and its values are "Yes" or empty.

Change filtering criteria: DO NOT filter out pathogenic germline variants. Always report TERT.
Genes meeting either of these two conditions will be reported with the highest priority in the PRONTO report.
These "rescued" variants should be flagged in a separate column (at the far right of the table) so that this can be verified. The column is named "Filter rescued", and its values are "Yes" or empty.
xiaoliz0 linked an issue on Apr 22, 2026 that may be closed by this pull request
… should not appear in the tables shown on the right of the slides in the report; the column is printed only in the summary table on slide 8.
xiaoliz0 requested review from marrip and tonjegul on April 30, 2026 11:09
@xiaoliz0
Contributor Author

@marrip I just received a further request for this issue and have updated issue 90 accordingly. More code changes will follow soon.

@marrip
Collaborator

marrip commented Apr 30, 2026

> @marrip I just received a further request for this issue and have updated issue 90 accordingly. More code changes will follow soon.

OK, then I will hold off on the review until you tell me to start ☺️

@xiaoliz0
Contributor Author

xiaoliz0 commented May 4, 2026

> > @marrip I just received a further request for this issue and have updated issue 90 accordingly. More code changes will follow soon.
>
> OK, then I will hold off on the review until you tell me to start ☺️

The new commits implement the additional request. Feel free to review the code, @marrip :)

@marrip
Collaborator

marrip commented May 4, 2026

Will start tomorrow at the latest 🙂

Collaborator

marrip left a comment

Hey Xiaoli! I have a couple of questions and a suggestion. I am also working on refactoring some parts but need your input first ☺️ Will continue tomorrow.

Comment thread Script/PRONTO.py Outdated
Comment thread Script/PRONTO.py
output_table_file_config = output_file_preMTB_table_path + "_" + output_table + ".txt"
if(',' in filter_column):
    for column in filter_column.split(','):
        all_data = read_tsv(data_file_small_variant_table,column,key_word)
Collaborator

Here it looks like all_data is overwritten on every iteration, so only the last item in filter_column that is passed to read_tsv takes effect. Is that the desired behavior?

Contributor Author

Hmm, it comes from the filter conditions in the FILTER sections of the configuration file. Each of them can have multiple filters.

Collaborator

I understand, but it seems you are looping through those, and all_data will always be filtered according to the last item in that list. You don't seem to be saving the others, or am I missing something?

Collaborator

Take this case:

[FILTER0-1]
;Specify the column name need to be filtered:
filter_column = CPSR_ACMG_class,CPSR_ClinVar_class

You will first apply CPSR_ACMG_class and then CPSR_ClinVar_class, but only the results from CPSR_ClinVar_class are saved in all_data. The filtering with CPSR_ACMG_class does not seem to be taken into account.
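
To make the concern concrete, here is a minimal sketch. It assumes that read_tsv returns the rows matching one column; `filter_rows` is a hypothetical stand-in (the real signature and return type of read_tsv are not shown in this thread), and the rows are made up for illustration. The second loop shows one possible alternative that keeps rows matching any of the columns; whether a union like this is actually the intended behavior is for the author to confirm.

```python
# Hypothetical stand-in for read_tsv: keep rows whose `column` contains `key_word`.
def filter_rows(rows, column, key_word):
    return [row for row in rows if key_word in row.get(column, "")]

# Made-up example rows; only the column names come from the config above.
rows = [
    {"gene": "GENE_A", "CPSR_ACMG_class": "Pathogenic", "CPSR_ClinVar_class": "Benign"},
    {"gene": "GENE_B", "CPSR_ACMG_class": "Benign", "CPSR_ClinVar_class": "Pathogenic"},
    {"gene": "GENE_C", "CPSR_ACMG_class": "Benign", "CPSR_ClinVar_class": "Benign"},
]
filter_column = "CPSR_ACMG_class,CPSR_ClinVar_class"
key_word = "Pathogenic"

# Pattern from the diff: every iteration replaces all_data,
# so only the result for the last column survives the loop.
for column in filter_column.split(","):
    all_data = filter_rows(rows, column, key_word)
overwritten_genes = [row["gene"] for row in all_data]

# One possible alternative (an assumption about the intent):
# accumulate rows that match any of the columns, without duplicates.
kept = []
for column in filter_column.split(","):
    for row in filter_rows(rows, column, key_word):
        if row not in kept:
            kept.append(row)
union_genes = [row["gene"] for row in kept]

print(overwritten_genes)
print(union_genes)
```

In this toy data the overwrite pattern drops GENE_A, which matches only on CPSR_ACMG_class, while the accumulating loop keeps it.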

Contributor Author

Thanks Martin! I will look into it further. I currently have trouble logging in to the development server and need to resolve that first.

Comment thread Script/PRONTO.py
if(filter_section == "0"):
    all_data_filter = []
    top_filter = int(cfg.get("INPUT", "top_filter")) + 1
    for top_filter_num in range(1,top_filter):
Collaborator

I would like to refactor this section to make it easier to read ☺️

Comment thread Script/PRONTO.py
clear_blank_line(output_table_file_config_pre,output_table_file_config)
all_data_filter.append(all_data)

all_data_filter = sum(all_data_filter, [])
Collaborator

what does this do?
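
For reference, calling the built-in sum with an empty list as the start value concatenates a list of lists into a single flat list, one level deep. The sketch below uses made-up data to show that behavior next to itertools.chain.from_iterable, which gives the same result without building a new intermediate list at every step.

```python
from itertools import chain

# Made-up nested list standing in for all_data_filter before flattening.
nested = [["row1\n", "row2\n"], [], ["row3\n"]]

# sum starts from [] and applies + to each sublist in turn: [] + a + b + c.
flattened_sum = sum(nested, [])

# Equivalent one-level flattening that avoids repeated list copies.
flattened_chain = list(chain.from_iterable(nested))

print(flattened_sum)
print(flattened_chain)
```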

Comment thread Script/PRONTO.py
Comment on lines +1412 to +1418
if(len(all_data_filter[i]) < header_length):
    count = header_length - len(all_data_filter[i])
    all_data_filter[i] = [[item.replace('\n', '') for item in cell] for cell in all_data_filter[i]]
    all_data_filter[i].pop()
    for j in range(1, count):
        all_data_filter[i].append(' \t')
    all_data_filter[i].append('\n')
Collaborator

Could you explain what this section does? It strips every \n, removes the last item, pads the row with empty fields, and finishes with \n. Why is this necessary?
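
If the intent is only to give every row as many fields as the header, that could be sketched as below. This is an assumption about the purpose of the section, not a description of what the current code does; `pad_row` is a hypothetical helper, and rows are assumed to be lists of field strings without the trailing newline.

```python
def pad_row(fields, header_length, fill=""):
    """Return a copy of `fields` padded with `fill` up to `header_length` entries."""
    missing = max(0, header_length - len(fields))
    return fields + [fill] * missing

# Made-up example: a header with 4 columns and a row with only 2 fields.
header_length = 4
short_row = ["TERT", "Yes"]
padded = pad_row(short_row, header_length)

# Join the fields once, when writing, instead of storing '\t' and '\n' in the data.
line = "\t".join(padded) + "\n"
print(repr(line))
```

Keeping separators out of the stored fields means tabs and line breaks are added only at write time.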

@marrip
Collaborator

marrip commented May 6, 2026

Looking at the remaining changes, it seems that many of the fixes handle differing column counts in the combined tables, replace tabs with line breaks or vice versa, and make the data unique. I would suggest we rework this and use pandas instead, which would make reading, filtering, combining and writing to file a lot easier. What do you think?
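
As a rough idea of what that could look like, here is a minimal pandas sketch. The table contents and the "note" column are made up, and the in-memory strings stand in for the real TSV files; only the two CPSR column names come from the config discussed above. It is meant as a starting point for discussion, not as a drop-in replacement for the PRONTO code.

```python
import io
import pandas as pd

# Made-up stand-ins for two tab-separated tables with different columns.
table_a = "gene\tCPSR_ACMG_class\nGENE_A\tPathogenic\nGENE_B\tBenign\n"
table_b = "gene\tCPSR_ClinVar_class\tnote\nGENE_B\tPathogenic\tx\nGENE_C\tBenign\ty\n"

df_a = pd.read_csv(io.StringIO(table_a), sep="\t")
df_b = pd.read_csv(io.StringIO(table_b), sep="\t")

# concat aligns columns by name and fills cells missing from one table with NaN,
# so no manual padding is needed when column counts differ.
combined = pd.concat([df_a, df_b], ignore_index=True)

# Keep rows where any of the filter columns contains the key word.
filter_columns = ["CPSR_ACMG_class", "CPSR_ClinVar_class"]
matches = combined[filter_columns].apply(
    lambda col: col.str.contains("Pathogenic", na=False)
)
filtered = combined[matches.any(axis=1)].drop_duplicates()

# Write back as TSV; missing cells become empty fields.
output = filtered.to_csv(sep="\t", index=False, na_rep="")
print(output)
```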

xiaoliz0 and others added 4 commits May 6, 2026 13:53
This was removed in the main branch; not sure why it still exists here.

Co-authored-by: Martin Rippin <74295098+marrip@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

National request for data filter (big change)

2 participants