
Prebuilt databases lack the nodes.dmp file required by the classifiedRefiner module; segfault when using gtdb_r226 instead of the gtdb_r220+virus+human database #175

Description

@TomKLHui

(1) The prebuilt database worked well for the classify workflow, but when I proceeded to remove the unclassified/human portions, it failed. Should I just copy the NCBI taxdump files here?
(2) Is it normal for ~95% of reads to be unclassified? What should I expect?
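To put a number on question (2), the unclassified share can be computed straight from the classification output. This is a minimal Python sketch, assuming each data line of `*_classifications.tsv` starts with a 1/0 "classified" flag (the first column as the Metabuli README describes it); lines whose first field is not 0 or 1, such as a header, are skipped. The function name `unclassified_fraction` is hypothetical.

```python
import csv
from typing import Iterable


def unclassified_fraction(lines: Iterable[str]) -> float:
    """Return the fraction of reads whose first column is 0 (unclassified).

    Assumption: column 1 of the classifications TSV is a 1/0 classified flag.
    """
    total = 0
    unclassified = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        flag = row[0].strip()
        if flag not in ("0", "1"):
            continue  # skip header or malformed lines
        total += 1
        if flag == "0":
            unclassified += 1
    return unclassified / total if total else 0.0


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], newline="") as handle:
        print(f"{unclassified_fraction(handle):.2%} unclassified")
```

Usage (script name hypothetical): `python unclassified_fraction.py 07a.metabuli/JL304_B27_2_classifications.tsv`.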

```
classifiedRefiner 07a.metabuli/JL304_B27_2_classifications.tsv ../metabuli/gtdb+virus+human --threads 4 --remove-unclassified --report 1

Metabuli Version (commit): 1.1.1
Remove unclassified reads true
Exclude taxId as well as its children
Select taxId as well as its children
Select columns with number, (7:full lineage, generated if absent)
Make report of refined classification file true
Adjust classification to the specified rank
0: without higher rank, 1: with higher rank, 2: separate file for higher rank classification 0
Threads 4
Min. sequence similarity score 0

Loading nodes file ...File ../metabuli/gtdb+virus+human/nodes.dmp not found!
```
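The log shows classifiedRefiner looking for `nodes.dmp` directly inside the database directory passed on the command line. A minimal pre-flight check, sketched in Python, could confirm the file is there before running the refiner. Only `nodes.dmp` is confirmed as required by the log above; any other file you add to `required` (for example `names.dmp`) is an assumption, and the function name `missing_taxonomy_files` is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def missing_taxonomy_files(
    db_dir: str, required: Sequence[str] = ("nodes.dmp",)
) -> List[str]:
    """Return the names in `required` that are not regular files in db_dir."""
    base = Path(db_dir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    import sys

    missing = missing_taxonomy_files(sys.argv[1])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required taxonomy files found")
```

For example, `python check_db.py ../metabuli/gtdb+virus+human` (script name hypothetical) would report `Missing: nodes.dmp` for the directory in the log above.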

(3) Also, the classify workflow only worked for trimmed reads with the prebuilt gtdb_r220+virus+human database, not with the prebuilt gtdb_r226. All other use cases cause a segfault. I wonder whether it is caused by the large data size relative to the limited RAM (remote server max: 500 GB; input file size: ~5 GB, paired-end).
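One rough way to probe the RAM hypothesis in (3) is to compare each database's on-disk size with the server's memory. This Python sketch is only a heuristic under that assumption: on-disk size is a proxy, and it does not measure how much memory Metabuli actually allocates. The function names are hypothetical, and the 500 GB figure is taken from the report above.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def exceeds_ram(db_dir: str, ram_bytes: int) -> bool:
    """Rough heuristic: True if the on-disk DB size is larger than ram_bytes."""
    return directory_size_bytes(db_dir) > ram_bytes


if __name__ == "__main__":
    import sys

    size = directory_size_bytes(sys.argv[1])
    ram = 500 * 10**9  # 500 GB server limit from the report; adjust as needed
    print(f"DB size: {size / 10**9:.1f} GB; exceeds 500 GB RAM: {size > ram}")
```

Running it on both the gtdb_r220+virus+human and gtdb_r226 directories would show whether the database that segfaults is notably larger.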

Tom
