Documentation / software bug for building custom database #11

Description

@taltman

Hi LMAT team!

I have been trying to follow the documentation in lmat-doc.txt to build a custom database for use with LMAT, but I keep running into problems. I'll document one specific case here.

One step in the process of building a custom database is constructing a mapping file between NCBI Taxonomy Database identifiers and the full deflines from the multi-FASTA formatted file containing the reference sequences. I've followed the documentation below in that regard:

 The mapping is specified as a tab delimited file with the first column containing the tax id and the second
 column should contain the header associated with sequence stored in the input fasta file (WORK/test.fa below)
 For example:
 418127   >ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome

When I provide my constructed GenomeToTaxID.txt file to build_header_table.py, it breaks:

reading: /media/ephemeral/taltman/lmat/GenomeToTaxID.txt
Traceback (most recent call last):
  File "./build_header_table.py", line 44, in <module>
    gi_to_tid[t[4]] = t[0]
IndexError: list index out of range

Looking at the Python script, line 44 indexes t[4], so it seems to expect a file with at least five columns, not the two that the documentation describes. Changing t[4] to t[1] seems to fix it.
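To make the expected behavior concrete, here is a minimal sketch of reading the two-column format that the documentation describes. It is not the code from build_header_table.py: the function name `read_taxid_mapping` and the dictionary name `header_to_tid` are hypothetical, and it assumes each line is split on tabs, with the tax id in column 0 and the full header in column 1.

```python
def read_taxid_mapping(lines):
    """Map each FASTA header to its tax id from two-column, tab-delimited lines.

    Hypothetical helper for illustration; not part of LMAT.
    """
    header_to_tid = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        t = line.split("\t")
        if len(t) < 2:
            # Report the offending line instead of failing with a bare IndexError
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(t)}"
            )
        # Column 0 is the tax id, column 1 is the full header (assumed layout)
        header_to_tid[t[1]] = t[0]
    return header_to_tid


# The example line from the documentation, with a tab between the two columns
sample = [
    "418127\t>ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|"
    "Staphylococcus aureus subsp. aureus Mu3, complete genome\n"
]
mapping = read_taxid_mapping(sample)
```

Checking the column count up front would also turn the IndexError above into a message that names the malformed line, which would make a format mismatch like this one easier to diagnose.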

So, either there is a documentation bug, or there is a software bug.

Any feedback would be greatly appreciated. Thanks!
