
Phamerator workflow changes #13

@chg60

I realize you probably don't maintain this codebase much anymore, but in case you do...

I am a graduate student in Graham Hatfull’s lab at the University of Pittsburgh. Because PhamDB uses the Phamerator database schema and parts of our workflow, I wanted to let you know about some changes we’re making to the Phamerator workflow.

The most important change is that we are replacing kClust and HHsuite with MMseqs2, a newer and faster tool that also comes from the Söding lab. Our testing suggests that MMseqs2 produces better phams than the iterative kClust approach, in a fraction of the time.
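
As a rough illustration of how MMseqs2 output could be turned into phams, the Python sketch below groups the two-column `_cluster.tsv` file that `mmseqs easy-cluster` writes (for example, `mmseqs easy-cluster genes.faa clusterRes tmp` produces `clusterRes_cluster.tsv`), where each line holds a cluster representative and one member. The function name `phams_from_cluster_tsv` and the one-cluster-per-pham mapping are assumptions for illustration; the MMseqs2 parameters and post-processing Phamerator actually uses are not described here.

```python
from collections import defaultdict


def phams_from_cluster_tsv(tsv_text: str) -> dict[str, list[str]]:
    """Group gene IDs into phams keyed by their MMseqs2 cluster representative.

    Hypothetical helper: assumes each MMseqs2 cluster maps to exactly one pham.
    """
    phams: dict[str, list[str]] = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<representative>\t<member>"; the representative
        # also appears as a member of its own cluster.
        representative, member = line.split("\t")
        phams[representative].append(member)
    return dict(phams)
```

For instance, the lines `g1\tg1`, `g1\tg2`, and `g3\tg3` would yield two phams: `g1` containing `g1` and `g2`, and `g3` containing only itself.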

We will also be adding at least one column to the pham table, containing the pham’s conservation score. The score is computed by generating a Clustal Omega alignment of the pham and dividing the number of perfectly conserved residues by the length of the shortest gene in the pham. Normalizing by the shortest gene prevents draft-status genes, which are frequently called shorter than their manually annotated peers, from artificially lowering the pham’s conservation score. Because this column depends on the alignments, we will also be generating a Clustal Omega alignment for every pham going forward.
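
A minimal Python sketch of the conservation score described above, assuming the Clustal Omega alignment has already been parsed into equal-length strings with `-` as the gap character. The function name `conservation_score` is hypothetical, and treating a column as "perfectly conserved" only when every sequence has the same residue and none has a gap is an interpretation of the description, not necessarily the exact rule Phamerator applies.

```python
def conservation_score(alignment: list[str], gap: str = "-") -> float:
    """Conserved alignment columns divided by the shortest ungapped sequence length.

    Hypothetical helper: `alignment` holds the aligned protein sequences of
    one pham, all padded with `gap` to the same length.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")

    # A column counts as perfectly conserved when every sequence carries the
    # same residue there and no sequence has a gap.
    conserved = sum(
        1
        for column in zip(*alignment)
        if column[0] != gap and all(res == column[0] for res in column)
    )

    # Normalize by the shortest gene (ungapped length), so a truncated draft
    # gene does not inflate the denominator.
    shortest = min(len(row.replace(gap, "")) for row in alignment)
    if shortest == 0:
        raise ValueError("alignment contains an all-gap sequence")
    return conserved / shortest
```

For example, with the alignment `MKV-LA`, `MKVTLA`, `MRVTLA`, four columns are perfectly conserved (M, V, L, A) and the shortest gene has five residues, giving a score of 0.8.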

I don’t think it’s worth your time to add the Clustal Omega portion of our modified workflow to PhamDB. However, because MMseqs2 and kClust work quite differently and produce quite different overall databases, it may be worthwhile for you or someone else to update PhamDB to use MMseqs2 instead of kClust. This would likely reduce confusion for Phamerator users who build their own databases and wonder why those databases aren’t compatible with https://www.phamerator.org/, or why they can’t cluster them as well as we now can.

If you’d like more information about any of these changes or how they may impact you or your users, feel free to reach out.

-Christian Gauthier (christian.gauthier@pitt.edu)
