I realize you probably don't maintain this codebase much anymore, but in case you do:
I am a graduate student in Graham Hatfull’s lab at the University of Pittsburgh. Because PhamDB makes use of the Phamerator database schema and parts of our workflow, I wanted to let you know about some changes we’re in the process of making to the Phamerator workflow.
The most important change is that we are dropping kClust and HHsuite in favor of the newer, faster MMseqs2 (also produced by the Söding lab). Our testing suggests that MMseqs2 produces better phams than the iterative kClust approach, in a fraction of the time.
We will also be adding at least one column to the pham table. This column will contain the pham’s conservation score, which we calculate by generating a Clustal Omega alignment and dividing the number of perfectly conserved residues by the length of the shortest gene in the pham. Dividing by the shortest gene prevents draft-status genes, which are frequently called shorter than their manually annotated peers, from artificially lowering the pham’s conservation score. Of course, including this column means we will also be generating Clustal Omega alignments for each pham moving forward.
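As a rough illustration of the calculation described above, here is a minimal Python sketch. It is not the actual Phamerator implementation: the function name `conservation_score`, the choice of gap characters, and the decision to treat any column containing a gap as not conserved are all assumptions made for the example. It expects the alignment to have already been read into equal-length gapped strings.

```python
def conservation_score(aligned_seqs, gap_chars="-."):
    """Fraction of perfectly conserved columns relative to the shortest gene.

    aligned_seqs: equal-length strings taken from a multiple sequence
    alignment (for example, one produced by Clustal Omega).
    Hypothetical helper for illustration only.
    """
    if not aligned_seqs:
        raise ValueError("at least one aligned sequence is required")

    seqs = [s.upper() for s in aligned_seqs]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")

    # A column is perfectly conserved when every sequence carries the same
    # residue there. Assumption: a column containing a gap never counts.
    conserved = sum(
        1
        for column in zip(*seqs)
        if column[0] not in gap_chars and all(r == column[0] for r in column)
    )

    # Length of the shortest gene, measured without gap characters.
    shortest = min(sum(1 for r in s if r not in gap_chars) for s in seqs)
    if shortest == 0:
        return 0.0

    return conserved / shortest


if __name__ == "__main__":
    # Toy alignment: 4 conserved columns, shortest gene has 5 residues.
    alignment = ["MKTAY-", "MKTAYL", "MKSAY-"]
    print(conservation_score(alignment))  # 0.8
```

Because the denominator is the shortest ungapped gene rather than the alignment width, a truncated draft gene shrinks both the set of columns that can be conserved and the denominator, which is the effect described above.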
I don’t think it’s worth your time to worry about including the Clustal Omega portion of our modified workflow in the PhamDB workflow. However, given that MMseqs2 and kClust work quite differently and produce quite different databases overall, it may be worth it for you or somebody else to update PhamDB to use MMseqs2 instead of kClust. This would likely reduce confusion downstream for Phamerator users who build their own databases and wonder why they’re not compatible with https://www.phamerator.org/ or why they can’t get their database clustered as well as we now can.
If you’d like more information about any of these changes or how they may impact you or your users, feel free to reach out.
-Christian Gauthier (christian.gauthier@pitt.edu)