Add code skeleton to output sequence of new codon around mutated site#50

Open
mmokrejs wants to merge 1 commit into virus-evolution:master from mmokrejs:add_codon_usage


mmokrejs commented Feb 9, 2024

This roughly sketches the places that need changes in order to add codon sequences and their frequencies to the output. I have never written anything in Go, so this is incomplete. pkg/variants/pairwise.go needs more work to figure out where the current mutation is located with respect to the codon, and then to slice out the three nucleotides that include the current position.
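A minimal sketch of that lookup, not the code in this PR: given a reference sequence and a coding region, it finds the codon that contains a mutated position and slices out its three nucleotides. The function name `codonAround`, its parameters, and the toy sequence are hypothetical; it assumes 1-based inclusive coordinates and a forward-strand, unspliced CDS.

```go
package main

import (
	"errors"
	"fmt"
)

// codonAround is a hypothetical helper (not part of gofasta).
// It returns the codon containing the 1-based reference position pos, the
// 1-based start of that codon, and the 0-based offset of pos within the codon.
// Assumption: the CDS lies on the forward strand and spans cdsStart..cdsEnd
// (1-based, inclusive) without splicing.
func codonAround(ref string, cdsStart, cdsEnd, pos int) (string, int, int, error) {
	if cdsStart < 1 || pos < cdsStart || pos > cdsEnd {
		return "", 0, 0, errors.New("position outside the CDS")
	}
	offset := (pos - cdsStart) % 3 // 0, 1 or 2: place of pos inside its codon
	start := pos - offset          // 1-based first nucleotide of the codon
	if start+2 > cdsEnd || start+2 > len(ref) {
		return "", 0, 0, errors.New("incomplete codon at the end of the CDS")
	}
	// Convert the 1-based inclusive range start..start+2 to a Go slice.
	return ref[start-1 : start+2], start, offset, nil
}

func main() {
	// Toy reference: two leading bases, then ATG AAG TTT TAA at positions 3..14.
	ref := "GGATGAAGTTTTAA"
	codon, start, offset, err := codonAround(ref, 3, 14, 7)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(codon, start, offset)
}
```

The same slice taken from the query sequence at the same coordinates would give the mutated codon, provided the query has no indels inside that triplet.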


mmokrejs commented Feb 9, 2024


Overall, I am after something like codon:S:aag1249-1251ata and its frequency. If there are multiple mutations in a codon, then a list of such codons and their frequencies (it does not need to list all 64 codons, since most of them would be zero). I do not really need the gene name S shown above, but maybe somebody else would appreciate it.
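As a rough illustration of that output, the sketch below builds a label in the `codon:S:aag1249-1251ata` shape and computes the frequency of each codon observed at one position. The helper names `codonLabel` and `codonFrequencies` and the sample data are hypothetical, and which coordinate system the numbers refer to is left to the caller.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// codonLabel is a hypothetical formatter for the requested notation:
// codon:<gene>:<ref codon><start>-<end><new codon>, codons in lower case.
func codonLabel(gene, refCodon string, start int, altCodon string) string {
	return fmt.Sprintf("codon:%s:%s%d-%d%s",
		gene, strings.ToLower(refCodon), start, start+2, strings.ToLower(altCodon))
}

// codonFrequencies returns the fraction of samples carrying each codon.
// Only codons that were actually observed appear in the map, so the
// zero-count entries among the 64 possible codons are never listed.
func codonFrequencies(observed []string) map[string]float64 {
	counts := make(map[string]int)
	for _, c := range observed {
		counts[strings.ToUpper(c)]++
	}
	freqs := make(map[string]float64, len(counts))
	for c, n := range counts {
		freqs[c] = float64(n) / float64(len(observed))
	}
	return freqs
}

func main() {
	// Made-up codons seen in four samples at one codon position.
	observed := []string{"ATA", "ATA", "AAG", "ATA"}
	freqs := codonFrequencies(observed)

	// Sort the keys so the output order is stable.
	codons := make([]string, 0, len(freqs))
	for c := range freqs {
		codons = append(codons, c)
	}
	sort.Strings(codons)

	for _, c := range codons {
		if c == "AAG" {
			continue // skip the reference codon itself
		}
		fmt.Printf("%s\t%.2f\n", codonLabel("S", "AAG", 1249, c), freqs[c])
	}
}
```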

I think I would prefer a simpler CSV/TSV file format to the default gofasta output, but I wanted to keep this simple to add.

I am surprised that there is no other tool able to parse SAM/BAM or ALN formats and return a list of 64-item arrays of the codons for every triplet in the alignment columns, properly skipping indels that conflict with the supposedly error-free reference (in other words, complying with the GFF3 annotation).
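One way such a per-codon tally could look is sketched below. It assumes the sequences are already aligned in reference coordinates (one column per reference position, insertions relative to the reference removed, deletions shown as `-`), and it skips any triplet containing a gap or an ambiguous base. The names `codonIndex` and `tallyCodons` and the A, C, G, T ordering of the 64 slots are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Base order used to number codons 0..63 (an arbitrary choice for this sketch).
const bases = "ACGT"

// codonIndex maps a codon to 0..63, or returns -1 if it is not exactly
// three characters from A, C, G, T (for example when it contains '-' or 'N').
func codonIndex(codon string) int {
	if len(codon) != 3 {
		return -1
	}
	idx := 0
	for i := 0; i < 3; i++ {
		b := strings.IndexByte(bases, codon[i])
		if b < 0 {
			return -1
		}
		idx = idx*4 + b
	}
	return idx
}

// tallyCodons counts codons per codon position of a forward-strand CDS given
// as 1-based inclusive reference coordinates. rows are aligned sequences in
// reference coordinates. Triplets with gaps or ambiguity codes are skipped.
func tallyCodons(rows []string, cdsStart, cdsEnd int) [][64]int {
	if cdsStart < 1 || cdsEnd < cdsStart {
		return nil
	}
	n := (cdsEnd - cdsStart + 1) / 3
	counts := make([][64]int, n)
	for _, row := range rows {
		row = strings.ToUpper(row)
		for c := 0; c < n; c++ {
			lo := cdsStart - 1 + 3*c // 0-based start of codon c
			if lo+3 > len(row) {
				break // row is shorter than the CDS
			}
			if idx := codonIndex(row[lo : lo+3]); idx >= 0 {
				counts[c][idx]++
			}
		}
	}
	return counts
}

func main() {
	// Made-up aligned rows; the third has a deletion inside the second codon.
	rows := []string{"ATGAAGTTT", "ATGATATTT", "ATGA-GTTT", "atgaagttt"}
	counts := tallyCodons(rows, 1, 9)
	fmt.Println("AAG at codon 2:", counts[1][codonIndex("AAG")])
	fmt.Println("ATA at codon 2:", counts[1][codonIndex("ATA")])
}
```

Skipping a whole triplet when it contains a gap is the simplest policy; it does not attempt to realign in-frame indels.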

Respecting an in-frame insertion or deletion in the sample spanning 3, 6, and so on nucleotides would be even better, but at the moment I do not know how to tackle that.

Somewhat similar tools are:
http://emboss.open-bio.org/rel/rel6/apps/cusp.html (this one calculates codon usage across the whole protein, whereas I am after codon usage at each amino acid position along the whole protein)
http://emboss.open-bio.org/rel/rel6/apps/cai.html
http://emboss.open-bio.org/rel/rel6/apps/chips.html
http://emboss.open-bio.org/rel/rel6/apps/codcmp.html
http://emboss.open-bio.org/rel/rel6/apps/codcopy.html

http://atgme.org/
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0743-5

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-43

https://academic.oup.com/nar/article/48/19/11030/5921303

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-08635-0

https://microbialcellfactories.biomedcentral.com/articles/10.1186/s12934-023-02230-y

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02966-1

https://www.mdpi.com/2073-4425/13/6/1090

https://bioinformatics.stackexchange.com/questions/997/which-sequence-alignment-tools-support-codon-alignment

https://github.com/paulstothard/sequence_manipulation_suite/blob/master/docs/pairwise_align_codons.html
