An amino acid sequence encoding toolbox for machine learning.
Main features:
- Machine learning oriented
- Rich encoding varieties (NUM, BE, EAAC, AAINDEX, GAAC, CKSAAP)
- Native support for large FASTA-format files
- Out-of-the-box

To build and install from source:
python setup.py bdist_wheel
pip install ./dist/bioencoder-1.0.0-py3-none-any.whl

A standard FASTA file looks like:
>1|1
DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC
>2|1
EGKLSMLQNTIKRLASLSTEEPVVICNDRHRFLVAEQLREIDKLANNIILE
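
For readers unfamiliar with this layout, here is a minimal sketch of a streaming reader for such a file. It is independent of bioencoder and is not the library's own loader. The name `read_fasta` is hypothetical, and the header text after `>` is returned unparsed because its exact meaning is not stated here.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Hypothetical helper for illustration; not part of bioencoder.
    Records are yielded lazily, so the whole file is never held in memory.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Usage with an in-memory file; a real file opened with open() works the same way.
sample = io.StringIO(">1|1\nDGMRITLRDG\nCIVHLRASGN\n>2|1\nEGKLSMLQNT\n")
for name, sequence in read_fasta(sample):
    print(name, sequence)
```

Because the function is a generator, it can be used in a plain `for` loop over files of any size.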

To read the sequences and encode them with the chosen method (GAAC in this example):
from bioencoder import *
pos_data = "pos.fasta"
window_size = 12
pos_seqList, pos_labellist, pos_seqNamelist = get_data(pos_data, 1, method="GAAC", window_size=window_size)

For example, given a string such as
seq = 'DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC'
use the code below to encode the sequence as an EAAC embedding:
from bioencoder.encoder import *
print(EAAC(seq,window=5))
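
To make the idea concrete, here is a minimal sketch of what a sliding-window amino acid composition computes. It assumes EAAC means the per-window frequency of each of the 20 standard residues. The function `eaac_sketch` and the constant `AMINO_ACIDS` are hypothetical names, and the library's own `EAAC` may differ in residue order, gap handling, or output shape.

```python
from collections import Counter
from typing import List

# Assumed residue order (alphabetical one-letter codes); the library may use another.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac_sketch(seq: str, window: int = 5) -> List[float]:
    """Return residue frequencies for every sliding window, concatenated.

    Illustrative only; not bioencoder's implementation.
    """
    features: List[float] = []
    # One window per start position; no windows if the sequence is too short.
    for start in range(len(seq) - window + 1):
        fragment = seq[start:start + window]
        counts = Counter(fragment)
        # Counter returns 0 for residues absent from the window.
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features


seq = "DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC"
vec = eaac_sketch(seq, window=5)
print(len(vec))  # (len(seq) - window + 1) windows, 20 values each
```

Each window contributes 20 values that sum to 1 when the window contains only standard residues, so the vector length grows linearly with the sequence length.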