uhrs-bioinfogo/Bioencoder

Bioencoder

An amino acid sequence encoding toolbox for machine learning.

Introduction

Main features:

  • Machine learning oriented
  • Rich variety of encodings (NUM, BE, EAAC, AAINDEX, GAAC, CKSAAP)
  • Native support for large FASTA files
  • Out-of-the-box

Installation Tutorial

Via PIP

python setup.py bdist_wheel
pip install ./dist/bioencoder-1.0.0-py3-none-any.whl

Usage

Reading from a FASTA file

Given a standard FASTA file such as:

>1|1
DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC
>2|1
EGKLSMLQNTIKRLASLSTEEPVVICNDRHRFLVAEQLREIDKLANNIILE

To read the sequences and encode them, pass the encoding name as the method argument (GAAC in this example):

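from bioencoder import *
pos_data = "pos.fasta"  # path to the FASTA file of positive samples
window_size = 12        # window size passed to the encoder
# The second argument (1) appears to be the label assigned to the sequences in this file.
pos_seqList, pos_labellist, pos_seqNamelist = get_data(
    pos_data, 1, method="GAAC", window_size=window_size
)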
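For readers who want to see what reading such a file involves, here is a minimal standalone sketch of a streaming FASTA reader. It does not use bioencoder and is not the library's implementation; the function name iter_fasta is hypothetical. It yields one record at a time, which is one common way to keep memory use low on large files.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Hypothetical helper for illustration only; not part of bioencoder.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# The two records from the example file above, read from an in-memory buffer.
example = io.StringIO(
    ">1|1\n"
    "DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC\n"
    ">2|1\n"
    "EGKLSMLQNTIKRLASLSTEEPVVICNDRHRFLVAEQLREIDKLANNIILE\n"
)
records = list(iter_fasta(example))
for name, sequence in records:
    print(name, len(sequence))
```

With a real file, the same generator can be used as iter_fasta(open("pos.fasta")). How bioencoder interprets the header (for example the part after the "|") is not shown here.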

Reading from a raw sequence

For example, given a string such as seq = 'DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC', use the code below to encode the sequence with the EAAC encoding:

from bioencoder.encoder import *
seq = 'DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC'
print(EAAC(seq, window=5))
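To give an intuition for what an EAAC-style encoding computes, the sketch below slides a fixed-size window along the sequence and records the frequency of each of the 20 standard amino acids in every window. This is an illustration of the general idea only: the function name eaac_sketch, the amino acid ordering, and the flat list output are assumptions, and the output of bioencoder's own EAAC function may be shaped or normalized differently.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids; the ordering here is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac_sketch(seq: str, window: int = 5) -> List[float]:
    """Sliding-window amino acid composition (illustrative, not bioencoder's code).

    For each window position, emit 20 values: the fraction of the window
    occupied by each amino acid in AMINO_ACIDS.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    features: List[float] = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features


seq = "DGMRITLRDGCIVHLRASGNAPELRCYAEANLLNRAQDLVNTTLANIKKRC"
vector = eaac_sketch(seq, window=5)
# One block of 20 frequencies per window position.
print(len(vector) // len(AMINO_ACIDS), "windows,", len(vector), "features")
```

Because every sequence of the same length yields a vector of the same size, fixed-length inputs like the ones above can be stacked directly into a feature matrix for a classifier.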
