This project is a simple bioinformatics pipeline built in Google Colab for analyzing and comparing protein sequences. It’s designed to be easy to run and modify, especially for students or anyone getting started with protein analysis.
Given a set of protein sequences, the pipeline will:
- Read in your sequences (from a FASTA file or by fetching them)
- Calculate basic physicochemical properties for each protein
- Compare the sequences with each other
- Cluster the sequences to show how the proteins are related
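The first two steps, reading a FASTA file and computing per-protein properties, can be sketched in plain Python as below. This is an illustration under assumptions, not the notebook's code: the notebook may well use Biopython (`Bio.SeqIO` for parsing, `ProteinAnalysis` for properties) instead, and the chosen properties (length, GRAVY on the Kyte-Doolittle scale, fraction of charged residues) plus the helper names `parse_fasta` and `basic_properties` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the ID is the first word of each header."""
    records = {}
    current_id, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            words = line[1:].split()
            current_id, chunks = (words[0] if words else ""), []
        elif current_id is not None:
            chunks.append(line.upper())  # a sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def basic_properties(seq):
    """Length, GRAVY (mean Kyte-Doolittle hydropathy) and fraction of charged residues (D, E, K, R)."""
    standard = [aa for aa in seq if aa in KYTE_DOOLITTLE]  # ignore X, *, gaps, etc.
    n = len(standard)
    return {
        "length": len(seq),
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in standard) / n if n else 0.0,
        "charged_fraction": sum(aa in "DEKR" for aa in standard) / n if n else 0.0,
    }


example = """>seqA example protein
MKTAYIAK
QR
>seqB
GGGGGG
"""
for record_id, sequence in parse_fasta(example).items():
    print(record_id, basic_properties(sequence))
```

Non-standard symbols are skipped when averaging, so a stray `X` or stop `*` does not distort GRAVY; the reported length still counts every character.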
The notebook generates:
- Tables of physicochemical properties
- A sequence similarity table
- Clustering visualizations (dendrograms)
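The similarity table and the clustering behind a dendrogram can be sketched as follows. The similarity measure here is an assumption: a simple alignment-free Jaccard similarity of 3-mer sets, which may differ from the notebook's own measure (for example, alignment-based percent identity). Clustering uses average linkage (UPGMA) on distance = 1 - similarity and returns only the tree topology as a Newick string; all function names are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Set of overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets: shared k-mers / distinct k-mers, from 0 to 1."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def similarity_table(records, k=3):
    """All-vs-all similarity as {(id1, id2): score}, including self-comparisons."""
    return {(i, j): kmer_similarity(records[i], records[j], k) for i in records for j in records}


def upgma_newick(records, k=3):
    """Average-linkage (UPGMA) clustering on distance = 1 - similarity.

    Returns the tree topology as a Newick string (branch lengths omitted).
    """
    if not records:
        return ";"
    sim = similarity_table(records, k)
    clusters = {rid: [rid] for rid in records}  # Newick label -> member IDs
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            # UPGMA: mean distance over all member pairs of the two clusters
            dist = sum(1 - sim[p] for p in pairs) / len(pairs)
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[f"({a},{b})"] = members
    return next(iter(clusters)) + ";"


seqs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQL", "c": "GGGGGGGGGG"}
table = similarity_table(seqs)
for i in seqs:
    print(i, " ".join(f"{table[(i, j)]:.2f}" for j in seqs))
print(upgma_newick(seqs))
```

Here `a` and `b` share 7 of 9 distinct 3-mers, so they are joined first and `c` is attached last. To draw an actual dendrogram, the same distances could be passed to SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` and plotted with `dendrogram` in matplotlib.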
All results are displayed in Colab and can be saved.
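One way to save a results table is to write it as CSV with the standard `csv` module, as sketched below; the helper `save_table`, the file name `properties.csv`, and the column names are hypothetical, and the row values are placeholders rather than real results.

```python
import csv


def save_table(rows, path):
    """Write a list of dicts (one per protein) to CSV, using the first row's keys as the header."""
    if not rows:
        raise ValueError("nothing to save")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Placeholder values for illustration only, not real results
rows = [
    {"id": "seqA", "length": 10, "gravy": -0.5},
    {"id": "seqB", "length": 6, "gravy": -0.4},
]
save_table(rows, "properties.csv")
```

Files written this way live on the Colab runtime and disappear when it resets; inside Colab, `google.colab.files.download("properties.csv")` copies one to your machine. If the notebook keeps its tables in pandas, `DataFrame.to_csv` does the same job.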
To run the pipeline:
- Open the notebook in Google Colab
- Run the setup cells
- Upload your FASTA file or fetch your sequences
- Run the remaining cells in order
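For the upload step, Colab provides `google.colab.files.upload()`, which opens a file picker and returns a dict mapping each file name to its contents as bytes. The sketch below assumes the notebook accepts files with common FASTA extensions; the helper `fasta_uploads` and the extension list are hypothetical, and the import is guarded so the code also runs outside Colab.

```python
def fasta_uploads(uploaded):
    """Keep FASTA-looking files from a {filename: bytes} dict and decode them to text."""
    extensions = (".fasta", ".fa", ".faa", ".fas")  # assumed FASTA file extensions
    return {
        name: data.decode("utf-8")
        for name, data in uploaded.items()
        if name.lower().endswith(extensions)
    }


try:
    # Only available inside a Colab runtime
    from google.colab import files
except ImportError:
    files = None

if files is not None:
    # Opens a file picker in the notebook; returns {filename: file contents as bytes}
    fasta_texts = fasta_uploads(files.upload())
```

Filtering by extension is a simple guard against uploading the wrong file; checking that the text starts with `>` would be a stricter alternative.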
Feel free to modify or extend the pipeline.