A two-step pipeline to screen MAFFT alignment parameters and identify the optimal multiple sequence alignment for a given protein dataset.
Created by Nicolas-Frédéric Lipp, PhD.
MAFFT_ScoreNGo performs systematic exploration of MAFFT alignment parameters to find the best alignment strategy for your dataset. The workflow is split into two complementary scripts:
- `MAFFT_ScoreNGo.py`: generates alignments across many MAFFT parameter combinations, with support for batch processing of multiple FASTA files.
- `AlignmentScorerWithJalview.py`: re-scores those alignments using rigorous, multi-metric evaluation (exact Jalview conservation and quality algorithms, Sum-of-Pairs, parsimony-informative sites, and more) to identify the optimal alignment.
This separation allows fast iteration on alignment generation while applying more sophisticated scoring methods as a distinct, reproducible step.
- Automated screening of MAFFT parameter combinations across three pre-defined levels (Light, Standard, Aggressive)
- Support for batch processing of multiple FASTA files in a single run
- Custom parameter injection
- Separate output directories per input file
- Comprehensive logging of MAFFT commands and execution
- Exact Jalview conservation (per-column 0–11 scores based on 10 physicochemical property classes, AMAS-derived scheme)
- Exact Jalview quality (per-column BLOSUM62-based quality)
- Sum-of-Pairs score (BLOSUM62)
- Hydrophobic position conservation
- Gap coherence
- Parsimony-informative sites
- Pairwise identity
- Three ranking methods aggregating all metrics: Weighted Average, TOPSIS, Borda Count
- Consensus best-alignment detection across ranking methods
- Parallel processing for large alignment sets
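Three of the metrics listed above (Sum-of-Pairs, parsimony-informative sites and pairwise identity) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the scorer's implementation. It uses only a small excerpt of BLOSUM62 (residues I, L, V, K, R) and a hypothetical residue-vs-gap score of -4. It also treats gaps as missing data when counting informative sites and identities.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations

GAP = "-"
GAP_PENALTY = -4  # hypothetical residue-vs-gap score; gap-vs-gap pairs score 0

# Excerpt of the standard BLOSUM62 table, limited to I, L, V, K, R.
# A full implementation would load the whole matrix (for example with
# Biopython's Bio.Align.substitution_matrices.load("BLOSUM62")).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
    ("I", "K"): -3, ("I", "R"): -3, ("L", "K"): -2, ("L", "R"): -2,
    ("V", "K"): -2, ("V", "R"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned characters (the matrix is symmetric)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]  # KeyError for residues outside the excerpt


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair scores over every pair of sequences in every column."""
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


def parsimony_informative_sites(alignment: list[str]) -> int:
    """Count columns with at least two states that each occur at least twice."""
    count = 0
    for column in zip(*alignment):
        states = Counter(c for c in column if c != GAP)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def mean_pairwise_identity(alignment: list[str]) -> float:
    """Mean fraction of identical residues over the gap-free positions of each pair."""
    identities = []
    for s1, s2 in combinations(alignment, 2):
        shared = [(a, b) for a, b in zip(s1, s2) if a != GAP and b != GAP]
        if shared:
            identities.append(sum(a == b for a, b in shared) / len(shared))
    return sum(identities) / len(identities) if identities else 0.0
```

For `aln = ["ILKV", "VLRV", "IL-V", "VLKV"]`, `sum_of_pairs(aln)` returns 65 and `parsimony_informative_sites(aln)` returns 1. Only the first column (two I, two V) qualifies. All sequences are assumed to have the same length, since `zip` silently truncates.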
**MAFFT** stands for Multiple sequence Alignment using Fast Fourier Transform. More documentation can be found at mafft.cbrc.jp, in Katoh et al. (Nucleic Acids Res., 2002), and in Katoh et al. (Brief. Bioinform., 2017).
MAFFT_ScoreNGo tests various combinations of the following MAFFT parameters:
- Alignment strategies (`--genafpair`, `--localpair`, `--globalpair`)
- Substitution matrices (BLOSUM62, BLOSUM80)
- Gap opening penalties
- Gap extension penalties
- Large gap penalties
- Tree iteration count
For a detailed explanation of the parameters tested and the screening levels, see PARAMETERS.md.
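At its core, the screening is a Cartesian product over parameter values, with one MAFFT call per combination. The sketch below assumes a small hypothetical grid over strategy, `--bl`, `--op` and `--ep`. It omits the large gap penalty and tree iteration parameters. The values are illustrative rather than the predefined Light/Standard/Aggressive sets, and numbering output files from 1 is an assumption. MAFFT writes the alignment to standard output, so each run is redirected to its own file.

```python
from __future__ import annotations

import subprocess
from itertools import product
from pathlib import Path

# Illustrative values only; the real screening grids are described in
# PARAMETERS.md and are not reproduced here.
STRATEGIES = ["--genafpair", "--localpair", "--globalpair"]
BLOSUM = [62, 80]         # passed to MAFFT as --bl
GAP_OPEN = [1.53, 2.0]    # --op (1.53 is MAFFT's documented default)
GAP_EXTEND = [0.0, 0.1]   # --ep (offset that acts like a gap extension penalty)


def build_commands(input_fasta: str) -> list[list[str]]:
    """Return one MAFFT argument list per parameter combination."""
    return [
        ["mafft", strategy, "--bl", str(bl), "--op", str(op), "--ep", str(ep),
         input_fasta]
        for strategy, bl, op, ep in product(STRATEGIES, BLOSUM, GAP_OPEN, GAP_EXTEND)
    ]


def run_all(input_fasta: str, out_dir: str) -> None:
    """Run every command, writing MAFFT's stdout to alignment_<n>.fasta."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, cmd in enumerate(build_commands(input_fasta), start=1):
        with open(out / f"alignment_{n}.fasta", "w") as fh:
            # check=True raises CalledProcessError if MAFFT exits non-zero
            subprocess.run(cmd, stdout=fh, stderr=subprocess.PIPE,
                           text=True, check=True)
```

With this grid, `build_commands("in.fasta")` yields 3 × 2 × 2 × 2 = 24 commands. This illustrates why runtime grows with the size of the screening level.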
1. Clone this repository: `git clone https://github.com/NicoFrL/MAFFT_ScoreNGo.git`, then `cd MAFFT_ScoreNGo`
2. Install the required Python packages: `pip3 install -r requirements.txt`
3. Ensure MAFFT is installed and accessible from your command line (see Dependencies section).
To generate the alignments (Step 1), run `python3 MAFFT_ScoreNGo.py`.
Follow the prompts to:
- Select one or more input FASTA files (multi-selection supported)
- Choose the screening level:
  - Light (`1`): quick screening with fewer parameter combinations
  - Standard (`2`): balanced screening (recommended for most use cases)
  - Aggressive (`3`): thorough screening with many combinations
- (Optional) Add custom MAFFT parameters to be tested alongside the predefined set
- Confirm and let the script run
For each input file, a folder named mafft_results_<basename>/ is created next to the input, containing:
- `alignment_<n>.fasta`: one file per parameter combination
- `mafft_commands.txt`: the exact MAFFT commands used
- `debug_logs.txt`: execution times and stderr output
To score the alignments (Step 2), run `python3 AlignmentScorerWithJalview.py`.
When prompted:
- Select the folder containing the alignments produced by Step 1 (or a parent folder containing multiple such subfolders — the script will detect them automatically).
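The Step 1 folder naming and the automatic folder detection described above could be implemented roughly as follows. This is a sketch, not the script's code. It assumes that a folder qualifies when it directly contains at least one `alignment_<n>.fasta` file and that alignments are processed in numeric order.

```python
from __future__ import annotations

import re
from pathlib import Path

_ALIGNMENT_NAME = re.compile(r"alignment_(\d+)\.fasta$")


def results_dir_for(input_fasta: str | Path) -> Path:
    """mafft_results_<basename>/ created next to the input file."""
    path = Path(input_fasta)
    return path.parent / f"mafft_results_{path.stem}"


def alignment_files(folder: str | Path) -> list[Path]:
    """alignment_<n>.fasta files sorted numerically (2 before 10)."""
    hits = []
    for f in Path(folder).iterdir():
        m = _ALIGNMENT_NAME.match(f.name)
        if m:
            hits.append((int(m.group(1)), f))
    return [f for _, f in sorted(hits)]


def find_alignment_folders(root: str | Path) -> list[Path]:
    """Return root itself if it holds alignments, else its subfolders that do."""
    root = Path(root)
    if alignment_files(root):
        return [root]
    return sorted(d for d in root.iterdir() if d.is_dir() and alignment_files(d))
```

Sorting on the parsed integer rather than on the file name prevents `alignment_10.fasta` from being scored before `alignment_2.fasta`.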
The scorer will evaluate every `alignment_*.fasta` file and produce:
- `alignment_scores_EXACT_JALVIEW.txt`: ranked results with all metrics and the consensus best alignment
- `per_position_scores.txt`: per-column conservation and quality scores for each alignment
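The three ranking methods and the consensus step can be sketched using only the standard library. The sketch makes several assumptions:
- every metric is oriented so that higher is better;
- the weighted average uses min-max normalization;
- TOPSIS uses vector normalization;
- Borda ties are broken by input order.

The scorer's actual weights, normalization and tie handling are not documented here, so treat this as an illustration of the techniques rather than the script's code.

```python
import math
from collections import Counter

# In every function, `scores` maps an alignment name to its list of metric
# values (same metric order for every alignment); `weights` has one entry per metric.


def _columns(scores):
    return [list(col) for col in zip(*scores.values())]


def _minmax(values):
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def weighted_average(scores, weights):
    """Weighted mean of min-max normalized metrics."""
    names = list(scores)
    norm = [_minmax(col) for col in _columns(scores)]
    total = sum(weights)
    return {name: sum(w * norm[j][i] for j, w in enumerate(weights)) / total
            for i, name in enumerate(names)}


def topsis(scores, weights):
    """Relative closeness to the ideal solution (1.0 = ideal on every metric)."""
    names = list(scores)
    weighted = []
    for col, w in zip(_columns(scores), weights):
        length = math.sqrt(sum(v * v for v in col)) or 1.0
        weighted.append([w * v / length for v in col])
    best = [max(col) for col in weighted]
    worst = [min(col) for col in weighted]
    closeness = {}
    for i, name in enumerate(names):
        d_best = math.sqrt(sum((col[i] - b) ** 2 for col, b in zip(weighted, best)))
        d_worst = math.sqrt(sum((col[i] - w) ** 2 for col, w in zip(weighted, worst)))
        total = d_best + d_worst
        closeness[name] = d_worst / total if total else 0.0
    return closeness


def borda(scores):
    """Per metric, the top alignment gets n-1 points, the next n-2, and so on."""
    names = list(scores)
    points = dict.fromkeys(names, 0)
    for j in range(len(next(iter(scores.values())))):
        ranked = sorted(names, key=lambda name: scores[name][j], reverse=True)
        for rank, name in enumerate(ranked):
            points[name] += len(names) - 1 - rank
    return points


def consensus_best(*rankings):
    """Alignment that wins the most ranking methods, with its vote count."""
    winners = [max(r, key=r.get) for r in rankings]
    return Counter(winners).most_common(1)[0]
```

For example, with `scores = {"alignment_1": [10, 0.5], "alignment_2": [20, 0.9], "alignment_3": [15, 0.4]}` and equal weights, all three methods rank `alignment_2` first. `consensus_best` therefore returns `("alignment_2", 3)`.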
Step 1 output (in each `mafft_results_<basename>/` folder):

| File | Description |
|---|---|
| `alignment_<n>.fasta` | One alignment per MAFFT parameter combination |
| `mafft_commands.txt` | All MAFFT commands executed |
| `debug_logs.txt` | Execution logs and MAFFT stderr output |

Step 2 output:

| File | Description |
|---|---|
| `alignment_scores_EXACT_JALVIEW.txt` | Full scoring results, ranking by three methods, and consensus best alignment |
| `per_position_scores.txt` | Per-column conservation (0–11) and quality scores |
- Python 3.8 or later (3.9+ recommended); numpy >= 1.24 and scipy >= 1.10 do not support Python 3.7
- biopython >= 1.81
- numpy >= 1.24
- scipy >= 1.10
- tqdm >= 4.65
- tkinter (usually included with Python)
- MAFFT (must be installed separately and available in your system PATH)
To check that MAFFT is available, run `mafft --version`.
If the command is not found, install MAFFT:
- macOS (with Homebrew): `brew install mafft`
- Ubuntu/Debian Linux: `sudo apt-get install mafft`
- Other systems: see https://mafft.cbrc.jp/alignment/software/
If tkinter is missing, install it:
- macOS (Homebrew Python): `brew install python-tk`
- Ubuntu/Debian: `sudo apt-get install python3-tk`
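Before a long run, a quick preflight check can confirm that the Python packages and MAFFT listed above are visible. This is a hypothetical helper, not part of the repository. It looks for `_tkinter` (the compiled module that `tkinter` depends on) rather than `tkinter` itself. The reason is that a Homebrew Python without `python-tk` can include the `tkinter` package yet still fail to import it.

```python
import importlib.util
import shutil
import sys

# Import names, not pip names: biopython is imported as "Bio".
REQUIRED_MODULES = ["Bio", "numpy", "scipy", "tqdm", "_tkinter"]


def check_environment():
    """Return {check name: True/False} without importing the heavy packages."""
    # numpy >= 1.24 and scipy >= 1.10 require Python 3.8 or later
    report = {"python_ok": sys.version_info >= (3, 8)}
    for module in REQUIRED_MODULES:
        report[module] = importlib.util.find_spec(module) is not None
    report["mafft"] = shutil.which("mafft") is not None  # searches PATH
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'OK' if ok else 'MISSING'}")
```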
- `MAFFT_ScoreNGo.py` runtime scales linearly with the number of parameter combinations. Light screening typically completes in under a minute for ~70 sequences of ~400 residues; Aggressive can take significantly longer.
- `AlignmentScorerWithJalview.py` uses parallel processing (one process per Performance core on Apple Silicon; bounded on other systems) and processes alignments in chunks of 50 to keep memory usage stable.
- For very large datasets (>200 sequences), consider running on a server or workstation rather than a laptop.
- Keep your Mac awake during long runs with `caffeinate` (or use `systemd-inhibit` on Linux).
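The chunked, parallel scoring strategy described above can be sketched with `concurrent.futures`. Several assumptions apply:
- `score_one` is a hypothetical top-level (picklable) function that scores a single alignment file;
- the worker cap of 8 is illustrative;
- Performance-core detection on Apple Silicon is not shown, and `os.cpu_count()` counts all logical CPUs instead.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

CHUNK_SIZE = 50
MAX_WORKERS_CAP = 8  # hypothetical bound on the number of worker processes


def chunked(items, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def score_in_chunks(paths, score_one, max_workers=None):
    """Score files one chunk at a time so only one chunk's results are in flight."""
    workers = max_workers or min(os.cpu_count() or 1, MAX_WORKERS_CAP)
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for chunk in chunked(paths):
            # map() preserves input order, so zip pairs each path with its score
            for path, score in zip(chunk, pool.map(score_one, chunk)):
                results[path] = score
    return results
```

Call `score_in_chunks` from under an `if __name__ == "__main__":` guard. The spawn start method (the default on macOS) re-imports the main module in every worker process. Reusing a single pool across chunks avoids paying the process start-up cost once per chunk.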
Example input files are provided in the examples/ directory. To reproduce the example workflow:
1. Run `MAFFT_ScoreNGo.py` and select `examples/sample_input.fasta`.
2. Choose Light screening (fastest).
3. After completion, run `AlignmentScorerWithJalview.py` and point it to the resulting `mafft_results_sample_input/` folder.
This project is distributed under a Custom Academic and Non-Commercial License. It is free to use for educational, research, and non-profit purposes. For commercial use, please refer to the LICENSE file or contact the author for more information.
Contributions are welcome. Please open an issue or submit a pull request on the GitHub repository.
If you encounter any problems or have questions, please open an issue on the GitHub repository.
Nicolas-Frédéric Lipp, PhD https://github.com/NicoFrL
The exact Jalview conservation and quality algorithms in AlignmentScorerWithJalview.py are re-implementations of the algorithms described in:
- Livingstone, C. D. & Barton, G. J. (1993) Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. Comput. Appl. Biosci. 9, 745–756.
- Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. (2009) Jalview Version 2 — a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191.
This project was developed with assistance from AI language models for code structure and documentation. The scientific approach and core algorithm were designed and implemented by the author.