Opsin Phenotype Tool for Inference of Color Sensitivity (OPTICS) [v1.3]

Code & Data: / OPTICS_v1.3.1: / VPOD_v1.3.1:


Note - A simpler introduction to OPTICS is also available on our organization's github.io page.

Example Box Plot Output for Bootstrap Predictions of Opsin λmax by OPTICS

Description

OPTICS is an open-source tool that uses machine learning (ML) models to predict Opsin Phenotype (λmax) from unaligned opsin amino-acid sequences.
OPTICS leverages machine learning models trained on opsin genotype-phenotype data from the Visual Physiology Opsin Database (VPOD).
OPTICS allows for structural mapping of sequence features important to model prediction (using SHAP), translating machine learning insights directly onto 3D protein structures (PDB).
OPTICS can be downloaded and used as a command-line or GUI tool.
OPTICS is also available as an online tool, hosted on our Galaxy Project server.
Check out our pre-print, "Accessible and Robust Machine Learning Approaches to Improve the Opsin Genotype-Phenotype Map", to read more about it!

Key Features

λmax Prediction: Predicts the peak light absorption wavelength (λmax) for opsin proteins.
Model Selection: Choose from different pre-trained models for prediction.
BLAST Analysis: Optionally perform BLASTp analysis to compare query sequences against reference datasets.
Bootstrap Predictions: Enable bootstrap predictions to assess prediction uncertainty with confidence intervals.
Prediction Explanation (SHAP): Explains the key sequence features driving model predictions of λmax. This feature also allows for all-to-all pairwise comparisons of the features driving differences in predicted λmax between sequences using SHAP values.
Structure Mapping: Project SHAP importance values onto 3D PDB structures to create "importance heatmaps."
Custom Structure Annotation: Visualize custom annotations on 3D structures using automated PyMOL or ChimeraX scripting.
Mutagenesis Tools: Generate site-directed mutant libraries, in-silico deep mutational scanning libraries, reciprocal mutants, and chimeric opsin sequences.
Direct Mutagenesis-to-Prediction Pipeline: In the GUI, generated mutant/chimera libraries can be passed directly into OPTICS λmax prediction workflows.

Installation

Clone the repository:

 git clone https://github.com/VisualPhysiologyDB/optics.git

Install dependencies: [Make sure you are working in the repository directory from here on]

A. Create a Conda environment for OPTICS (make sure you have Conda installed)
```
conda create --name optics_env python=3.11
```
THEN
```
conda activate optics_env
```
B. Use the 'requirements.txt' file to download base package dependencies for OPTICS
```
pip install -r requirements.txt
```
C. Download MAFFT and BLAST

IF working on MAC or LINUX device:
- Install BLAST and MAFFT directly from the bioconda channel
```
conda install bioconda::blast bioconda::mafft
```
IF working on WINDOWS device:
- Manually install the Windows-compatible BLAST executable and make sure it is on your system PATH.
- You do NOT need to download MAFFT; OPTICS includes a Windows-compatible version in the optics_scripts/mafft folder that it will try to use automatically.
- You can install your own version of MAFFT, but it must be executable from your system PATH.

Data File Structure

OPTICS relies on a specific directory structure to locate models, alignment files, and cache data. When you clone the repository, the structure should generally look like this:

optics/
├── data/
│   ├── fasta/              # Alignment files for each model version (e.g., vpod_1.3)
│   ├── blast_dbs/          # BLAST databases for sequence identity checks
│   ├── aa_property_index/  # AA property values used for feature encoding
│   ├── importance_reports/ # Feature importance data & site translation information (Feature Name -> True Position)
│   ├── cached_structures/  # Stores downloaded PDB files (e.g., 1U19.pdb)
│   ├── cached_predictions/ # Stores previous predictions (JSON) to speed up runtime
│   ├── cached_seqs/        # Stores fetched WT/reference sequences used by mutagenesis tools
│   └── cached_blastp_analysis/ # Stores data from previous runs of BLASTp (JSON) to speed up runtime
├── models/
│   ├── reg_models/         # Regression models (XGBoost/GradientBoosting) for point predictions
│   └── bs_models/          # Bootstrap model ensembles for confidence intervals
├── optics_scripts/         # Helper modules (utils, blast, bootstrap, mafft wrappers, etc.)
├── deepBreaks/             # A key component of the OPTICS pipeline, this folder must stay here
└── prediction_outputs/     # Default output directory for all runs

Note: The cached_predictions folder allows OPTICS to skip re-running heavy alignment/prediction steps for sequences it has seen before. You can clear this folder to force a fresh run.
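
As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of OPTICS: the helper name `missing_optics_dirs` is hypothetical, and the cache folders are left out of the check on the assumption that they may be created on demand.

```python
from pathlib import Path

# Directory names copied from the tree above; the check itself is illustrative.
EXPECTED_DIRS = [
    "data/fasta",
    "data/blast_dbs",
    "data/aa_property_index",
    "data/importance_reports",
    "models/reg_models",
    "models/bs_models",
    "optics_scripts",
    "deepBreaks",
]


def missing_optics_dirs(repo_root):
    """Return the expected sub-directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_optics_dirs(".")
    print("Missing directories:", missing or "none")
```

An empty result suggests the clone is complete; any listed folder is worth re-checking before running a prediction.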

Usage

MAKE SURE YOU HAVE ALL DEPENDENCIES DOWNLOADED AND THAT YOU ARE IN THE FOLDER DIRECTORY FOR OPTICS (or have loaded it as a module) BEFORE RUNNING ANY SCRIPTS!

1. λmax Prediction (`optics_predictions.py`)

The main script for generating λmax predictions.

Required Args:

  -i, --input: Either a single sequence or a path to a FASTA file.

General Optional Args:

  -o, --output_dir: Desired directory to save output folder/files (optional). Default: './prediction_outputs'

  -p, --prediction_prefix: Base filename for prediction outputs. Default: 'unnamed'

  -v, --model_version: Version of models to use (optional). Based on the version of VPOD used to train models. Options/Default: vpod_1.3 (More versions coming later)

  -m, --model: Prediction model to use. Options: whole-dataset, wildtype, vertebrate, invertebrate, wildtype-vert, type-one, whole-dataset-mnm, wildtype-mnm, vertebrate-mnm, invertebrate-mnm, wildtype-vert-mnm. **Default: whole-dataset** 

  -e, --encoding: Encoding method to use (optional). Options: one_hot, aa_prop. Default: aa_prop

  --tolerate_non_standard_aa: Allows OPTICS to run predictions on sequences with 'non-standard' amino acids (e.g., 'X', 'O', 'B') (optional). Default: True

  --tolerate_incomplete_seqs: Allows OPTICS to run predictions on sequences outside the predefined limits of 250-650 amino acids (optional). Default: False
                              NOTE - If you enable this option, you may get predictions for incomplete sequences, which should be treated as less accurate.

  --n_jobs: Number of parallel processes to run (optional). Default: -1, which uses all available processors.


BLASTp Analysis Args (optional):

  --blastp: Enable BLASTp analysis.

  --blastp_report: Filename for BLASTp report. Default: blastp_report.txt

  --refseq: Reference sequence used for blastp analysis. Options: bovine, squid, microbe, custom. Default: bovine

  --custom_ref_file: Path to a custom reference sequence file for BLASTp.  Required if --refseq custom is selected.

Bootstrap Analysis Args (optional):

  --bootstrap: Enable bootstrap predictions.

  --visualize_bootstrap: Enable visualization of bootstrap predictions.

  --bootstrap_num: Number of bootstrap models to load for prediction replicates. Default and maximum: 100

  --bootstrap_viz_file: Filename prefix for bootstrap visualization. Default: bootstrap_viz

  --save_viz_as: File type for bootstrap visualizations. Options: svg, png, or pdf. Default: svg
  
  --full_spectrum_xaxis: Enables visualization of predictions on a full spectrum x-axis (300-650nm). Otherwise, x-axis is scaled with predictions.
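
To illustrate how a set of bootstrap replicates can be turned into a confidence interval, here is a minimal sketch using a 95% percentile interval. It is an assumption that this resembles what OPTICS reports; the function name `summarize_bootstrap` and the replicate values are made up for illustration.

```python
import statistics


def summarize_bootstrap(predictions_nm):
    """Summarize bootstrap lambda-max replicates (nm) with a 95% percentile interval."""
    if len(predictions_nm) < 2:
        raise ValueError("need at least two bootstrap replicates")
    # n=40 yields cut points every 2.5%; the first and last bound the central 95%.
    cuts = statistics.quantiles(predictions_nm, n=40, method="inclusive")
    return {
        "mean": statistics.fmean(predictions_nm),
        "median": statistics.median(predictions_nm),
        "ci_low": cuts[0],
        "ci_high": cuts[-1],
    }


# Made-up replicate values, for illustration only.
replicates = [498.2, 499.5, 500.1, 500.8, 501.4, 502.0, 503.3]
print(summarize_bootstrap(replicates))
```

A wide interval relative to the spectral difference you care about is a hint to treat the point prediction with caution.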

Example Command:

  python optics_predictions.py -i ./examples/optics_ex_short.txt -o ./examples -p ex_predictions -m whole-dataset -e aa_prop --blastp --blastp_report blastp_report_ex --refseq squid --bootstrap --visualize_bootstrap --bootstrap_viz_file bootstrap_viz --save_viz_as svg
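
When predictions are launched from another Python script, it can be convenient to assemble the command as an argument list. The sketch below only uses flags documented above; the helper name `build_prediction_command` is hypothetical, and the actual call is left commented out because it needs the OPTICS repository and environment.

```python
import subprocess
import sys


def build_prediction_command(input_path, output_dir="./prediction_outputs",
                             prefix="unnamed", model="whole-dataset",
                             encoding="aa_prop", bootstrap=False, blastp=False,
                             refseq="bovine"):
    """Assemble an argv list for optics_predictions.py from documented flags."""
    cmd = [sys.executable, "optics_predictions.py",
           "-i", str(input_path), "-o", str(output_dir),
           "-p", prefix, "-m", model, "-e", encoding]
    if blastp:
        cmd += ["--blastp", "--refseq", refseq]
    if bootstrap:
        cmd += ["--bootstrap", "--visualize_bootstrap"]
    return cmd


cmd = build_prediction_command("./examples/optics_ex_short.txt",
                               output_dir="./examples", prefix="ex_predictions",
                               bootstrap=True, blastp=True, refseq="squid")
print(" ".join(cmd))
# To actually run it from the repository root (with the environment active):
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.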

Input

Unaligned FASTA file containing opsin amino-acid sequences.

Example FASTA Entry:

  >NP_001014890.1_rhodopsin_Bos_taurus
  MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRT 
  PLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVC 
  KPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVV 
  HFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQG 
  SDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
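
Before submitting a large FASTA file, it may help to pre-screen it against the documented 250-650 amino-acid limits and for non-standard residues. The following is a rough sketch under those assumptions; `read_fasta` and `check_sequence` are hypothetical helpers, and the checks OPTICS performs internally may differ.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records, header = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts).upper() for h, parts in records.items()}


def check_sequence(seq, min_len=250, max_len=650):
    """Return a list of human-readable warnings for one sequence."""
    warnings = []
    if not min_len <= len(seq) <= max_len:
        warnings.append(f"length {len(seq)} outside {min_len}-{max_len}")
    odd = sorted(set(seq) - STANDARD_AA)
    if odd:
        warnings.append("non-standard residues: " + ",".join(odd))
    return warnings
```

Sequences that trigger the length warning would need --tolerate_incomplete_seqs to be predicted at all.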

Output

Predictions (TSV, Excel): λmax values, BLASTp information, hex-codes (colors) corresponding to predicted λmax.
BLAST Results (TXT, optional): Comparison of query sequences to reference datasets.
Bootstrap Graphs (SVG, PNG, or PDF; optional): Visualization of bootstrap prediction results.
Job Log (TXT): Log file containing input command to OPTICS, including encoding method and model used.
iTol & FigTree Annotation Files (TXT): Annotations for visualizing the λmax of opsins, with the hex-codes (colors) corresponding to predicted λmax.

Note - All outputs are written into subfolders named after your --prediction_prefix under your specified output directory, and are stamped with the date and time.

2. Explaining Model Predictions with SHAP (`optics_shap.py`)

For users interested in the "nitty-gritty" of why sequences have different predicted λmax values, we provide a specialized script that uses SHAP (SHapley Additive exPlanations).

This tool generates detailed plots and reports that attribute the difference in prediction to specific features (i.e., amino acid sites and their properties).

Example SHAP plot for explaining individual predictions of opsin λmax by OPTICS

Example SHAP comparison plot for explaining pair-wise differences in predictions of opsin λmax by OPTICS

This script requires a FASTA file.

The file must contain at least two sequences if you are running a SHAP comparison.
Only a single sequence is needed for an individual SHAP explanation.

Required Args:
  -i, --input: Path to FASTA file (must contain at least 2 sequences for comparison mode).

Optional Args:
  -o, --output_dir: Directory to save the SHAP analysis output folder.

  -p, --prediction_prefix: Base filename for the SHAP plot and data files.

  --mode: Analysis mode - 'single' (generate SHAP explanation for any number of individual sequences), 'comparison' (generate SHAP explanation for pairwise prediction difference between any number of sequences), or 'both'. Default: 'both'

  -m, --model: Prediction model to use.

  -v, --model_version: Version of models to use (optional). Based on the version of VPOD used to train models. Options/Default: vpod_1.3 (More versions coming later)

  -e, --encoding: Encoding method to use.

  --n_positions: Number of positions to show on SHAP explanation graphs. Default: 10 (to limit noisiness)

  --save_viz_as: File type for the SHAP visualization (svg, png, or pdf). Default: svg

  --use_reference_sites: Enable to use reference site numbering (i.e., Bovine or Squid Rhodopsin) instead of feature names.

  --n_jobs: Number of parallel processes to run (optional). Default: -1, which uses all available processors.

Example Command:

python optics_shap.py -i ./examples/optics_ex_short.fasta -o ./examples -p short_ex_test_aa_prop --mode both --use_reference_sites

Output

Predictions (TSV): Single (non-bootstrapped) λmax values
SHAP Explanation Data (CSV): SHAP data for individual sequence explanations and/or pairwise SHAP comparisons for all sequences.
SHAP Graphs (SVG by default; PNG or PDF via --save_viz_as): Visualizations for individual sequence SHAP explanations and/or pairwise SHAP comparisons for all sequences.
Difference Matrix (CSV & Excel): An all-to-all λmax difference matrix for query sequences. The Excel version has a built in 'heat-map'.
Job Log (TXT): Log file containing input command to OPTICS, including encoding method and model used.

WARNING - Be cautious if you choose the 'comparison' or 'both' mode for SHAP with many sequences. Too many sequences can end up generating hundreds to thousands of comparison files (raw data and visualizations).
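
To see how quickly the number of comparisons grows, the short sketch below counts unordered sequence pairs. It assumes each pair is compared once; the helper name is hypothetical and the exact number of files written per comparison is not covered here.

```python
from itertools import combinations


def count_pairwise_comparisons(n_sequences):
    """Number of unordered sequence pairs among n_sequences."""
    return n_sequences * (n_sequences - 1) // 2


# Cross-check the formula against an explicit enumeration of pairs.
names = ["seq_A", "seq_B", "seq_C", "seq_D"]
pairs = list(combinations(names, 2))
assert len(pairs) == count_pairwise_comparisons(len(names))

for n in (5, 20, 50):
    print(n, "sequences ->", count_pairwise_comparisons(n), "pairwise comparisons")
```

Under this assumption, 5 sequences give 10 comparisons, 20 give 190, and 50 give 1225, so the growth is quadratic.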

3. Mapping SHAP Importance to 3D Structure (`optics_structure_map.py`)

Example output with SHAP values mapped to structure by OPTICS

This script takes the output CSVs from the SHAP analysis script (both individual explanations AND pairwise comparison differences) and maps the importance values onto a 3D protein structure (PDB).

It modifies the B-factor column of the PDB file, allowing you to visualize "importance" as a heat map (Blue=Low, Orange=High importance). For comparison outputs, it maps the absolute difference in SHAP values to highlight the regions most responsible for the divergence in predicted λmax.
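
The B-factor idea can be sketched in a few lines. In the fixed-column PDB format, the chain ID sits in column 22, the residue number in columns 23-26, and the B-factor in columns 61-66. The function below is an illustrative sketch, not the code used by optics_structure_map.py; `set_bfactors` and its `scores` dictionary (residue number to importance value) are hypothetical.

```python
def set_bfactors(pdb_lines, scores, chain="A", default=0.0):
    """Return PDB lines with the B-factor column replaced by per-residue scores.

    scores maps residue numbers (int) to values; residues on the chosen chain
    that are missing from scores receive `default`. Values must fit the
    six-character field (i.e., stay below 1000).
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and len(line) >= 66 and line[21] == chain:
            resseq = int(line[22:26])          # columns 23-26
            value = scores.get(resseq, default)
            # Columns 61-66 hold the B-factor; keep everything else untouched.
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        out.append(line)
    return out
```

For a comparison CSV, the value stored per residue would be the absolute difference in SHAP values between the two sequences, as described above.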

Required Args:
  -s, --shap_csv: Path to the SHAP analysis CSV file generated by optics_shap.py.

Optional Args:
  -p, --pdb_file: Path to PDB file(s) or ID(s). Can provide two comma-separated paths (e.g., struct1.pdb,struct2.pdb) if mapping 'both' sequences from a comparison CSV. Default: 1U19 (Bovine Rhodopsin).

  -o, --output_dir: Output directory.

  --chain: Chain ID to map to. Default: A.

  --use_query_position: Enable this flag if your CSV uses target sequence numbering rather than reference numbering.

  --comp_target: If mapping a pairwise comparison CSV using query positions, select which sequence's numbering to map ('1', '2', or 'both'). Default: 1.

  --map_bovine_also: If using a custom PDB, this flag forces a second output mapped to Bovine Rhodopsin (1U19) for comparison.

  --top_n_labels: Number of top SHAP sites to automatically label in the generated visualization script. Default: 10.

  --software: The target software for the visualization script output ('pymol' or 'chimerax'). Default: 'chimerax'.

Example Command #1: SHAP Importance Structure Mapping For Single Sequence (no comparison metrics)

python optics_structure_map.py -s ./examples/optics_shap_on_structure_map_ex_2026-03-10_13-19-02/C_phantasticus_LWS1_shap_analysis.csv -p ./examples/ex_structures/C_phantasticus_LWS1_esmfold.pdb --top_n_labels 10 --software chimerax --map_bovine_also

Example Command #2: SHAP Comparison Mapping on Both Sequences with Top 10 Positions Labeled

python optics_structure_map.py -s ./examples/optics_shap_on_structure_map_ex_2026-03-10_13-19-02/C_phantasticus_LWS1_vs_C_phantasticus_LWS2_shap_data.csv -p ./examples/ex_structures/C_phantasticus_LWS1_esmfold.pdb,./examples/ex_structures/C_phantasticus_LWS2_esmfold.pdb --use_query_position --comp_target both --top_n_labels 10 --software chimerax

Output

SHAP Annotated Structure File (PDB): Generates a .pdb file with importance (or difference) scores in the B-factor column. If comp_target is both, generates two PDB files using the sequence numbering schemes and provided PDB templates.
Visualization Script (.pml or .cxc): Generates a PyMOL or ChimeraX script designed to automatically color the heat-map properly and display text labels over the top SHAP sites.

4. Generate Custom Structure Annotations (`optics_structure_annotations.py`)

A general-purpose tool to visualize arbitrary annotations (e.g., mutation sites, binding pockets) on a structure. It takes a simple CSV and creates a runnable visualization script.

Required Args:
  -a, --annotation_file: CSV/TSV file with columns: 'position' (required), 'color' (optional), 'style' (optional), 'label' (optional).

Optional Args:
  -p, --pdb: PDB ID or path. Default: 1U19.

  -o, --output_dir: Output directory. Default: '.'

  --software: Target visualization software ('pymol' or 'chimerax'). Default: 'chimerax'.

  --chain: Chain identifier. Default: 'A'

Example Command:

python optics_structure_annotations.py -a ./examples/optics_custom_annotations_ex.csv -p 1U19 --software chimerax

Output

Custom Annotation Script (ChimeraX or PyMOL): Generates a ChimeraX- or PyMOL-specific visualization script. Typically, you can open the script directly if your protein structure of interest is in the same folder.
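
An annotation file with the documented columns can be written with Python's csv module, as in the sketch below. The positions use bovine rhodopsin numbering (E113 and K296); the color and style values are placeholders, since the accepted vocabulary is not listed here, and `write_annotation_csv` is a hypothetical helper.

```python
import csv

# Placeholder values: check which colors and styles the script accepts.
annotations = [
    {"position": 113, "color": "red", "style": "sticks", "label": "E113"},
    {"position": 296, "color": "blue", "style": "sticks", "label": "K296"},
]


def write_annotation_csv(path, rows):
    """Write rows using the documented column names; 'position' is required."""
    fields = ["position", "color", "style", "label"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for row in rows:
            if "position" not in row:
                raise ValueError("every annotation needs a 'position'")
            # Optional columns are left empty when not provided.
            writer.writerow({f: row.get(f, "") for f in fields})


if __name__ == "__main__":
    write_annotation_csv("my_annotations.csv", annotations)
```

The resulting file can then be passed to optics_structure_annotations.py with -a.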

5. Site-Directed Mutagenesis (`optics_scripts/mutagenesis.py`)

A general-purpose tool for generating in-silico point mutants from wild-type opsin sequences.

This script can:

Generate all combinatorial mutants from a WT accession and a comma-separated list of mutations.
Generate one sequence with several specified mutations applied together.
Generate sequences from a file of pre-defined mutant accession strings.
Optionally include the WT sequence in the output.

Mutation positions are interpreted relative to a user-specified reference sequence, then mapped onto the target sequence by pairwise protein alignment. This helps keep numbering consistent when the target sequence has insertions/deletions relative to the reference.
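
The position mapping can be pictured as walking along the two rows of a pairwise alignment and counting residues in each. The sketch below assumes the alignment has already been computed; `map_reference_positions` is a hypothetical helper, not the function used inside mutagenesis.py.

```python
def map_reference_positions(ref_aligned, target_aligned, gap="-"):
    """Map 1-based reference positions to 1-based target positions.

    Both inputs are rows of the same pairwise alignment. Reference residues
    aligned to a gap in the target map to None.
    """
    if len(ref_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have equal length")
    mapping, ref_pos, tgt_pos = {}, 0, 0
    for ref_char, tgt_char in zip(ref_aligned, target_aligned):
        if tgt_char != gap:
            tgt_pos += 1
        if ref_char != gap:
            ref_pos += 1
            mapping[ref_pos] = tgt_pos if tgt_char != gap else None
    return mapping


ref = "MNG-TEGP"   # reference row of a toy alignment
tgt = "M-GATEGP"   # target row: one deletion and one insertion
print(map_reference_positions(ref, tgt))
```

In this toy alignment, reference position 2 has no counterpart in the target, and reference position 3 corresponds to target position 2.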

Input Mode Args (choose one):

  --wt_accession: Wild-type accession/name to mutate. Requires --mutations.

  --mutant_accession: Single mutant string to generate, formatted as WT_A123G,F45S.

  --mutant_file: Text file containing mutant strings, one per line.

Mutation Args:

  --mutations: Comma-separated list of mutations to use with --wt_accession
               (e.g., A116S,S119A,G121A).

General Args:

  -o, --output_file: Path to save the generated mutant sequence file. Required.

  -ra, --reference_accession: Reference accession used for sequence numbering.
                              Default: NM_001014890 (Bos taurus rh1).

  --db_preference: NCBI database to search first. Options: nucleotide, protein.
                   Default: nucleotide, with fallback to protein.

  --output_format: Output file type. Options: fasta, tsv. Default: fasta

  --no_wt: Prevent wild-type sequences from being included in the output.

  --email: Email address for NCBI Entrez queries.

Example Command #1: Generate All Combinatorial Mutants

python optics_scripts/mutagenesis.py --wt_accession AncBovine --mutations "A116S,S119A,G121A" -o ./examples/combined_mutants.fasta -ra NM_001014890

Example Command #2: Generate One Multi-Mutant Sequence

python optics_scripts/mutagenesis.py --mutant_accession "AncBovine_A116S,G121A" -o ./examples/single_mutant.fasta -ra NM_001014890

Input

WT accession/name plus a mutation list, OR a mutant accession string/file.
Mutant strings should follow the format:
```
WT_ACCESSION_A123G,F45S
```
Mutations should follow the standard original-AA / reference-position / new-AA format (e.g., A123G).
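
For reference, the mutant string format can be parsed as sketched below. The accession is split off at the last underscore, on the assumption that accessions may themselves contain underscores (e.g., NM_001014890); `parse_mutation` and `parse_mutant_string` are hypothetical helpers.

```python
import re

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(token):
    """Split 'A123G' into (original_aa, reference_position, new_aa)."""
    match = MUTATION_RE.match(token.strip())
    if match is None:
        raise ValueError(f"not a valid mutation: {token!r}")
    original, position, new = match.groups()
    return original, int(position), new


def parse_mutant_string(mutant):
    """Split 'WT_ACCESSION_A123G,F45S' into the accession and its mutations."""
    # The accession may itself contain underscores, so split on the last one.
    accession, _, mutation_part = mutant.rpartition("_")
    if not accession:
        raise ValueError(f"no accession found in {mutant!r}")
    return accession, [parse_mutation(t) for t in mutation_part.split(",")]


print(parse_mutant_string("AncBovine_A116S,G121A"))
```

Validating strings this way before writing a --mutant_file can catch typos such as a missing residue letter early.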

Output

Mutant Sequence File (FASTA or TSV): Generated WT and/or mutant sequences.
Sequence Cache (JSON): Fetched WT/reference sequences are stored in data/cached_seqs/ to speed up later runs.

6. In-Silico Deep Mutational Scanning (`optics_scripts/in_silico_dms.py`)

This tool generates site-saturated in-silico mutant libraries for selected opsin sites.

For each requested site, OPTICS maps the position onto the WT sequence using a reference-guided alignment and creates all alternate standard amino-acid substitutions at that position. The WT residue is not duplicated in the output.
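
Site saturation itself is simple to sketch: for one position, substitute each of the 19 other standard amino acids. The example below skips the reference-guided mapping step and treats the position as already being in target numbering; `saturate_site` and the naming of the mutants are hypothetical.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def saturate_site(sequence, position, expected_wt=None):
    """Return {mutation_name: mutant_sequence} for one 1-based position."""
    wt = sequence[position - 1]
    if expected_wt is not None and wt != expected_wt:
        raise ValueError(f"expected {expected_wt} at {position}, found {wt}")
    mutants = {}
    for aa in STANDARD_AA:
        if aa == wt:
            continue  # the wild-type residue is not duplicated
        mutants[f"{wt}{position}{aa}"] = (
            sequence[:position - 1] + aa + sequence[position:]
        )
    return mutants


library = saturate_site("MNGTEGPNFY", 3, expected_wt="G")
print(len(library), "mutants, e.g.", sorted(library)[:3])
```

Each requested site therefore contributes 19 sequences, so three sites yield a library of 57 single mutants.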

Required Args:

  --wt_accession: Accession/name for the wild-type sequence to mutate.

  --sites: Comma-separated list of sites to scan (e.g., S121,A185,G203).

  -o, --output_file: Path to save the generated DMS library FASTA file.

General Optional Args:

  -ra, --reference_accession: Reference accession used for sequence numbering.
                              Default: NM_001014890 (Bos taurus rh1).

  --db_preference: NCBI database to search first. Options: nucleotide, protein.
                   Default: nucleotide, with fallback to protein.

  --email: Email address for NCBI Entrez queries.

Example Command:

python optics_scripts/in_silico_dms.py --wt_accession AncRho1 --sites "S121,A185,G203" -o ./examples/dms_library.fasta -ra NM_001014890

Input

WT accession/name.
A comma-separated set of target sites using original amino acid and reference position (e.g., S121,A185,G203).

Output

DMS Library (FASTA): One mutant sequence per alternate amino-acid substitution at each requested site.

7. Reciprocal Mutagenesis (`optics_scripts/reciprocal_mutagenesis.py`)

This tool creates reciprocal single-mutant sequences between two aligned opsins.

It takes a FASTA alignment containing exactly three sequences:

Reference sequence used for positional numbering.
First target sequence.
Second target sequence.

OPTICS compares sequences 2 and 3 at non-gap aligned positions. Wherever they differ, it generates one mutant that changes sequence 2 to match sequence 3 at that site, and one mutant that changes sequence 3 to match sequence 2.
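
The reciprocal swap can be sketched on a toy alignment as follows. This is an illustration under stated assumptions rather than the script's actual logic: columns where the reference has a gap are skipped because they cannot be numbered, and the mutant naming scheme is hypothetical.

```python
def reciprocal_mutants(ref_aln, seq2_aln, seq3_aln, gap="-"):
    """Return {name: ungapped_sequence} for reciprocal single mutants.

    All three inputs are rows of one alignment. Columns with a gap in any
    row, or with identical residues in sequences 2 and 3, are skipped.
    """
    mutants, ref_pos = {}, 0
    for col, (r, a, b) in enumerate(zip(ref_aln, seq2_aln, seq3_aln)):
        if r != gap:
            ref_pos += 1
        if gap in (r, a, b) or a == b:
            continue
        swapped2 = seq2_aln[:col] + b + seq2_aln[col + 1:]
        swapped3 = seq3_aln[:col] + a + seq3_aln[col + 1:]
        # Hypothetical naming: <source>_<from><reference position><to>
        mutants[f"seq2_{a}{ref_pos}{b}"] = swapped2.replace(gap, "")
        mutants[f"seq3_{b}{ref_pos}{a}"] = swapped3.replace(gap, "")
    return mutants


print(reciprocal_mutants("MNGTE", "MNATE", "M-ASE"))
```

Here the only differing non-gap column is reference position 4 (T versus S), so exactly two mutants are produced, one in each direction.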

Required Args:

  input_file: Aligned FASTA file containing exactly three sequences.

  output_file: Path to save the reciprocal mutant FASTA file.

Example Command:

python optics_scripts/reciprocal_mutagenesis.py ./examples/three_opsins_aligned.fasta ./examples/reciprocal_mutants.fasta

Input

Aligned FASTA with exactly three protein sequences.
Sequence 1 provides reference numbering for mutation names.
Sequences 2 and 3 are compared and reciprocally mutated.

Output

Reciprocal Mutant FASTA: Includes the three original ungapped sequences plus all reciprocal single mutants.

8. Chimera Construction (`optics_scripts/chimeras.py`)

This tool builds chimeric opsin sequences by stitching together reference-numbered segments from one or more source sequences. Optional point mutations can also be applied after the chimera is assembled.

The chimera definition string uses the following format:

Acc1_Start1_End1-Acc2_Start2_End2[Mutation1,Mutation2,...]

Segments are separated by -.
Each segment uses Accession_Start_End.
Start/end coordinates are interpreted relative to the reference sequence.
Optional point mutations are added at the end in brackets.
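
The definition string can be broken into its parts as sketched below. This parser is only an illustration of the format: `parse_chimera` is hypothetical, and it assumes accessions contain no hyphens or square brackets (underscores are allowed).

```python
import re

CHIMERA_RE = re.compile(
    r"^(?P<segments>[^\[\]]+)(?:\[(?P<mutations>[^\[\]]*)\])?$"
)


def parse_chimera(definition):
    """Split a chimera string into (accession, start, end) segments and mutations."""
    match = CHIMERA_RE.match(definition.strip())
    if match is None:
        raise ValueError(f"malformed chimera definition: {definition!r}")
    segments = []
    for part in match.group("segments").split("-"):
        # Accessions may contain underscores, so take the last two fields.
        accession, start, end = part.rsplit("_", 2)
        start, end = int(start), int(end)
        if start > end:
            raise ValueError(f"segment start exceeds end in {part!r}")
        segments.append((accession, start, end))
    mutations = [m for m in (match.group("mutations") or "").split(",") if m]
    return segments, mutations


print(parse_chimera("AncSW1_1_150-AncSW2_151_348[C203A,F204Y]"))
```

Checking that consecutive segments cover the intended reference range without gaps or overlaps is a natural next validation step.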

Required Args:

  -co, --chimera: Chimera definition string.

  -o, --output_file: Path to save the generated chimera FASTA file.

General Optional Args:

  -ra, --reference_accession: Reference accession used for sequence numbering.
                              Default: NM_001014890 (Bos taurus rh1).

  --db_preference: NCBI database to search first. Options: nucleotide, protein.
                   Default: nucleotide, with fallback to protein.

  --email: Email address for NCBI Entrez queries.

Example Command:

python optics_scripts/chimeras.py --chimera "AncSW1_1_150-AncSW2_151_348[C203A,F204Y]" -o ./examples/my_chimera.fasta -ra NM_001014890

Input

A chimera string describing the source accession/name and reference-numbered coordinates for each segment.
Optional point mutations in standard format (e.g., C203A).

Output

Chimera FASTA: A single generated chimeric protein sequence.

Mutagenesis-to-Prediction Pipeline in the GUI

All four mutagenesis tools are also available in the OPTICS GUI. Each mutagenesis mode includes a Directly Run OPTICS Predictions on Mutant Sequences option.

When enabled, the GUI writes the generated sequence library as FASTA and immediately passes it to the standard OPTICS prediction workflow. Users can choose the prediction model, encoding method, and whether to run bootstrap predictions.

9. Using the OPTICS GUI

That's right! There's no need for the command line: OPTICS can also be used as a GUI! Usage is simple: run the command below (with your OPTICS conda environment activated) and get to predicting. ;)

To run the GUI:

python run_optics_gui.py

The GUI provides tabs/buttons for the main OPTICS analysis pipelines and the mutagenesis tools:

Standard Predictions: Run the main λmax prediction workflow.
SHAP Interpretation: Run feature attribution analysis.
Structure Mapping: Map SHAP values to PDB files.
Structure Annotations: Visualize custom data on structures.
Site-Directed Mutagenesis: Generate single, combined, or combinatorial point-mutant sequences.
Deep Mutational Scanning: Generate site-saturated mutant libraries for selected sites.
Reciprocal Mutagenesis: Swap differing residues between two aligned sequences.
Chimera Construction: Stitch reference-numbered sequence segments together and optionally add point mutations.

Understanding the λmax Prediction Models

The --model flag allows you to select a specific pre-trained model. Each model is named after the data-subset it was trained on.

To keep the base installation lightweight, models are divided into Core and Extra categories.

Core Models (Included by Default)

These models are included out-of-the-box when you clone this repository:

whole-dataset: Trained on the entire VPOD dataset. Recommended.
whole-dataset-mnm: Trained on the entire dataset including "Mine-n-Match" inferred data.
wildtype: Trained exclusively on wild-type sequences.
wildtype-mnm: Trained on wild-type sequences including "Mine-n-Match" inferred data.
type-one: Trained on the Type-One (Microbial) opsin dataset (previously published by Karasuyama et al. 2018).

Extra Models (Requires Separate Download)

We also offer specialized taxonomic and mutational subset models. Because of file size constraints, these are hosted in a separate repository.

vertebrate & vertebrate-mnm
invertebrate & invertebrate-mnm
wildtype-vert & wildtype-vert-mnm
wildtype-mut

📥 How to get Extra Models: To use any of the extra models, please visit the Extra OPTICS Models Repository. Download the required .pkl files and place them in your local models/reg_models/ and models/bs_models/ directories as instructed there.

The `-mnm` Suffix

Models ending in -mnm (e.g., wildtype-mnm) are trained on augmented datasets.

Standard models: Trained exclusively on heterologous expression data (in-vitro).
-mnm models: Trained on heterologous data plus data inferred via our "Mine-n-Match" procedure (in-vivo correlations). See Frazer et al. 2025 for details.
Note - These models should be treated as secondary to heterologous models for now. The heterologous models are the 'gold standard' and MNM models are the 'silver standard': still useful, but not equal.

License

All data and code are covered under the GNU General Public License (GPL), Version 3, in accordance with Open Source Initiative (OSI) policies.

Citation

IF citing this GitHub repository and its contents, use the following DOI provided by Zenodo:
```
10.5281/zenodo.10667840
```

IF you use OPTICS in your research, please cite the following paper(s):

Our more recent publication directly on the making/utility of OPTICS.

Seth A. Frazer, Todd H. Oakley. Accessible and Robust Machine Learning Approaches to Improve the Opsin Genotype-Phenotype Map. bioRxiv, 2025.08.22.671864. https://doi.org/10.1101/2025.08.22.671864

Our original paper on the development of VPOD; the opsin genotype-phenotype database backbone for training the ML models used in OPTICS.

Seth A. Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A. Crandall, & Todd H Oakley. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). GigaScience, 2024.09.01. https://doi.org/10.1093/gigascience/giae073

Contact

Contact information for author questions or feedback.

Todd H. Oakley - ORCID ID

oakley@ucsb.edu

Seth A. Frazer - ORCID ID

sethfrazer@ucsb.edu

Additional Notes/Resources

Want to use OPTICS without the hassle of the setup? -> CLICK HERE to visit our Galaxy Project server and use our tool!
OPTICS v1.3 uses VPOD_v1.3 for training.
Here is a link to a bibliography of the publications used to build VPOD_v1.3
If you know of publications for training opsin ML models not included in the VPOD_v1.3 database, please send them to us through this form
Check out the VPOD GitHub repository to learn more about our database and ML models!
