Protein Design Hub

Protein Design Hub is an integrated platform for protein structure prediction, evaluation, mutagenesis, and LLM-guided scientific interpretation.

It combines:

computational predictors and quality metrics,
a 5-step deterministic pipeline or a 12-step LLM-guided agent pipeline,
a Streamlit UI with workflow-connected pages,
and specialist scientist agents that review and interpret each stage.

See AGENTS.md for extended agent API examples.

Core Capabilities

Multi-predictor structure generation: ColabFold, Chai-1, Boltz-2, ESMFold variants, ESM3, and optional ImmuneBuilder flows.
Deep structural evaluation: lDDT, TM-score, RMSD, QS-score, clash metrics, SASA/interface metrics, VoroMQA/CAD, OpenMM GBSA, and optional advanced metrics.
Agent orchestration:
- Step-only mode: fast compute pipeline.
- LLM-guided mode: meetings + verdicts + policy gating.
Mutagenesis workflow:
- Baseline predictor benchmarking.
- Baseline structure evaluation before mutation.
- Expert recommendation of mutation targets.
- Saturation and multi-mutation pipelines with optional OpenStructure comprehensive mutant-vs-baseline scoring.
Full web workspace: Home, Predict, Evaluate, Compare, Agents, Editor, Mutagenesis, Evolution, MPNN Lab, Batch, MSA, Jobs, Settings.

Pipeline Modes

Step-only Pipeline (5 steps)

Input -> Prediction -> Evaluation -> Comparison -> Report

LLM-guided Pipeline (12 steps)

Input -> Input Review -> Planning -> Prediction -> Prediction Review -> Evaluation -> Comparison -> Evaluation Review -> Refinement Review -> Mutagenesis Planning -> Executive Summary -> Report
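
To make the relationship between the two modes concrete, the short sketch below writes both step sequences as plain Python lists and checks that the LLM-guided pipeline keeps the five compute steps in the same order while adding seven review and planning steps. The list and function names are illustrative only and are not taken from the project's code.

```python
STEP_ONLY = ["Input", "Prediction", "Evaluation", "Comparison", "Report"]
LLM_GUIDED = [
    "Input", "Input Review", "Planning", "Prediction", "Prediction Review",
    "Evaluation", "Comparison", "Evaluation Review", "Refinement Review",
    "Mutagenesis Planning", "Executive Summary", "Report",
]


def is_ordered_subsequence(short, long):
    """True if every item of `short` appears in `long` in the same order."""
    remaining = iter(long)
    # `step in remaining` advances the iterator, so order is enforced
    return all(step in remaining for step in short)


# Steps that exist only in LLM-guided mode (reviews, planning, summary)
llm_only_steps = [s for s in LLM_GUIDED if s not in STEP_ONLY]

print(len(STEP_ONLY), len(LLM_GUIDED))
print(is_ordered_subsequence(STEP_ONLY, LLM_GUIDED))
print(llm_only_steps)
```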

LLM steps emit structured verdicts (PASS, WARN, FAIL) into step_verdicts.json. By default, a FAIL verdict halts the pipeline. Pass --allow-fail-verdicts to continue past a FAIL; the override is then recorded as an event in policy_log.json. Both files are written under report/ in the job output directory.
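
The gating rule can be sketched in a few lines of Python. This is a minimal illustration and not the project's implementation: the function name `gate_on_verdicts`, the flat `{step_name: verdict}` mapping, and the shape of the override record are all assumptions, since the actual schemas of step_verdicts.json and policy_log.json are not described here.

```python
from typing import Dict, List


def gate_on_verdicts(
    verdicts: Dict[str, str],
    allow_fail_verdicts: bool = False,
) -> List[dict]:
    """Return override events, or raise if a FAIL verdict must halt the run.

    `verdicts` maps a step name to "PASS", "WARN", or "FAIL" (assumed shape).
    """
    overrides = []
    for step, verdict in verdicts.items():
        if verdict != "FAIL":
            continue  # PASS and WARN never halt the pipeline
        if not allow_fail_verdicts:
            raise RuntimeError(f"Step '{step}' returned FAIL; halting pipeline")
        # Hypothetical override record, analogous to a policy_log.json entry
        overrides.append({"step": step, "verdict": verdict, "action": "override"})
    return overrides


verdicts = {
    "input_review": "PASS",
    "prediction_review": "WARN",
    "evaluation_review": "FAIL",
}
events = gate_on_verdicts(verdicts, allow_fail_verdicts=True)
print(events)
```

With `allow_fail_verdicts=False` (the default in this sketch, mirroring the documented default policy), the same input raises instead of returning an override list.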

Scientist Agents

Agent	Core expertise	Primary role in workflow
Principal Investigator	Project leadership, ML for structural biology, CASP standards	Leads meetings, aligns strategy, synthesizes decisions
Scientific Critic	Rigor, reproducibility, failure-mode analysis	Challenges assumptions, flags weak evidence and risks
Structural Biologist	3D structure interpretation, domains, interfaces, validation	Interprets structural plausibility and functional geometry
Computational Biologist	Pipeline setup, MSA strategy, large-scale workflows	Tunes predictor setup and throughput/reproducibility tradeoffs
Machine Learning Specialist	AF/ESM/diffusion/inverse-folding model behavior	Selects and calibrates model usage and confidence interpretation
Immunologist	Antibody/nanobody engineering and interface biology	Guides immune-specific structure and mutation decisions
Protein Engineer	Stability/function engineering and mutational strategy	Proposes residue targets and mutation library strategy
Biophysicist	Energetics, solubility, assay planning	Interprets energy/quality metrics and experimental readiness
Digital Recep	Refinement methods (ReFOLD, AMBER, FastRelax, etc.)	Recommends targeted refinement plans and safeguards fold integrity
Liam	QA specialist (ModFOLD/ModFOLDdock/MultiFOLD/IntFOLD suite)	Performs independent quality assessment and confidence triage

Team Presets

Team key	Composition focus
`default`	General prediction and evaluation
`design`	Rational design and engineering
`nanobody`	Antibody/nanobody development
`evaluation`	Quality and biophysical assessment
`refinement`	Structure refinement strategy
`mutagenesis`	Mutation scanning and design strategy
`mpnn_design`	Inverse folding / sequence design
`full_pipeline`	End-to-end core expert review
`all_experts`	Comprehensive review using all scientist personas

Installation

Prerequisites

Python >=3.10
CUDA-capable GPU recommended, both for running predictors and for local LLM inference speed
Optional: Conda for OpenStructure installs

Quick Setup

git clone https://github.com/recep2244/pdhub.git
cd pdhub

# Option A: environment file
conda env create -f environment.yaml
conda activate protein_design_hub

# Option B: existing environment
pip install -e .

Install predictors as needed:

pdhub install all
# or targeted
pdhub install predictor colabfold
pdhub install predictor chai1
pdhub install predictor boltz2

LLM Backend (Default: Qwen on Ollama)

The default local provider is ollama with model qwen2.5:14b.

ollama pull qwen2.5:14b
ollama serve
pdhub agents status

GPU validation for Ollama:

ollama ps
journalctl -u ollama -n 120 --no-pager | rg -i "gpu|cuda|backend|vram"

Quick Start

# System checks
pdhub status
pdhub pipeline status

# Step-only (fast)
pdhub pipeline run input.fasta

# Full LLM-guided
pdhub pipeline run input.fasta --llm

# Allow continuation even if an LLM verdict is FAIL
pdhub pipeline run input.fasta --llm --allow-fail-verdicts

# Provider/model override at runtime
pdhub pipeline run input.fasta --llm --provider deepseek
pdhub pipeline run input.fasta --llm --provider gemini --model gemini-2.5-flash

# Dry-run pipeline shape
pdhub pipeline plan input.fasta
pdhub pipeline plan input.fasta --llm

# LLM meeting tools
pdhub agents list
pdhub agents meet "Which predictor is best for this sequence?"

# Web UI
pdhub web --host localhost --port 8501
# then open http://localhost:8501 in your browser

Web UI Modules

Main pages:

Home
Predict
Evaluate
Compare
Agents
Editor
Mutagenesis
Evolution
MPNN Lab
Batch
MSA
Jobs
Settings

The UI is cross-linked so outputs from prediction/evaluation/mutagenesis can be reused across pages.

Mutagenesis Workflow

The mutagenesis page (src/protein_design_hub/web/pages/10_mutation_scanner.py) supports:

Baseline comparison across selected predictors.
Baseline no-reference evaluation before mutation selection.
Agent discussion for:
- residue targeting after baseline prediction,
- baseline metric interpretation,
- post-scan interpretation and reporting.
Single-position saturation mutagenesis.
Multi-mutation combination search.
Optional per-mutant extended metrics and OpenStructure comprehensive mutant-vs-baseline comparison.
Saved scan jobs with provenance and meeting transcripts.
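
As a rough illustration of what single-position saturation and multi-mutation combination search enumerate, the sketch below generates mutant sequences in plain Python. The helper names (`saturate_position`, `combine_mutations`) and the `T3A`-style labels are assumptions made for this example and are not the scanner's actual API; structure prediction and scoring of each mutant are left out.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def saturate_position(sequence: str, position: int) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant) for all 19 substitutions at a 1-based position."""
    wild_type = sequence[position - 1]
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue  # skip the wild-type residue itself
        mutant = sequence[: position - 1] + residue + sequence[position:]
        yield f"{wild_type}{position}{residue}", mutant


def combine_mutations(sequence: str, choices: Dict[int, str]) -> List[Tuple[str, str]]:
    """Enumerate every combination of candidate residues at several positions."""
    positions = sorted(choices)
    results = []
    for combo in product(*(choices[p] for p in positions)):
        mutant = list(sequence)
        labels = []
        for pos, residue in zip(positions, combo):
            labels.append(f"{sequence[pos - 1]}{pos}{residue}")
            mutant[pos - 1] = residue
        results.append(("+".join(labels), "".join(mutant)))
    return results


seq = "MKTAYIAK"  # toy sequence for demonstration only
single = list(saturate_position(seq, 3))
double = combine_mutations(seq, {2: "RE", 5: "FW"})
print(len(single), single[0])
print(double)
```

The combination count grows as the product of the candidate set sizes, which is why a multi-mutation search usually restricts each position to a short list of residues.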

CLI Reference

Primary commands:

pdhub pipeline run: unified step-only or LLM-guided pipeline.
pdhub pipeline plan: dry-run pipeline stages.
pdhub pipeline status: predictor + LLM + GPU checks.
pdhub agents run: shortcut for pdhub pipeline run --llm.
pdhub agents meet: ad-hoc team or individual meetings.
pdhub compare run: legacy monolithic or agent-based comparison modes.
pdhub predict run, pdhub evaluate run, pdhub design, pdhub energy, pdhub backbone: focused workflows.

Python API

Step-only:

from pathlib import Path
from protein_design_hub.agents import AgentOrchestrator

orchestrator = AgentOrchestrator(mode="step")
result = orchestrator.run(input_path=Path("input.fasta"))
print(result.success, result.message)

LLM-guided with policy override control:

from pathlib import Path
from protein_design_hub.agents import AgentOrchestrator

orchestrator = AgentOrchestrator(
    mode="llm",
    num_rounds=1,
    allow_failed_llm_steps=False,  # default safety policy
)
result = orchestrator.run(
    input_path=Path("input.fasta"),
    reference_path=Path("native.pdb"),
)
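if result.success:
    ctx = result.context
    # Predictor selected as best by the comparison step
    print(ctx.comparison_result.best_predictor)
    # Keys of the recorded LLM step verdicts
    print(ctx.step_verdicts.keys())
else:
    # For example, the run may have halted on a FAIL verdict
    print(result.message)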

LLM Providers

Built-in provider presets in src/protein_design_hub/core/config.py:

Provider	Type	Default model
`ollama`	local	`qwen2.5:14b`
`lmstudio`	local	`default`
`vllm`	local	`default`
`llamacpp`	local	`default`
`groq`	fast cloud	`llama-3.3-70b-versatile`
`cerebras`	fast cloud	`llama-3.3-70b`
`sambanova`	fast cloud	`Meta-Llama-3.3-70B-Instruct`
`deepseek`	cloud	`deepseek-chat`
`openai`	cloud	`gpt-4o`
`gemini`	cloud	`gemini-2.5-flash`
`kimi`	cloud	`kimi-k2`
`openrouter`	cloud	`meta-llama/llama-3.3-70b-instruct`
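
The way a runtime `--provider` / `--model` override could fall back to the defaults in this table is sketched below. The dictionary copies a subset of the table; the `resolve_llm` function and its behavior are assumptions for illustration, and the real presets in src/protein_design_hub/core/config.py may be structured differently.

```python
from typing import Optional, Tuple

# Default models copied from the provider table (subset only)
PROVIDER_DEFAULTS = {
    "ollama": "qwen2.5:14b",
    "deepseek": "deepseek-chat",
    "openai": "gpt-4o",
    "gemini": "gemini-2.5-flash",
}


def resolve_llm(provider: str = "ollama", model: Optional[str] = None) -> Tuple[str, str]:
    """Return (provider, model), using the provider's default model if none is given."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"Unknown provider: {provider}")
    return provider, model or PROVIDER_DEFAULTS[provider]


print(resolve_llm())                               # no overrides
print(resolve_llm("deepseek"))                     # provider override only
print(resolve_llm("gemini", "gemini-2.5-flash"))   # provider and model override
```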

Configuration

Config load order:

config/default.yaml
~/.protein_design_hub/config.yaml
environment variables

Typical LLM block:

llm:
  provider: "ollama"
  model: "qwen2.5:14b"
  temperature: 0.2
  max_tokens: 4096
  num_rounds: 1
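
A layered load order like the one listed above is commonly implemented as a recursive merge in which each later source overrides the earlier ones. The sketch below shows that idea with plain dictionaries standing in for the parsed YAML files. It assumes later sources take precedence, and the environment variable name `PDHUB_LLM_MODEL` is hypothetical; check the project's config module for the real behavior and variable names.

```python
from typing import Any, Dict, Mapping


def deep_merge(base: Dict[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` values replace or extend `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # scalar or new key: later source wins
    return merged


# Stand-ins for config/default.yaml and ~/.protein_design_hub/config.yaml
defaults = {"llm": {"provider": "ollama", "model": "qwen2.5:14b", "temperature": 0.2}}
user_cfg = {"llm": {"temperature": 0.0}}

# Stand-in for os.environ; the variable name is hypothetical
env = {"PDHUB_LLM_MODEL": "qwen2.5:7b"}
env_cfg: Dict[str, Any] = {}
if "PDHUB_LLM_MODEL" in env:
    env_cfg = {"llm": {"model": env["PDHUB_LLM_MODEL"]}}

config = deep_merge(deep_merge(defaults, user_cfg), env_cfg)
print(config)
```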

Output Structure

Pipeline job output:

outputs/<job_id>/
  metadata.json
  prediction_summary.json
  input/
    sequences.fasta
  <predictor_name>/
    ...structures...
    scores.json
    status.json
  evaluation/
    comparison_summary.json
    <predictor_name>_metrics.json
  meetings/
    *.json
    *.md
  report/
    report.html
    agent_summaries.json
    step_verdicts.json
    policy_log.json
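
Given this layout, a finished job directory can be inspected with the standard library alone. The sketch below lists predictor subdirectories (identified by their status.json) and reports which report files are absent. The function name `summarize_job` is an assumption, and the demo builds a throwaway directory with placeholder file contents, because the JSON schemas are not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

REPORT_FILES = [
    "report.html",
    "agent_summaries.json",
    "step_verdicts.json",
    "policy_log.json",
]


def summarize_job(job_dir: Path) -> Dict[str, List[str]]:
    """List predictor folders and any report files missing from a job directory."""
    predictors = sorted(p.parent.name for p in job_dir.glob("*/status.json"))
    missing = [n for n in REPORT_FILES if not (job_dir / "report" / n).exists()]
    return {"predictors": predictors, "missing_report_files": missing}


# Build a tiny fake job directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "demo_job"
    (job / "colabfold").mkdir(parents=True)
    # Placeholder content; the real status.json schema is not specified
    (job / "colabfold" / "status.json").write_text(json.dumps({"status": "ok"}))
    (job / "report").mkdir()
    (job / "report" / "report.html").write_text("<html></html>")
    summary = summarize_job(job)

print(summary)
```

In a step-only run there are no LLM verdicts, so a check like this would need to treat the verdict and policy files as optional.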

Mutagenesis job output (UI):

outputs/<scan_job_id>/
  prediction_summary.json
  scan_results.json
  base_wt.pdb
  meetings/
    *.json
    *.md

Development

pip install -e ".[dev]"
pytest -q

Project Structure

src/protein_design_hub/
  agents/       # Orchestrator, scientist personas, meeting engine
  analysis/     # Mutation scanning and analysis workflows
  cli/          # Typer CLI commands
  core/         # Configuration, shared types
  evaluation/   # Metrics and composite evaluator
  pipeline/     # Predictor execution runners
  predictors/   # Predictor adapters/installers
  web/          # Streamlit app and pages

License

MIT License

Citation

If you use this project in research, cite the primary underlying tools (for example: AlphaFold2, ColabFold, Chai-1, Boltz, OpenStructure, ESM/ESMFold/ESM3, and other predictor/evaluator backends used in your run).
