- [2026-04-01] 🎉 Newly released VenusFactory2 website at venusfactory.cn/playground
- [2026-03-27] 🚀 VenusFactory2 technical report released at arXiv:2603.27303
- [2026-01-23] 🚀 Added VenusX (ICLR2026) in VenusFactory2
- [2025-04-19] 🎉 VenusREM (ISMB/ECCB2025) #1 in ProteinGym & VenusMutHub!
VenusFactory2 is an agent-driven protein engineering platform that combines 40+ AI models with 11 biological databases. It is designed for everyone, from biologists to AI researchers.
| 🤖 Agent-First | 🎯 Three Interfaces | ⚡ Zero to Results |
|---|---|---|
| Natural language → Multi-step automation | Web UI / REST API / CLI | Upload → Predict in seconds |
| 40+ models + 11 databases | Same power, different styles | Or train custom models in minutes |
📖 Easy to Learn: Designed for life science professionals; no programming background is required. An intuitive Web UI, comprehensive bilingual documentation, rich examples, and video tutorials help you progress quickly from beginner to protein AI expert.
| Task | Solution | Time |
|---|---|---|
| 🧬 Mutation effects | ESM-2, ProSST, ProtSSN (zero-shot) | <1 min |
| 🎯 Protein function | 30+ fine-tuned models | <30 sec |
| 🔬 Custom training | 7 PEFT methods (LoRA, QLoRA, etc.) | 10-60 min |
| 💾 Data download | AlphaFold, UniProt, RCSB, KEGG, etc. | Real-time |
| 📚 Literature | AI-powered search & analysis | <2 min |
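
Zero-shot mutation scoring with a protein language model is commonly done by comparing the model's log-probability of the mutant residue with that of the wild-type residue at the same masked position (the masked-marginal heuristic). The sketch below shows only that arithmetic on a hand-written probability table. `toy_probs` and `masked_marginal_score` are illustrative names, no real ESM-2 or ProSST model is loaded, and the way VenusFactory2 computes its scores may differ.

```python
import math

def masked_marginal_score(probs_at_pos, wt, mt):
    """Return log p(mutant) - log p(wild type) at one masked position."""
    return math.log(probs_at_pos[mt]) - math.log(probs_at_pos[wt])

# Hypothetical model output for a single masked position (values are made up).
toy_probs = {"A": 0.50, "V": 0.25, "K": 0.125, "R": 0.125}

score = masked_marginal_score(toy_probs, wt="A", mt="V")
# A negative score means the model finds the mutant less likely than the wild type.
print(f"A->V score: {score:.3f}")
```

For multiple substitutions, a common convention is to sum the per-position scores, which assumes the sites contribute independently.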
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install -r requirements.txt  # Detailed guide below ↓
cd frontend
npm install
npm run build
cd ..
# Web UI v1 (legacy Gradio, local mode)
python src/webui.py --mode all # → http://localhost:7860
# Web UI v2 (FastAPI + React, local mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 # → http://localhost:7861
# Web UI v2 (FastAPI + React, online mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online
# REST API only
python src/api_server.py # → http://localhost:5000/docs
# CLI
bash script/train/train_plm_lora.sh
🤖 Try Agent-0.1 | ⚡ Quick Tools | 🔬 Train Models
Agent-0.1 (Natural Language)
Q: "Predict stability for sequence MKTAYIAKQRQISFV..."
→ Agent auto-selects model → Runs prediction → Returns results + explanations
Quick Mutation Scoring
Upload: PDB/FASTA → Mutations: A23V, K45R → Get: Stability scores
Train Your Model
Model: ESM2-650M → Dataset: DeepSol → Method: LoRA → 15 min → Trained model ✓
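
Mutation strings such as `A23V` encode the wild-type residue, a position, and the mutant residue. As a minimal sketch of how such input could be checked before scoring, the helper below parses the notation and verifies the wild-type residue against a sequence. It assumes 1-based positions (the usual convention, not confirmed for VenusFactory2), and `parse_mutation` and `matches_sequence` are hypothetical names, not part of the project.

```python
import re
from typing import NamedTuple

class Mutation(NamedTuple):
    wild_type: str
    position: int  # 1-based residue index (assumed convention)
    mutant: str

_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUT_RE = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")

def parse_mutation(text: str) -> Mutation:
    """Parse a single substitution such as 'A23V'."""
    m = _MUT_RE.match(text.strip().upper())
    if m is None:
        raise ValueError(f"not a valid substitution: {text!r}")
    return Mutation(m.group(1), int(m.group(2)), m.group(3))

def matches_sequence(mut: Mutation, sequence: str) -> bool:
    """True if the position exists and holds the stated wild-type residue."""
    if not 1 <= mut.position <= len(sequence):
        return False
    return sequence[mut.position - 1] == mut.wild_type

seq = "MKTAYIAKQR"
print(parse_mutation("A4V"))
print(matches_sequence(parse_mutation("A4V"), seq))   # residue 4 of seq is A
print(matches_sequence(parse_mutation("A23V"), seq))  # position 23 is out of range
```

Rejecting mismatched input early avoids scoring a substitution against the wrong residue.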
Agent-0.1 orchestrates all tools via natural language. Powered by LangGraph + LangChain.
You: "Design thermostable mutations for PDB:1ABC"
↓
🤖 Agent Planning
↓
📥 Download (RCSB PDB) → 🧬 Predict (ESM-2 scan) → 🎯 Score (Stability) → 📊 Report (Ranked list)
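
The Download → Predict → Score → Report flow can be pictured as a chain of steps that each read and extend a shared state. The following is a plain-Python sketch of that idea, not the project's LangGraph implementation: every function is a hypothetical placeholder, and the scores are made-up numbers.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Step = Callable[[State], State]

def download(state: State) -> State:
    # Placeholder: a real step would fetch the structure from RCSB PDB.
    return {**state, "structure": f"{state['pdb_id']}.pdb"}

def predict(state: State) -> State:
    # Placeholder scores; a real step would run a model scan.
    return {**state, "scores": {"A4V": 0.7, "K8R": 0.2, "T3S": -0.1}}

def score(state: State) -> State:
    # Rank candidate mutations from best to worst.
    ranked = sorted(state["scores"].items(), key=lambda kv: kv[1], reverse=True)
    return {**state, "ranked": ranked}

def report(state: State) -> State:
    lines = [f"{i}. {mut} ({val:+.2f})"
             for i, (mut, val) in enumerate(state["ranked"], 1)]
    return {**state, "report": "\n".join(lines)}

def run_pipeline(steps: List[Step], state: State) -> State:
    """Apply each step in order, threading the state through."""
    for step in steps:
        state = step(state)
    return state

result = run_pipeline([download, predict, score, report], {"pdb_id": "1ABC"})
print(result["report"])
```

An agent framework adds planning and error handling on top, but the state-passing contract between steps stays the same.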
✨ Agent Capabilities
| Category | Features |
|---|---|
| 🔬 Analysis | Mutation prediction • Function/stability scoring • Structure analysis |
| 💾 Data | Multi-database search • Format conversion • Batch processing |
| 🧠 Planning | Multi-step automation • Tool orchestration • Error handling |
| 📚 Research | Literature mining • Family analysis • Report generation |
💬 Example Conversations
Mutation Design:
You: "Improve thermostability of MKTAYIAKQR..."
Agent: ✓ ESM-2 scanning... ✓ Stability scoring...
→ Top 3: A4V (+2.8 kcal/mol), K8R (+1.9), T3S (+1.5)
Database Search:
You: "Find lysozyme structures <2.0Å resolution"
Agent: ✓ Searching RCSB... → Found 47 structures
→ Downloaded to: temp_outputs/lysozyme_structures/
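
A request like the lysozyme search above ultimately becomes a structured query against a database API. As a hedged illustration, the sketch below builds a JSON payload in the style of the RCSB PDB Search API (v2), combining a full-text term with a resolution filter. The field names reflect that public API as best understood here and should be checked against the RCSB documentation. `build_rcsb_query` is a hypothetical helper, and nothing here is taken from the agent's actual tool code.

```python
import json

def build_rcsb_query(keyword: str, max_resolution: float, rows: int = 25) -> dict:
    """Build a search payload: full-text match AND resolution below a cutoff."""
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                {"type": "terminal", "service": "full_text",
                 "parameters": {"value": keyword}},
                {"type": "terminal", "service": "text",
                 "parameters": {
                     "attribute": "rcsb_entry_info.resolution_combined",
                     "operator": "less",
                     "value": max_resolution,
                 }},
            ],
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }

payload = build_rcsb_query("lysozyme", 2.0)
print(json.dumps(payload, indent=2))
```

The payload would be sent as a POST body to the search endpoint. The sketch stops before the network call so it can run offline.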
💡 Note: Requires API key (OpenAI/Anthropic). Currently in Beta.
🌐 Interfaces: Web UI | REST API | CLI
↓
🤖 Agent Layer (LangGraph + LangChain)
↓
🔧 Application: Train | Eval | Predict | Tools
↓
🛠️ Core Tools: 9 categories (mutation, database, search, etc.)
↓
📊 Resources: 40+ Models | 30+ Datasets | 11+ Databases
📚 Integrated Resources
Models (40+): ESM, ProtBert, ProtT5, Venus/PETA/ProSST series
Databases (11+): AlphaFold • RCSB PDB • UniProt • NCBI • KEGG • STRING • BRENDA • ChEMBL • HPA • FDA • Foldseek
Datasets (30+): Function • Localization • Stability • Solubility • Mutation fitness
🔧 Tool Categories
| Tool | Description | Agent | CLI |
|---|---|---|---|
| 🧬 Mutation | ESM-1v, ESM-2, ProSST, ProtSSN, MIF-ST | ✅ | ✅ |
| 🎯 Prediction | 30+ fine-tuned models | ✅ | ✅ |
| 💾 Database | 11 integrations | ✅ | ✅ |
| 🔍 Search | PubMed, FDA, patents | ✅ | ✅ |
| 🏋️ Training | LoRA, QLoRA, DoRA, etc. | ✅ | ✅ |
| 📁 File | Format conversion | ✅ | ✅ |
| 🔬 Denovo | Protein design | ✅ | ✅ |
| 🧪 Discovery | Novel discovery | ✅ | ✅ |
| 📊 Visualize | 3D viewer | ✅ | ✅ |
40+ Protein Language Models
Venus Series (Liang's Lab): ProSST-20/128/512/1024/2048/4096 (110M) • ProPrime-690M • VenusPLM-300M • PETA-base/bpe/unigram (80M)
ESM Series (Meta AI): ESM2: 8M, 35M, 150M, 650M, 3B, 15B • ESM-1v: 5 models (650M each)
ProtBert & ProtT5: ProtBert-Uniref100/BFD (420M) • IgBert (420M) • ProtT5-XL/XXL (3B-11B) • Ankh-base/large (450M-1.2B)
Selection Guide:
- GPU <8GB: ESM2-8M/35M, ProSST
- GPU 8-16GB: ESM2-150M/650M, ProtBert
- GPU 24GB+: ESM2-3B, ProtT5-XL
- Multi-GPU: ESM2-15B, ProtT5-XXL
By Task:
- Classification: ESM2, ProtBert
- Structure: Ankh
- Generation: ProtT5
- Antibody: IgBert/IgT5
- Lightweight: ProSST, PETA
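
The GPU tiers in the selection guide can be expressed as a small lookup. This is an illustrative sketch only: `suggest_models` is a hypothetical helper, and because the guide does not cover the 16-24 GB range, the code conservatively reuses the 8-16 GB tier there (an assumption, not a project recommendation).

```python
def suggest_models(gpu_gb: float, multi_gpu: bool = False) -> list:
    """Map available GPU memory (GB) to the model tiers listed in the guide."""
    if multi_gpu:
        return ["ESM2-15B", "ProtT5-XXL"]
    if gpu_gb >= 24:
        return ["ESM2-3B", "ProtT5-XL"]
    if gpu_gb >= 8:
        # The guide lists 8-16 GB; 16-24 GB is not covered, so this sketch
        # falls back to the 8-16 GB tier (assumption).
        return ["ESM2-150M", "ESM2-650M", "ProtBert"]
    return ["ESM2-8M", "ESM2-35M", "ProSST"]

for memory in (6, 12, 24):
    print(memory, "GB ->", suggest_models(memory))
```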
30+ Supervised + Zero-Shot Datasets
Zero-Shot: VenusMutHub • ProteinGym (217 DMS)
Function: EC • GO_BP • GO_CC • GO_MF
Localization: DeepLocBinary • DeepLocMulti • DeepLoc2Multi
Stability: Thermostability • TAPE_Stability
Solubility: DeepSol • DeepSoluE • eSOL • ProtSolM • PETA_CHS/LGK/TEM_Sol
Mutation: FLIP_AAV (7 splits) • FLIP_GB1 (5 splits) • TAPE_Fluorescence
Others: DeepET_Topt • MetalIonBinding • SortingSignal • PaCRISPR
All datasets available on HuggingFace
System packages (Linux): PDF export pulls in `pycairo`, which builds against `cairo`. On Debian/Ubuntu, install the headers once before pip-installing:
sudo apt-get install -y libcairo2-dev libxml2-dev pkg-config
Or, with conda, install the pre-built binary to skip the build:
conda install -c conda-forge pycairo
macOS / Windows: pre-built wheels exist, so no extra system step is needed.
🍎 macOS (M1/M2/M3)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install torch_scatter torch-sparse torch-geometric -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements_for_macOS.txt
🪟 Windows / 🐧 Linux (CUDA 12.8)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
# Linux only: see "System packages" note above for cairo headers.
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128
pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
-f https://data.pyg.org/whl/torch-2.8.0+cu128.html
pip install -r requirements.txt
🪟 Windows / 🐧 Linux (CUDA 11.8)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
-f https://data.pyg.org/whl/torch-2.7.0+cu118.html
pip install -r requirements.txt
💻 CPU Only
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv \
-f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements.txt
A helper script installs torch / PyG / pyproject.toml deps into .venv/ via uv:
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
# Linux only: see "System packages" note above for cairo headers.
python install.py --type cu128 # or: --type cpu
source .venv/bin/activate
Run the bundled readiness check:
python scripts/check_env.py
It validates torch/CUDA, PyG, transformers, the agent stack, project src imports, and runs a CUDA matmul smoke test. Exit code is non-zero if any required dependency is missing.
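
To illustrate the exit-code convention described above, here is a minimal sketch of a dependency check. It is not the contents of `scripts/check_env.py`: the module list in `REQUIRED` is an assumption, the function names are hypothetical, and it only tests whether modules are importable (no CUDA matmul test).

```python
import importlib.util

# Assumed top-level module names; the real script may check different ones.
REQUIRED = ["torch", "torch_geometric", "transformers"]

def missing_modules(names):
    """Return the names that cannot be located on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def main(names=REQUIRED) -> int:
    """Print one status line per module; return 0 if all are present, else 1."""
    missing = missing_modules(names)
    for name in names:
        status = "MISSING" if name in missing else "ok"
        print(f"{name:<20} {status}")
    return 1 if missing else 0

# Demo with standard-library modules so the sketch runs anywhere.
exit_code = main(["json", "os"])
print("exit code:", exit_code)
```

In a real script, the result would be passed to `sys.exit(main())` so that CI jobs and shell scripts can react to a failed check.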
WebUI v2 serves static files from `frontend/dist` in production mode, so run `npm run build` in `frontend/` before starting `src/webui_v2.py`.
# Build WebUI v2 frontend assets first
cd frontend && npm run build && cd ..
# v1 (legacy Gradio) - local mode
python src/webui.py --mode all # → http://localhost:7860
# v1 (legacy Gradio) - online-compatible mode (feature-limited)
WEBUI_V2_MODE=online python src/webui.py --mode all # → http://localhost:7860
# v2 (FastAPI + React) - local mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 # → http://localhost:7861
# v2 (FastAPI + React) - online mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online # → http://localhost:7861
- Main runtime configuration template: `.env.example`
- Typical flow: `cp .env.example .env`, then adjust required keys for your mode.
- Minimal local setup: keep defaults, set `OPENAI_API_KEY` only if you use LLM-backed features.
- Minimal online setup: set `WEBUI_V2_MODE=online` and `WEBUI_V2_SESSION_TOKEN_SECRET`, and review all `WEBUI_V2_*_LIMIT` values.
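
The online-mode requirements above lend themselves to a quick pre-flight check. The sketch below parses `.env`-style text and reports what is still missing. Only the key names come from this README; the parser, the helper names, and the sample contents are hypothetical, and the project may validate its configuration differently.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def online_problems(env: dict) -> list:
    """List issues that would block the minimal online setup."""
    problems = []
    if env.get("WEBUI_V2_MODE") != "online":
        problems.append("WEBUI_V2_MODE must be 'online'")
    if not env.get("WEBUI_V2_SESSION_TOKEN_SECRET"):
        problems.append("WEBUI_V2_SESSION_TOKEN_SECRET is empty")
    return problems

def limit_keys(env: dict) -> list:
    """Collect the WEBUI_V2_*_LIMIT keys so they can be reviewed."""
    return sorted(k for k in env
                  if k.startswith("WEBUI_V2_") and k.endswith("_LIMIT"))

# Hypothetical .env contents for illustration.
sample = """
WEBUI_V2_MODE=online
WEBUI_V2_SESSION_TOKEN_SECRET=
"""
print(online_problems(parse_env(sample)))
```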
| Tab | Purpose | Features |
|---|---|---|
| Training | Train custom models | Model selection • PEFT methods • Real-time monitoring • Wandb |
| Evaluation | Benchmark testing | Load model • Select metrics • CSV export |
| Prediction | Inference | Single/batch prediction • Result visualization |
| Agent | Natural language | Multi-step automation • Tool orchestration |
| Quick Tools | Rapid prediction | Mutation scoring • Function prediction |
| Advanced | Deep analysis | Sequence/structure-based models |
| Download | Data retrieval | AlphaFold • UniProt • RCSB • InterPro |
| Manual | Documentation | Guides & tutorials |
Command Line Examples
# Train model
bash script/train/train_plm_lora.sh \
--model facebook/esm2_t33_650M_UR50D \
--dataset DeepSol --batch_size 32
# Evaluate
bash script/eval/eval.sh \
--model_path ckpt/DeepSol/best_model \
--test_dataset DeepSol
# Download data
bash script/tools/database/alphafold/download_alphafold_structure.sh
bash script/tools/database/uniprot/download_uniprot_seq.sh
# Generate structure sequences
bash script/get_structure_seq/get_esm3_structure_seq.sh
REST API Examples
# Start server
python src/api_server.py # → http://localhost:5000/docs
# Mutation prediction
curl -X POST http://localhost:5000/api/mutation/predict \
-H "Content-Type: application/json" \
-d '{"sequence": "MKTAYIA...", "mutations": ["A23V", "K45R"]}'
# Function prediction
curl -X POST http://localhost:5000/api/predict/function \
-H "Content-Type: application/json" \
-d '{"sequence": "MKTAYIA...", "tasks": ["solubility", "stability"]}'
# Database search
curl "http://localhost:5000/api/database/uniprot/search?query=lysozyme&limit=10"
Python API
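
Besides importing the tool functions directly, the REST endpoints listed above can also be called from Python. The sketch below uses only the standard library to build the same requests as the `curl` examples. The endpoint paths and JSON fields are copied from those examples, while the helper names are hypothetical and the response schema is not documented here, so nothing is assumed about it.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"

def mutation_request(sequence: str, mutations: list) -> urllib.request.Request:
    """Build the POST request for /api/mutation/predict."""
    body = json.dumps({"sequence": sequence, "mutations": mutations}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/mutation/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def uniprot_search_url(query: str, limit: int = 10) -> str:
    """Build the GET URL with properly encoded query parameters."""
    params = urlencode({"query": query, "limit": limit})
    return f"{BASE_URL}/api/database/uniprot/search?{params}"

req = mutation_request("MKTAYIAKQR", ["A4V", "K8R"])
print(req.get_method(), req.full_url)
print(uniprot_search_url("lysozyme"))

# To send a request, call urllib.request.urlopen(req) while the API server is running.
```

Using `urlencode` sidesteps the shell-quoting pitfalls of `&` in query strings.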
from src.tools.mutation import predict_mutation_effects
from src.tools.predict import predict_protein_function
from src.tools.database import download_alphafold_structure
# Mutations
results = predict_mutation_effects(
    sequence="MKTAYIAKQR...",
    mutations=["A4V", "K8R"],  # wild-type residues match the sequence (1-based)
    model="esm2"
)
# Function
predictions = predict_protein_function(
    sequence="MKTAYIA...",
    tasks=["solubility", "stability"]
)
# Data
pdb_file = download_alphafold_structure("P12345")
| Method | Memory | Speed | Performance | Best For |
|---|---|---|---|---|
| LoRA | Low | Fast | Good | General tasks |
| QLoRA | Very Low | Slow | Good | Limited GPU |
| DoRA | Low | Medium | Better | Improved LoRA |
| AdaLoRA | Low | Medium | Better | Adaptive rank |
| SES-Adapter | Medium | Medium | Better | Selective tuning |
| IA3 | Very Low | Fast | Good | Lightweight |
| Freeze | Low | Fast | Good | Simple tuning |
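
The low memory cost of LoRA in the table follows from how few parameters it trains: a rank-r adapter on a d_in × d_out weight adds r · (d_in + d_out) parameters. The arithmetic below works this out for ESM2-650M (hidden size 1280, 33 layers). The rank of 8 and the choice to adapt only the query and value projections are assumptions for illustration, not VenusFactory2 defaults.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

hidden = 1280           # ESM2-650M hidden size
layers = 33             # ESM2-650M transformer layers
rank = 8                # assumed LoRA rank
adapted_per_layer = 2   # assumed: query and value projections only

trainable = lora_params(hidden, hidden, rank) * adapted_per_layer * layers
total = 650_000_000     # nominal size of the base model

print(f"LoRA trainable parameters: {trainable:,}")
print(f"Share of base model: {trainable / total:.2%}")
```

Under these assumptions, roughly 1.35 million parameters are trained, about 0.2% of the base model.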
@article{tan2026venusfactory2,
title={Self-evolving AI agents for protein discovery and directed evolution},
author={Tan, Yang and Zhang, Lingrong and Li, Mingchen and Yu, Yuanxi and Zhong, Bozitao and Zhou, Bingxin and Dong, Nanqing and Hong, Liang},
journal={arXiv preprint arXiv:2603.27303},
year={2026}
}
@inproceedings{tan2025venusfactory,
title={VenusFactory: An Integrated System for Protein Engineering with Data Retrieval and Language Model Fine-Tuning},
author={Tan, Yang and Liu, Chen and Gao, Jingyuan and Banghao, Wu and Li, Mingchen and Wang, Ruilin and Zhang, Lingrong and Yu, Huiqun and Fan, Guisheng and Hong, Liang and others},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
pages={230--241},
year={2025}
}
VenusFactory is released under the VenusFactory Non-Commercial License; see LICENSE (English, authoritative) and LICENSE_CN.md (Chinese translation, for reference).
- ✅ Academic / non-commercial use is free for not-for-profit research institutions, government laboratories, universities, and individuals — for internal research, teaching, and personal study. Please cite the references in the Citation section.
- ❌ Commercial use is NOT permitted under the default license. This includes any use by or for a for-profit entity, any fee-bearing product/service/API, contract research whose IP is owned by a for-profit entity, and any use intended for commercial advantage.
- 📧 Commercial licensing requires prior written approval. To request a commercial license — or to confirm whether your intended use qualifies as commercial — please email hongl3lilang@sjtu.edu.cn (Liang Lab, Shanghai Jiao Tong University). Your inquiry should include the licensee entity, intended use, scope, and duration.
Developed by Liang's Lab at Shanghai Jiao Tong University.
Resources: Docs • YouTube • Playground • Issues
Made with ❤️ for the protein engineering community






