2026/06/16 tooluniverse update #7

Merged
taferh merged 9 commits into squirro:main from mims-harvard:main
Jun 19, 2026
@taferh taferh commented Jun 15, 2026

Collaborator

No description provided.

gasvn added 9 commits June 11, 2026 22:29
…osure (#251)

* Fix unauthenticated RCE in python_code_executor and harden server exposure

Security fix for an unauthenticated remote code execution path reported via
GitHub Security Advisory.

Executor sandbox (python_executor_tool.py):
- Remove the per-call allowed_imports override so a caller can no longer widen
  the server-side import allowlist before the AST safety check runs.
- Remove getattr/setattr from the safe builtins; they enable string-based
  attribute access that bypasses the AST check.
- Block all dunder attribute access, bare dunder names, and dunder string
  literals to stop class-hierarchy traversal escapes
  (().__class__.__bases__[0].__subclasses__() and getattr/subscript variants).
- Block plain-attribute pivots through allowed scientific modules
  (e.g. matplotlib.os, numpy.ctypeslib) that reach os/subprocess/ctypes.
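
A minimal sketch of the kind of AST check the executor-sandbox bullets above describe. This is not the code in python_executor_tool.py: `find_violations`, `is_dunder`, and the small `BLOCKED_ATTRS` set are hypothetical names chosen for illustration, and the real denylist is larger.

```python
import ast

# Hypothetical, deliberately tiny denylist for illustration only.
BLOCKED_ATTRS = {"os", "subprocess", "ctypes", "ctypeslib", "_os", "_sys"}


def is_dunder(name: str) -> bool:
    """True for names shaped like __something__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


def find_violations(source: str) -> list:
    """Return human-readable reasons why `source` should be rejected."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute):
            if is_dunder(node.attr):
                violations.append(f"dunder attribute: {node.attr}")
            elif node.attr in BLOCKED_ATTRS:
                violations.append(f"module pivot: {node.attr}")
        elif isinstance(node, ast.Name) and is_dunder(node.id):
            violations.append(f"dunder name: {node.id}")
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if is_dunder(node.value):
                violations.append(f"dunder string: {node.value!r}")
    return violations
```

Under these assumptions, `find_violations("().__class__.__bases__[0].__subclasses__()")` reports three dunder attributes, while `np.random.rand(3)` produces no violations. A static check like this only complements the removal of getattr/setattr from the builtins; it does not replace it.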

Server authentication (server_security.py, http_api_server*, smcp*):
- Default all server binds to 127.0.0.1.
- Refuse to bind to a non-loopback interface unless TOOLUNIVERSE_API_TOKEN is set.
- Enforce Bearer-token auth on all FastAPI routes (except /health) and on the
  FastMCP HTTP transport via StaticTokenVerifier when a token is configured.
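
The bind guard in the bullets above can be sketched roughly as follows. `is_loopback` and `check_bind` are hypothetical helper names, not the functions in server_security.py; only the TOOLUNIVERSE_API_TOKEN variable name comes from the commit. Treating non-IP hostnames as non-loopback is an assumption made here to fail closed.

```python
import ipaddress
import os


def is_loopback(host: str) -> bool:
    """True for 'localhost' and for literal loopback IPs such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): treat as non-loopback, fail closed.
        return False


def check_bind(host: str, env=None) -> None:
    """Raise if `host` is non-loopback and no API token is configured."""
    env = os.environ if env is None else env
    if not is_loopback(host) and not env.get("TOOLUNIVERSE_API_TOKEN"):
        raise RuntimeError(
            f"Refusing to bind to {host!r} without TOOLUNIVERSE_API_TOKEN set"
        )
```

Calling `check_bind("0.0.0.0", {})` raises, while `check_bind("127.0.0.1", {})` and `check_bind("0.0.0.0", {"TOOLUNIVERSE_API_TOKEN": "secret"})` return normally.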

Adds regression tests in tests/unit/test_security_advisory_fixes.py.

* Add module-pivot / FFI sandbox-escape regression tests

Completes the RCE fix in 6f370d3: that commit landed the
DANGEROUS_ATTRIBUTE_NAMES denylist (blocking numpy.ctypeslib.ctypes.CDLL,
matplotlib.os/.subprocess, random._os, collections._sys, enum's bltns->
builtins->getattr pivots) but the dedicated regression tests did not make it
into the commit. Add them: 6 module-pivot escape cases (all must be blocked),
3 legit numeric-attribute cases (numpy.random/trace/linalg must NOT be flagged
as false positives), and a normalization unit test. 25 -> 35 security tests pass.

* Harden remaining server entry points and document API token

Follow-up to the RCE/exposure fix, covering the rest of the network surface:

- tool_graph_web_ui: default bind to 127.0.0.1 and turn off Flask debug
  (debug=True exposes the Werkzeug debugger, which allows arbitrary code execution);
  apply the same bind guard and refuse the debugger on non-loopback hosts.
- ToolUniverseClient: add api_token (defaults to TOOLUNIVERSE_API_TOKEN) so
  clients can authenticate against a token-protected server.
- Document TOOLUNIVERSE_API_TOKEN, the loopback default, and Bearer auth in
  docs/guide/http_api.rst; stop recommending --host 0.0.0.0 without a token in
  module/CLI usage examples and printed hints.

Extends tests/unit/test_security_advisory_fixes.py (client auth header, graph
UI loopback/debugger guards).

* Harden peripheral Flask servers (expert feedback, graph viz)

Same loopback-default + bind-guard posture applied to the remaining servers
that did not serve the code executor but still exposed network interfaces:

- expert_feedback/human_expert_mcp_tools: start_http_api_server and
  start_web_server no longer hardcode host=0.0.0.0. They default to
  127.0.0.1 (override via TOOLUNIVERSE_EXPERT_HOST) and refuse a non-loopback
  bind without TOOLUNIVERSE_API_TOKEN.
- scripts/visualize_tool_graph: apply the bind guard and never enable the
  Flask/Werkzeug debugger (debug/reloader) on a non-loopback host, since the
  interactive debugger allows arbitrary code execution.

* Default registered MCP tool configs to loopback (ESM, expert feedback)

These @register_mcp_tool configs hardcoded host=0.0.0.0. With the new SMCP
bind guard they would fail closed for local users (non-loopback bind without a
token). Default them to 127.0.0.1 so local runs work; remote exposure still
requires TOOLUNIVERSE_API_TOKEN via the guard at server start.

* Add optional Bearer auth to standalone remote MCP servers

The pinnacle, depmap_24q2, immune_compass, and uspto_downloader servers are
standalone FastMCP deployments (own requirements.txt, no tooluniverse import)
meant to be network-hosted, so they keep binding 0.0.0.0. They now enable
StaticTokenVerifier Bearer auth when TOOLUNIVERSE_API_TOKEN is set, and are
completely unchanged (auth=None) when it is not — fully backward compatible.
Self-contained snippet, no dependency on the tooluniverse package.
…eyless resource tools + product-safety skill (#248)
…sPro tools (#254)

Add a general 'peptide -> any protein target' deorphanization capability: given a
peptide sequence (and optionally a phenotype and a hypothesized target it does NOT
actually bind), find its likely real target(s).

Skill (tooluniverse-peptide-target-deorphanization, both skill trees + router):
- Keyless multi-route pipeline: characterization/motif (PROSITE/ELM), BLAST
  homology, target-class router (GPCR / ion channel / protease / cytokine receptor
  / integrin / ...), target-family enumeration (HGNC + InterPro general, GPCRdb for
  GPCRs), OpenTargets phenotype anchor, EnsemblCompara/Alliance cross-species
  reconciliation, ranked shortlist with evidence tiers.
- Two runnable scripts: deorphanize_peptide.py (keyless Phases 1-4, seeded/seedless/
  batch, cross-species interface alignment, ClusPro-ready PDB) and cofold_screen.py
  (optional NVIDIA-NIM co-folding). references/phases.md manual + evals/evals.json.

Tools:
- ProteomicsDB_get_protein_meltome (keyless TPP/meltome melting curves for CETSA/TPP
  target-engagement; soluble proteome).
- ClusPro_submit_peptide_docking (academic-free peptide-protein docking; key-gated).

Live-validated on the exendin-4 -> GLP1R control and on non-GPCR targets (CACNA1B
ion channel, EGFR RTK, MMP9/CTSK proteases, IL6R cytokine receptor). 34 unit tests.

Note: optional cross-checks PepCalc/AMPSphere are referenced by the skill docs and
degrade gracefully (ProtParam covers properties; AMPSphere is a doc-only screen).

Adds standalone query tools for 11 keyless peptide databases (no API key), built
and live-verified this session. These complete the optional cross-checks the
peptide target-deorphanization skill references (PepCalc, AMPSphere, Norine):

- PepCalc_peptide_properties — physicochemical properties (pI, MW, formula)
- AMPSphere (6 tools) — global AMP catalogue: sequence match, search, family,
  per-AMP record, geographic/feature distributions
- Norine_get_peptide — non-ribosomal / cyclic peptide lookup
- DBAASP (2) — antimicrobial peptide activity database
- Hemolytik2 / CancerPPD2 / PEPlife2 / TumorHope2 — hemolytic / anticancer /
  half-life / tumor-homing peptide databases
- HLALigandAtlas (2) / MHCMotifAtlas — immunopeptidomics (benign HLA ligands,
  donors, allele motifs)
- PeptideAtlas_get_observed_peptides — MS-observed proteotypic peptides

All keyless; 18/18 load; 78 offline mocked unit tests pass. ConoServer was
intentionally excluded — its bulk-download endpoint returns 403 Forbidden
(server-side block), so it is not reliably usable.
…isions (#256)

generate_tools.py emitted two kinds of invalid/incorrect output when it
regenerated the per-tool wrapper stubs:

1. A nullable JSON type like ["string", "null"] was annotated as
   Optional[str | Any] instead of Optional[str]. prop_to_python_type did not
   drop the "null" member (unlike json_type_to_python), so nullability — already
   conveyed by the Optional[...] wrapper — leaked a noisy | Any.

2. A tool parameter whose name matched an injected keyword-only argument
   (use_cache, validate, stream_callback) produced a duplicate argument, e.g.
   'SyntaxError: duplicate argument use_cache', breaking import of the entire
   tooluniverse.tools package (hit by ESM_describe_sae_feature).
   sanitize_param_name now suffixes such names (use_cache -> use_cache_),
   mirroring its existing Python-keyword handling.
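
Both fixes can be illustrated with simplified stand-ins. The function names match the ones mentioned above, but the bodies below are an assumption-laden sketch, not the code in generate_tools.py; the `JSON_TO_PY` table and the choice to add the `Optional[...]` wrapper inside the function are illustrative.

```python
import keyword

# Keyword-only arguments the generator injects into every wrapper (per the commit text).
INJECTED_KWARGS = {"use_cache", "validate", "stream_callback"}

# Illustrative mapping; the real generator's table may differ.
JSON_TO_PY = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
    "array": "list[Any]",
    "object": "dict[str, Any]",
}


def prop_to_python_type(json_type) -> str:
    """Map a JSON Schema `type` (a string or a list of strings) to an annotation."""
    types = [json_type] if isinstance(json_type, str) else list(json_type)
    nullable = "null" in types
    # Drop "null": nullability is expressed by the Optional[...] wrapper instead.
    members = [JSON_TO_PY.get(t, "Any") for t in types if t != "null"]
    inner = " | ".join(dict.fromkeys(members)) or "Any"
    return f"Optional[{inner}]" if nullable else inner


def sanitize_param_name(name: str) -> str:
    """Suffix names that would collide with Python keywords or injected kwargs."""
    if keyword.iskeyword(name) or name in INJECTED_KWARGS:
        return name + "_"
    return name
```

With this sketch, `prop_to_python_type(["string", "null"])` gives `Optional[str]` and `sanitize_param_name("use_cache")` gives `use_cache_`, so the generated signature no longer declares the same argument twice.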

Adds unit tests for both and restores the previously-failing
test_coding_api_integration::test_sdk_import_integration to green.

* Add Codex plugin packaging

* Wire Codex plugin into skill-sync + release pipeline; bring to parity

The Codex plugin packaging lived outside the Claude plugin's release flow, so its
bundled skills and version drifted (134 skills @ 1.2.4 vs Claude's 136 @ 1.2.18).
This puts both packagings on one pipeline:

- plugin/sync-skills.sh now rebuilds BOTH plugin/skills/ (Claude) and
  plugins/tooluniverse/skills/ (Codex) from the canonical skills/ tree; Codex
  drops Claude-Code-specific skills (tooluniverse-claude-code-plugin).
- scripts/release-plugin.sh now also bumps plugins/tooluniverse/.codex-plugin/
  plugin.json on release, so Codex's version tracks Claude's.

One-time parity:
- Codex skills resynced (135 = 136 user-facing minus the Claude-only skill);
  adds tooluniverse-peptide-target-deorphanization, drops stale entries.
- Codex plugin.json 1.2.4 -> 1.2.18 to match the current release.
- Restored skills/tooluniverse-custom-tool/references/json-tool.md: PR #194 edited
  only the Claude plugin copy, leaving the source missing the API-key declaration
  docs, which any resync would otherwise drop.

Codex auto-upgrades owner/repo Git marketplaces on startup, so this reaches Codex
users on next launch; Claude users get it via the version bump.

* Fix Codex skill sync: delegate to the Codex-specific normalizer

The prior pipeline commit rebuilt plugins/tooluniverse/skills/ with the same
filter as the Claude copy, which left in the disable-model-invocation marker
(rejected by Codex validation; 133 skills) and descriptions over Codex's
1024-char limit (4 skills). Have plugin/sync-skills.sh delegate the Codex copy to
scripts/sync-codex-plugin-skills.sh — which strips the marker and compacts long
descriptions — instead of duplicating and diverging from that logic. Also add an
evals/ exclude to the Codex sync so dev eval scaffolding does not ship, matching
the Claude filter.

* Run the Codex plugin packaging test in CI

tests/test_codex_plugin.py lived at the tests/ top level, outside pytest.ini's
testpaths (tests/unit, tests/integration, tests/tools), so CI never collected it
— its per-skill assertions (description <= 1024 chars, disable-model-invocation
not set) silently never ran, which is why the earlier broken Codex skill regen
passed CI. Move it under tests/unit/, fix the repo-root parents[] depth, and mark
it unit so the Codex skill set is validated on every run.

* Rename Codex marketplace from tooluniverse-local to tooluniverse

The catalog name/displayName carried a 'local' suffix left over from local-dir
testing, but this is the published GitHub marketplace users add via
`codex plugin marketplace add mims-harvard/ToolUniverse`. Drop the misleading
'local' so it lists as 'tooluniverse' / 'ToolUniverse'.

* Update Codex plugin README: add install steps, point sync at the shared script

- Add an Install section (codex plugin marketplace add mims-harvard/ToolUniverse
  + codex plugin add tooluniverse) — the README had no end-user install steps.
- Point the Development sync at plugin/sync-skills.sh, the unified entry point
  that rebuilds BOTH the Claude and Codex skill copies; the old instruction ran
  only the Codex-specific sync, so following it left the Claude copy stale.

* Version plugins as <package MAJOR.MINOR>.<plugin revision>

Tie both plugin versions to the PyPI package's MAJOR.MINOR while letting them
carry an independent PATCH (the plugin revision). release-plugin.sh now derives
the next version from pyproject.toml's MAJOR.MINOR and bumps the plugin PATCH
(resetting to .0 when the package's MAJOR.MINOR advances), instead of bumping the
plugin's own semver freely. This keeps Claude + Codex plugin versions coherent
with the package they serve while a skill-only change ships as a PATCH bump —
without forcing an empty PyPI package release.
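
The derivation rule described above can be sketched in a few lines. release-plugin.sh is a shell script; the Python below is only an illustration of the rule, and `next_plugin_version` is a hypothetical name.

```python
def next_plugin_version(package_version: str, current_plugin_version: str) -> str:
    """Return <package MAJOR.MINOR>.<plugin revision> for the next plugin release."""
    pkg_major, pkg_minor = package_version.split(".")[:2]
    cur_major, cur_minor, cur_patch = current_plugin_version.split(".")[:3]
    if (pkg_major, pkg_minor) != (cur_major, cur_minor):
        # The package's MAJOR.MINOR moved: restart the plugin revision at 0.
        return f"{pkg_major}.{pkg_minor}.0"
    # Same MAJOR.MINOR: a skill-only change ships as the next plugin revision.
    return f"{pkg_major}.{pkg_minor}.{int(cur_patch) + 1}"
```

For example, a plugin at 1.2.18 becomes 1.2.19 while the package stays on 1.2.x, and becomes 1.3.0 once the package moves to 1.3.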

* Update RELEASING.md for the package-tracked plugin version scheme

Document the <package MAJOR.MINOR>.<plugin revision> scheme, the 4 bumped plugin
manifests (Codex included), and that both Claude (version-driven) and Codex
(git-driven) plugins auto-update — replacing the stale level-based / 3-manifest /
Claude-only instructions.

* Add tooluniverse-codex-plugin install skill (Codex-only)

A one-step install + troubleshooting guide for the ToolUniverse plugin on OpenAI
Codex, mirroring tooluniverse-claude-code-plugin but adapted to Codex's commands
(codex plugin marketplace add / plugin add -m), startup auto-upgrade behavior,
and MCP-via-uvx model — and noting Codex ships MCP tools + skills only (no slash
commands). Symmetric to the Claude install skill: plugin/sync-skills.sh now
excludes it from the Claude packaging, while the Codex sync includes it (and
drops the Claude-only tooluniverse-claude-code-plugin). Description tuned via the
skill-creator trigger optimizer (perfect precision; install-skill recall is
inherently low and largely platform-bound).

Bump the PyPI package from 1.2.6 to 1.3.0 to ship everything merged to main since
v1.2.6 (2026-06-06). No release had been published since then, so uvx/pip users were
still on the 1.2.6 tool set and missing the security fix.

Bumps every package-version reference to stay in lockstep (the mcpb-bundle guard
test enforces this):
- pyproject.toml (root package)
- mcpb/pyproject.toml + mcpb/manifest.json (native MCPB bundle)
- server.json (MCP registry: top-level + packages[0])
__version__ tracks the root version via importlib.metadata.

Included since 1.2.6:
- +302 tools (2176 -> 2478) across 36 new databases — PheWAS/biobanks, MSA &
  phylogeny, clinical risk calculators, peptide-resource databases (#248, #254, #255)
- Unauthenticated RCE fix + server-exposure hardening in python_code_executor (#251)
- Coding-API stub generator fix: nullable types + injected-param collisions (#256)

Merging to main triggers publish-pypi.yml to publish 1.3.0 automatically.
… + router wiring (#258)

* Add cross-biobank PheWAS coverage (BioBank Japan, UKB-TOPMed, TPMI, Genebass) + skill

Discovery round: ToolUniverse covered only FinnGen for phenome-wide association
(PheWAS). This adds the missing cross-ancestry PheWAS resources plus a
replication skill.

New tools (all public, no auth):
- BioBankJapan_phewas_by_variant  (pheweb.jp, Japanese, GRCh37)
- UKBTOPMed_phewas_by_variant     (pheweb.org/UKB-TOPMed, European, GRCh38)
- TPMI_phewas_by_variant          (sinica TPMI, Taiwanese, GRCh38)
- Genebass_gene_burden_phewas     (main.genebass.org, UKB exome gene burden)

Implementation notes:
- Shared PheWebPheWASTool for the three PheWeb biobanks (config-selected).
- rsID -> coordinates via Ensembl in each biobank's build; multi-allelic SNPs
  try every alt and select the allele actually present (PheWeb returns HTTP 200
  with a null body for absent variants, so selection is by presence of phenos).
- Genebass joins phenotype descriptions from /phenotypes (cached).
- TPMI serves a cert that fails OpenSSL 3.x strict verification; a per-host
  adapter clears only VERIFY_X509_STRICT while keeping full CA-chain + hostname
  verification (untrusted/self-signed certs still rejected; not verify=False).
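
For the TPMI certificate note above, the core idea can be shown with the standard library alone. This is a sketch under the assumption that the tool builds its TLS context from Python's defaults; `relaxed_strict_context` is a hypothetical name, and the per-host adapter wiring is omitted.

```python
import ssl


def relaxed_strict_context() -> ssl.SSLContext:
    """Default TLS context with only the strict X.509 conformance flag cleared."""
    ctx = ssl.create_default_context()
    # Clear VERIFY_X509_STRICT only; CA-chain and hostname checks stay enabled.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx
```

The returned context still has `verify_mode == ssl.CERT_REQUIRED` and `check_hostname` set to `True`, which is what distinguishes this from disabling verification. One way to scope it to a single host with `requests` would be a custom `HTTPAdapter` mounted on that host's URL prefix.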

New skill tooluniverse-phewas orchestrates the four tools + FinnGen for
cross-ancestry replication, with allele-frequency/power interpretation guidance.

Validated live: rs7903146/TCF7L2 -> T2D replicated across all ancestries;
PCSK9 pLoF -> LDL/cholesterol; rs671/ALDH2 -> alcohol disorders in East Asian
biobanks but monomorphic in UKB-TOPMed. 12 unit tests, ruff clean.

Both registration points updated (default_config + _lazy_registry_static).

* Add de novo MSA + phylogeny tools (EMBL-EBI Job Dispatcher)

Discovery gap: ToolUniverse could fetch pre-computed alignments/trees (Rfam,
Ensembl Compara) but had no way to align a user's own sequences or build a tree
from them. The phylogenetics skill explicitly disclaimed MSA generation.

New tools (public, no auth; EMBL-EBI Job Dispatcher submit -> poll -> result):
- EBI_msa_align               MSA via Clustal Omega (default) / MUSCLE / MAFFT /
                              Kalign / T-Coffee; returns aligned FASTA, Clustal,
                              and guide tree
- EBI_build_phylogenetic_tree neighbour-joining / UPGMA tree (Newick) from an
                              alignment, optional Kimura correction + gap trim

Notes:
- Per-method param mapping (Clustal Omega uses outfmt+stype; MUSCLE uses format
  and auto-detects type; MAFFT/Kalign use format+stype).
- Dynamic result-type selection: only fetch result types the job produced.
- 30s per-request timeout; polls up to ~2.5 min.
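
The submit -> poll -> result flow with a bounded wait might look like the loop below. It is a network-free sketch: `poll_job` is a hypothetical helper, the status names in `PENDING` are assumptions about what the job service reports, and the 5-second interval is illustrative. Only the roughly 2.5-minute ceiling comes from the note above.

```python
import time
from typing import Callable

# Assumed names for non-terminal job states; adjust to what the service returns.
PENDING = {"QUEUED", "RUNNING"}


def poll_job(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_wait_s: float = 150.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `get_status` until it returns a terminal state or the budget runs out."""
    waited = 0.0
    while True:
        status = get_status()
        if status not in PENDING:
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"job still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `get_status` and `sleep` as callables keeps the loop testable without HTTP calls; a real caller would wrap the status request (with its own per-request timeout) in `get_status`.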

Updated the tooluniverse-phylogenetics skill to chain these for de novo
alignment + NJ/UPGMA trees (still defers ML/Bayesian to dedicated tooling).

Validated live: cytochrome C human/horse/yeast -> human+horse closest, yeast
outgroup; globin fragments align with correct guide tree. 11 unit tests, ruff
clean. Both registration points updated.

* Add validated clinical risk calculators (compute, no API/key)

Discovery gap: ToolUniverse had data-retrieval tools across many domains but no
bedside clinical risk scores. These are pure-compute, deterministic, fully
testable (no network, no key) implementations of standard validated formulas,
each returning the score, a risk interpretation, and an auditable component
breakdown.

Calculators (ClinicalCalculatorTool, selected by fields.calculator):
- ClinicalCalc_CHA2DS2_VASc   AF stroke risk (Lip 2010)
- ClinicalCalc_HAS_BLED       AF bleeding risk (Pisters 2010)
- ClinicalCalc_CURB_65        pneumonia severity (Lim 2003)
- ClinicalCalc_qSOFA          bedside sepsis risk (Singer 2016)
- ClinicalCalc_Child_Pugh     cirrhosis severity / class A-C (Pugh 1973)
- ClinicalCalc_Wells_DVT      DVT pretest probability (Wells 2003)
- ClinicalCalc_Wells_PE       PE pretest probability (Wells 2000)
- ClinicalCalc_MELD_Na        liver mortality / transplant priority (OPTN 2016)
- ClinicalCalc_eGFR_CKD_EPI   eGFR, CKD-EPI 2021 race-free (Inker 2021)
- ClinicalCalc_ASCVD_risk     10-yr ASCVD risk, Pooled Cohort Eqns (Goff 2013)

Validated against published reference cases: ASCVD White-male-55 = 5.4%,
White-female = 2.1%, African-American-male = 6.1% (official ACC/AHA examples);
CHA2DS2-VASc, Child-Pugh class, MELD-Na, CKD-EPI all match. 17 unit tests.

No orchestration skill: each calculator is a standalone deterministic call with
built-in interpretation, so a wrapper skill would add little. Both registration
points updated.
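
As an illustration of "score plus auditable component breakdown", here is a stand-alone sketch of CHA2DS2-VASc using the standard point assignments. It is not the ClinicalCalculatorTool implementation; the function name, argument names, and return shape are assumptions, and the risk interpretation is left out.

```python
def cha2ds2_vasc(
    age: int,
    female: bool,
    chf: bool,
    hypertension: bool,
    diabetes: bool,
    stroke_tia: bool,
    vascular_disease: bool,
) -> dict:
    """Compute CHA2DS2-VASc and return the total with a per-component breakdown."""
    components = {
        "congestive_heart_failure": 1 if chf else 0,
        "hypertension": 1 if hypertension else 0,
        # Age scores 2 points at 75 or older, 1 point at 65-74.
        "age": 2 if age >= 75 else 1 if age >= 65 else 0,
        "diabetes": 1 if diabetes else 0,
        "stroke_tia_thromboembolism": 2 if stroke_tia else 0,
        "vascular_disease": 1 if vascular_disease else 0,
        "sex_category_female": 1 if female else 0,
    }
    return {"score": sum(components.values()), "components": components}
```

A 78-year-old woman with hypertension and diabetes scores 2 + 1 + 1 + 1 = 5, and the breakdown shows which inputs contributed each point.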

* Harden discovery-batch tools: Child-Pugh input validation, PheWAS per-candidate resilience, skill doc fixes

- ClinicalCalc_Child_Pugh: reject unrecognized ascites/encephalopathy values
  instead of silently scoring the best case (1 pt); add 'absent'/'slight'
  synonyms and an enum on the JSON schema so bad input is caught at validation.
- PheWebPheWAS: a transient 5xx on one multi-allelic candidate no longer aborts
  the whole lookup; only surface an error when every candidate fails to fetch
  (distinguishes 'variant absent' from 'all fetches failed').
- Genebass: clarify the phenotype-description join-key docstring.
- tooluniverse-phewas skill: document that FinnGen_get_variant_finemapping needs
  an explicit GRCh38 chr:pos:ref:alt string (not an rsID).
- tooluniverse-phylogenetics skill: correct phykit saturation column semantics
  and list the actual phykit_batch_analysis 'function' enum values.
- Add unit tests for all new behavior (34 pass).
- Bump plugin to 1.2.7 for the skill content changes.
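
The PheWebPheWAS resilience change in the list above can be sketched as follows. Everything here is hypothetical scaffolding: `select_present_allele`, `TransientFetchError`, the injected `fetch` callable, and the `phenos` response key are stand-ins for whatever the real tool uses.

```python
class TransientFetchError(Exception):
    """Hypothetical error type for a failed HTTP fetch (e.g. a 5xx response)."""


def select_present_allele(candidates, fetch):
    """Try each candidate variant; one failed fetch must not abort the lookup.

    `fetch(variant)` returns a dict, or None for an absent variant, and may
    raise TransientFetchError.
    """
    errors = []
    for variant in candidates:
        try:
            body = fetch(variant)
        except TransientFetchError as exc:
            errors.append((variant, str(exc)))
            continue
        if body and body.get("phenos"):
            return {"status": "found", "variant": variant, "phenos": body["phenos"]}
    if errors and len(errors) == len(candidates):
        # Every candidate failed to fetch: report an error rather than "absent".
        return {"status": "error", "errors": errors}
    return {"status": "absent"}
```

The three outcomes stay distinct: a hit on any candidate wins, "absent" means at least one fetch succeeded but no candidate had phenotypes, and "error" is reserved for the case where nothing could be fetched.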

* Discovery: +10 keyless tools (openFDA safety, FAVOR, OPSIN, NCBI genomes) + product-safety skill

Net-new tools from a 5-frontier discovery survey (all public, no API key, live-verified):

openFDA expansion (5, BaseRESTTool config — reuses existing query machinery):
- OpenFDA_search_drug_shortages (current/resolved U.S. drug shortages, added 2025)
- OpenFDA_search_device_adverse_events (MAUDE), OpenFDA_search_device_recalls
- OpenFDA_search_food_adverse_events (CAERS food/supplement/cosmetic)
- OpenFDA_search_animalvet_adverse_events (veterinary)

FAVOR (1): FAVOR_annotate_variant — comprehensive GRCh38 single-variant functional
annotation (BRAVO/gnomAD/1000G frequencies, GENCODE consequence, CADD/SIFT/PolyPhen/
AlphaMissense/MetaSVM scores, GERP/phyloP conservation, ClinVar, regulatory) in one call.

OPSIN (1): OPSIN_name_to_structure — deterministic IUPAC name -> SMILES/InChI/InChIKey
parser (EBI, MIT); resolves systematic names absent from compound databases.

NCBI Datasets v2 genome module (3, extends existing NCBIDatasetsTool):
- NCBIDatasets_get_genome_assembly, NCBIDatasets_list_genomes_by_taxon,
  NCBIDatasets_get_sequence_reports (assembly metadata/discovery/sequence mapping).

Skill (1): tooluniverse-product-safety-surveillance — multi-product post-market safety
(devices/food/supplements/cosmetics/veterinary/shortages), orchestrating the new openFDA
tools. Explicit negative boundary vs the drug-AE signal skills (pharmacovigilance).

- 13 new unit tests (47 pass across changed tools). All tools live-verified via CLI.
- Bump plugin 1.2.7 -> 1.2.8 for the new skill content.

* Discovery (batch 2): +6 keyless tools — Ensembl Tark, MARRVEL, EU CTIS

More clean, public, no-API-key resource gaps from the discovery survey:

- Ensembl Tark (2): Tark_get_mane_transcripts (MANE Select/Plus Clinical ENST<->RefSeq
  NM mapping by gene/transcript id) and Tark_get_transcript (archived transcript record
  with versions/checksums/releases). Fills the MANE/transcript-equivalence gap.
- MARRVEL (2): MARRVEL_get_gene (aggregated OMIM/HGNC/Ensembl/Entrez/UniProt identity)
  and MARRVEL_get_omim_phenotypes (OMIM disease associations + inheritance) for
  rare-disease/Mendelian triage.
- EU CTIS (2): CTIS_search_trials and CTIS_get_trial — the EU clinical-trials register
  (Clinical Trials Regulation, since 2022), complementing ClinicalTrials.gov.

All 18 test_examples live-verified (real retrievable IDs); 13 new unit tests (60 pass
across changed tools). Tools-only — no plugin/skill content change, no version bump.

* Discovery (batch 3): +1 ClassyFire tool + cross-feed new tools into skills

Tool:
- ClassyFire_classify_by_inchikey — ChemOnt chemical taxonomy (kingdom->superclass->
  class->subclass) by InChIKey; fills the chemical-classification gap.

Skill gap fixes (wire the new resource tools into the skills that should use them):
- tooluniverse-clinical-trial-matching: add EU CTIS (CTIS_search_trials / CTIS_get_trial)
  for European/EEA trials — previously ClinicalTrials.gov only; description updated to
  trigger on EU-trial queries.
- tooluniverse-rare-disease-diagnosis: wire MARRVEL_get_gene + MARRVEL_get_omim_phenotypes
  into Phase 3 (gene panel; inheritance-aware filtering) and FAVOR_annotate_variant into
  Phase 4 (one-call variant annotation), plus fallback-chain entries.

All cross-fed tool calls live-verified; +3 unit tests (63 pass). Plugin 1.2.8 -> 1.2.9
(skill content changed).

* Discovery (batch 4): +4 keyless tools — NPAtlas, ISRCTN + ISRCTN trial-matching cross-feed

- NPAtlas (2): NPAtlas_search_compounds (microbial natural products by name/InChIKey/
  formula/SMILES via basicSearch) and NPAtlas_get_compound (full record by NPAID incl.
  source organism + originating reference). Fills the microbial-natural-products gap.
- ISRCTN (2): ISRCTN_search_trials and ISRCTN_get_trial — the ISRCTN registry (WHO-primary,
  UK/international), parsing the XML query API into structured records. Third trial source.
- clinical-trial-matching skill: ISRCTN added alongside ClinicalTrials.gov + EU CTIS;
  description now triggers on US/EU/UK trial queries; merge-and-dedupe guidance via cross-ref ids.

Fixed during build: ISRCTN pagination param is 'limit' not 'pageSize' (verified live).
All 12 test_examples live-verified (real retrievable IDs); +9 unit tests (38 pass).
Plugin 1.2.9 -> 1.2.10 (skill content changed).

* Discovery (batch 5): +2 new skills orchestrating this session's tools

The tools added this session were referenced by zero skills — these two close that gap:

- tooluniverse-microbial-genome-characterization: genome-assembly discovery/QC/replicon
  mapping for any organism via the NCBI Datasets genome tools (list_genomes_by_taxon,
  get_genome_assembly, get_sequence_reports + taxonomy). Negative boundary vs
  comparative-genomics (gene orthology) and plant-genomics.
- tooluniverse-natural-product-dereplication: NP dereplication + chemotaxonomy via
  NPAtlas (producing organism + reference) + ClassyFire (ChemOnt class) + OPSIN +
  PubChem. Negative boundary vs chemical-compound-retrieval and metabolomics skills.

Both built with skill-creator; every documented tool call live-verified; evals/evals.json
with 3 verifiable prompts each. Plugin 1.2.10 -> 1.2.11.

* Discovery (batch 6): +4 keyless tools — EPA Envirofacts + USDA PLANTS (new frontiers)

Two entirely new coverage frontiers (TU had nothing for either):
- EPA Envirofacts (2): EPA_search_tri_facilities (Toxics Release Inventory) and
  EPA_search_frs_facilities (Facility Registry Service) — US regulated-facility lookup.
- USDA PLANTS (2): USDA_plants_get_profile (taxonomy/habit/native status) and
  USDA_plants_get_characteristics (morphology/physiology/growth traits) by PLANTS symbol.

All 12 test_examples live-verified; +7 unit tests (45 pass). Tools-only, no version bump.

* Discovery cross-feeds: wire this session's tools into 9 existing skills

The tools added this session were under-referenced by existing skills. Additive doc edits
(no content removed), every newly-referenced tool call live-verified:

- FAVOR_annotate_variant -> variant-functional-annotation, regulatory-variant-analysis,
  variant-to-mechanism (one-call GRCh38 comprehensive annotation fast-pass + fallback).
- Tark_get_mane_transcripts / Tark_get_transcript -> variant-interpretation,
  acmg-variant-classification (lightweight MANE/transcript-namespace cross-check).
- OPSIN_name_to_structure -> chemical-compound-retrieval, organic-chemistry
  (systematic IUPAC name -> structure; trade names fall back to PubChem).
- NCBIDatasets genome tools -> infectious-disease, comparative-genomics (assembly QC
  context; pointers to the new microbial-genome-characterization skill).

Plugin 1.2.11 -> 1.2.12.

* Discovery (batch 8): +3 keyless tools — Allen Cell Types + iDigBio

- Allen Cell Types (1): AllenCellTypes_search_specimens — single-neuron electrophysiology/
  morphology specimens (human/mouse) via the Brain-Map RMA API; filter by species/structure.
  Distinct from the existing gene-expression AllenBrain tools.
- iDigBio (2): iDigBio_search_records (130M+ biodiversity specimen records by taxon/locality)
  and iDigBio_get_record (full Darwin Core record by UUID).

All 9 test_examples live-verified (real UUIDs); +7 unit tests (52 pass). Tools-only, no version bump.

* Discovery (batch 9): +2 Pathoplexus/LAPIS tools — pathogen genomic surveillance

Found by a double-check coverage sweep (a genuine gap missed in the first two survey rounds):

- Pathoplexus_count_sequences: aggregated open pathogen genome-sequence counts by
  organism, filterable by country/lineage and groupable by metadata field.
- Pathoplexus_get_mutations: characteristic amino-acid/nucleotide mutations above a
  proportion threshold, for mutation-prevalence surveillance.

LAPIS-backed (lapis.pathoplexus.org); organisms west-nile/ebola-zaire/ebola-sudan/cchf/
mpox. Distinct from Nextstrain (phylo builds) and BV-BRC. Caught + handled a LAPIS quirk:
the /aggregated endpoint rejects limit/offset (unordered output).

All 6 test_examples live-verified; +6 unit tests (58 pass). Tools-only, no version bump.

* Discovery (batches 10-12): Open Genes + FooDB + TogoID tools

Three gaps found by coverage-sweep double-checks (each had zero TU coverage):

- Open Genes (2): OpenGenes_get_gene / OpenGenes_search_genes — curated aging/longevity
  gene DB (mechanisms, evidence counts per study type). Cross-fed into the
  tooluniverse-aging-senescence skill Phase 2.
- FooDB (1): FooDB_get_compound — food chemical-constituent by FooDB id (structure +
  HMDB/KEGG/PubChem/ChEBI cross-refs); complements USDA FoodData Central.
- TogoID (2): TogoID_convert / TogoID_list_datasets — universal biological ID conversion
  across 117 databases (DBCLS relation graph). Cross-fed into the translate-id command.

All test_examples live-verified; +14 unit tests. Plugin 1.2.12 -> 1.2.14.

* Discovery (batch 13): +1 AlphaFill tool — ligands/cofactors on AlphaFold models

Found by coverage-sweep batch 8 (AlphaFill's transplant data is only mentioned, not
exposed, by existing 3D-Beacons/ChannelsDB tools):
- AlphaFill_get_transplants: for a UniProt, the ligands/cofactors/ions AlphaFill
  transplants into its AlphaFold model (by homology to PDB), aggregated by compound
  with occurrence count, best local-fit RMSD, and source PDB entries. Reveals likely
  cofactor/metal/ligand/drug binding the bare AlphaFold (apo) model doesn't show.

All 3 test_examples live-verified (ABL1->imatinib/STI; CDK2->ATP/MG/staurosporine);
+4 unit tests. Tools-only, no version bump.

* Discovery: cross-feed session tools into 7 more skills (close skill-orphan gap)

8 tools added this session were referenced by zero skills. Wired each into its natural
home (additive, every tool call live-verified):
- Pathoplexus (count/mutations) -> infectious-disease (pathogen genomic surveillance)
- AlphaFill_get_transplants -> protein-structure-prediction (ligands the apo AlphaFold hides)
- AllenCellTypes_search_specimens -> neuroscience
- iDigBio (search/record) -> ecology-biodiversity
- FooDB_get_compound -> metabolomics
- EPA TRI/FRS facilities -> chemical-safety (environmental exposure screening)
- USDA PLANTS (profile/characteristics) -> plant-genomics

Plugin 1.2.14 -> 1.2.15.

* Discovery: +1 skill tooluniverse-clinical-risk-scoring (closes calculator orphan gap)

The 10 ClinicalCalc_* risk-calculator tools were referenced by zero skills. This skill
orchestrates them: maps a clinical scenario to the right score(s), computes, and
interprets against per-score risk-band tables.
- AF -> CHA2DS2-VASc + HAS-BLED (anticoagulation tradeoff)
- cirrhosis -> Child-Pugh + MELD-Na; pneumonia -> CURB-65; sepsis -> qSOFA
- VTE -> Wells DVT/PE; CVD prevention -> ASCVD; kidney -> eGFR (CKD-EPI)
Negative boundary vs polygenic-risk-score / epidemiological-analysis /
diagnostic-test-evaluation. All 10 tool calls live-verified; evals with 3 prompts.
Plugin 1.2.15 -> 1.2.16.

* Fix 2 tools surfaced by test-harness role-play (both HIGH-severity, CLI-confirmed)

Feature-KRAS-001 — annotate_variant_multi_source returned EMPTY from every source
(ClinVar/gnomAD/CIViC) and falsely listed all of them in sources_with_data:
- parsers read the wrong nested paths — gnomAD record is at data.gene.gene_id (not
  data.gene_id), CIViC variants at data.gene.variants.nodes (not data.variants).
- queried ClinVar by a bare protein change ('V600E') which matches nothing; now query
  by gene/rsid and match the specific change with 1-letter->HGVS-3-letter expansion
  (V600E <-> Val600Glu), with honest exact_match flag + gene-level context fallback.
- sources_with_data now reflects actual non-empty results, not 'did not throw'.
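
The 1-letter to 3-letter expansion mentioned above (V600E <-> Val600Glu) can be sketched like this. `expand_protein_change` is a hypothetical helper that handles only simple substitutions; the real matcher may cover more notations.

```python
import re
from typing import Optional

# Standard amino-acid one-letter to three-letter codes; "*" maps to Ter (stop).
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def expand_protein_change(change: str) -> Optional[str]:
    """Expand e.g. 'V600E' or 'p.V600E' to 'Val600Glu'; None if not a substitution."""
    match = re.fullmatch(r"(?:p\.)?([A-Z])(\d+)([A-Z*])", change)
    if not match:
        return None
    ref, pos, alt = match.groups()
    if ref not in AA3 or alt not in AA3:
        return None
    return f"{AA3[ref]}{pos}{AA3[alt]}"
```

A caller could then compare both spellings against the protein change in each record fetched by gene or rsID, and set an exact-match flag only when one of them is found.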

Feature-ClinPharm-001 — FDA_get_drug_label_info_by_field_value returned a DIFFERENT
drug (asked for imatinib boxed_warning, got sunitinib/regorafenib). return_fields (a
projection) was passed as exists=_exists_ MATCH filters, so requesting a field the
record lacks returned NOT_FOUND, and the fallback then dropped the exact field:value
constraint and matched other drugs. Fix: exists=None (projection-only); missing fields
come back null.

+6 unit regression tests. Tools-only, no version bump.

* Add NCBI Clinical Tables tools (RxTerms, conditions, disease_names)

Closes the one partial gap vs the openai/plugins life-science-research
reference set: ncbi-clinicaltables was only covered for ICD (ICDTool) and
LOINC (LOINCTool). This adds the remaining high-value Clinical Tables
endpoints as keyless tools:

- RxTerms_search_drugs   - drug-name autocomplete + strengths/forms + RxCUIs
- HealthConditions_search - problem-list autocomplete + ICD-10-CM/ICD-9 crosswalk
- DiseaseNames_search    - disease-name autocomplete + UMLS CUI

New ClinicalTablesTool class + clinical_tables_tools.json, registered in
default_config.py and _lazy_registry_static.py. 6 unit tests (mocked) +
live CLI-verified. Tools-only change, no version bump.

* Close 7 capability sub-gaps vs openai life-science-research reference skills

A capability-depth audit of all 50 skills in openai/plugins
life-science-research found 7 endpoints used by reference skills that
TU's tool for that database did not expose. All live-verified + unit-tested:

- gnomad_get_variant_populations  - per-ancestry allele frequencies (was aggregate-only)
- IPD_search_hla_alleles / IPD_get_hla_allele / IPD_search_cells - IPD-IMGT/HLA (no HLA coverage before)
- ProteomeXchange_get_spectrum_by_usi - PROXI spectrum/USI access (was dataset-level only)
- ChEBI_get_ontology_parents - upward ontology traversal (had children only)
- ols_get_efo_term_descendants - transitive EFO subtree (had direct children only)
- PubChem_get_substance_by_SID - depositor substance records (was CID-only)
- EpiGraphDB_get_literature_evidence - SemMedDB literature triples

New IPDIMGTHLATool class (registered both points); all other tools extend
existing classes (no new registration). 56 unit tests, all live-verified.
Tools-only change, no version bump.

* Fix GTEx single-tissue eQTL crash on variant-only query

run() injected gencode_id=None when no gene symbol was supplied, and
_get_single_tissue_eqtls iterated it -> "'NoneType' object is not iterable".
Variant-only eQTL lookups are valid (GTEx filters by variantId alone).
Fix: only inject gencode_id when a symbol is present, and coerce None->[]
in the handler. Surfaced by the openai-reference capability audit.
3 regression tests; live-verified (variant-only now returns real eQTLs).
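
A minimal sketch of the two-part fix (inject the gene filter only when present, coerce None -> [] before iterating). The function names are hypothetical and `gencodeId` is an assumed query key; only `variantId` is named in this commit.

```python
def normalize_gene_ids(gencode_id):
    """Coerce None -> [] and a bare string -> [string] so callers can always iterate."""
    if gencode_id is None:
        return []
    if isinstance(gencode_id, str):
        return [gencode_id]
    return list(gencode_id)


def build_eqtl_params(variant_id=None, gencode_id=None):
    """Hypothetical parameter builder for a single-tissue eQTL query."""
    params = {}
    if variant_id:
        params["variantId"] = variant_id
    gene_ids = normalize_gene_ids(gencode_id)
    if gene_ids:
        # Only inject the gene filter when a gene was actually resolved.
        params["gencodeId"] = gene_ids
    return params
```

With this shape a variant-only call never carries a None gene filter, so nothing downstream iterates over None.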

* Targeted NGS-analysis skill gaps: edgeR/limma routes, fastq-qc skill, scRNA QC

Closes the in-scope subset of the openai ngs-analysis reference plugin
(the parts that fit TU's plan + run-if-available skill model; raw-pipeline
skills needing local binaries are intentionally out of scope):

- rnaseq-deseq2: + edgeR (QL-F/exact) and limma-voom routes with a routing
  table, references/edger_limma_voom.md, and r_edger_limma_wrapper.py
  (preflights Rscript+Bioconductor; emits install plan if absent, never fabricates).
- NEW tooluniverse-fastq-qc: FastQC/MultiQC/fastp/Cutadapt/seqkit QC with a
  module-by-module interpretation table and trim/don't-trim decisioning;
  run_fastq_qc.py preflights tools, writes only to --workdir, never overwrites
  raw FASTQs, emits install plan if tools missing.
- single-cell: + scRNA-seq QC gating section (mito%, doublets, ambient RNA,
  empty droplets, MAD thresholds) + references/scrna_qc.md + run-if-available helper.

All three follow the openai preflight pattern (no vaporware). Tests: rnaseq
100/100, fastq-qc 5/5, single-cell unchanged-baseline. Plugin skills synced;
version 1.2.16 -> 1.2.17.

* Apply PR #248 verification fixes from persona round

Surfaced by post-ship researcher-persona verification:
- HIGH: ols_get_efo_term_descendants/children resolved only EFO: CURIEs, so
  MONDO/HP/Orphanet terms (modern EFO disease nodes are MONDO) returned 0
  silently with a malformed URL. Map non-EFO prefixes to their native OBO/ORDO
  IRI (purl.obolibrary.org/obo/<P>_<n>, orpha.net/ORDO/Orphanet_<n>).
  MONDO:0004993 now returns its 773 descendants. Also add a disambiguation note
  when total=0 (leaf/obsolete/cross-ontology) so it isn't read as 'no subtypes'.
- MEDIUM: remove vendor provenance text from user-facing rnaseq-deseq2 skill
  (script docstring + reference doc).
3 new EFO regression tests (MONDO/Orphanet IRI + zero-note). Plugin skills
synced; 1.2.17 -> 1.2.18.
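
The prefix -> IRI mapping can be sketched as below. `curie_to_iri` is a hypothetical name, the `http://` scheme and `www.` host prefixes are assumptions (the commit gives only the host/path patterns), and the actual tool may handle more prefixes.

```python
def curie_to_iri(curie):
    """Map an ontology CURIE (e.g. 'MONDO:0004993') to its native term IRI."""
    prefix, sep, local = curie.strip().partition(":")
    if not sep or not local:
        raise ValueError(f"not a CURIE: {curie!r}")
    key = prefix.upper()
    if key == "EFO":
        # EFO-native terms live under the EBI namespace.
        return f"http://www.ebi.ac.uk/efo/EFO_{local}"
    if key in ("ORPHANET", "ORPHA"):
        # Orphanet terms use the ORDO namespace.
        return f"http://www.orpha.net/ORDO/Orphanet_{local}"
    # Everything else is assumed to be an OBO ontology (MONDO, HP, ...).
    return f"http://purl.obolibrary.org/obo/{key}_{local}"
```

Raising on a malformed CURIE, rather than building a URL anyway, avoids the silent zero-result case described above.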

* Register gnomADGetVariantPopulations in static lazy registry

The new gnomad_get_variant_populations class was missing from
_lazy_registry_static.py, so it would not resolve in frozen/lazy
environments where source files are absent (it only loaded because a
sibling gnomAD tool imports the same module). Add the class->module entry
for consistency with the other gnomAD types.

* Close validation-coverage gaps: test_examples + schemas + Alliance expression fix

From the schema/test-completeness audit (1+2):
- Fix Alliance gene-expression: GET /gene/{id}/expression-summary was retired
  (404), breaking FlyBase_get_gene_expression / ZFIN_get_gene_expression. Switch
  to POST /api/expression (per-annotation stage+location); both now return data
  (dpp 317, shha 350). Updated descriptions/params/return_schema + 3 unit tests.
- Add live-verified test_examples to: alphafold_get_annotations (P00533),
  HPA_get_rna_expression_in_specific_tissues, ArXiv_get_pdf_snippets (1706.03762),
  SemanticScholar_get_pdf_snippets, expression_anova_per_gene, coding_variant_fraction.
- Add return_schema (oneOf) to HMDB_search / HMDB_get_metabolite / HMDB_get_diseases.

Tools whose upstreams are down/blocked (ZINC bot-wall, T3DB Cloudflare-403,
SynBioHub-401, BindingDB-500, SwissTargetPrediction broken) or are async
job/token fetchers (Foldseek/SwissDock/DynaMut2/ReactomeAnalysis) were left
without examples for cause rather than given placeholder IDs. Tools-only, no bump.

* Revive ZINC tools: migrate zinc15 (bot-wall) -> ZINC22/CartBlanche22

zinc15.docking.org now serves a bot-verification HTML wall on every
endpoint, so all 5 ZINC tools were dead. Migrate to the official successor
cartblanche22.docking.org (no bot-wall):
- ZINC_get_compound / ZINC_get_purchasable -> GET /substance/{id}.json
  (record + vendor/price catalogs). Verified: ZINC000000000053 (aspirin),
  mwt 180.159, 450 vendors.
- ZINC_search_by_smiles -> async /smiles.json submit + /search/result/{task}
  poll. Verified: benzene similarity search returns hits.
- ZINC_search_compounds (name) and ZINC_search_by_properties (MW/LogP range):
  ZINC22 is structure/ID-centric with no such endpoints -> clean {status:error}
  pointing to search_by_smiles (honest, not fabricated).
Descriptions/return_schemas updated; 9 unit tests; live-verified test_examples
for the 3 working tools. Tools-only, no version bump.

* Add 7 capability-depth tools (drug/pharmacology + pathway/interaction)

Capability-depth audit of core clusters (vs each upstream API's own endpoints)
found 7 keyless, live-verified endpoints TU's tools did not expose:
- GtoPdb_get_ligand_properties (structure + molecularProperties)
- GtoPdb_get_disease_associations (diseaseTargets + diseaseLigands)
- PharmGKB_get_drug_label_annotations (/data/label FDA PGx biomarker labels)
- PharmGKB_get_pathway (/data/pathway curated PK/PD pathways)
- PharmGKB_get_variant_annotations (/data/variantAnnotation literature-level)
- SIGNOR_connect_proteins (getData.php?type=connect causal sub-network)
- PathwayCommons_paths_between (graph?kind=PATHSBETWEEN multi-gene paths)

All reuse existing tool classes (no registration change). 49 unit tests,
all live-verified. Tools-only, no bump.

* ultracode sweep: +31 capability-depth tools across 8 core clusters

Parallel capability-depth audit (8 clusters vs each upstream API's own
endpoints) + build of confirmed keyless gaps. All reuse existing tool
classes (no new registration). 119 unit tests, 27/27 live-smoke verified.

- variant-clinical (3): VariantValidator_format_genomic_to_transcripts,
  GeneBe_classify_variants_batch, ClinGenAR_lookup_by_external_id
- expression (6): GTEx single-tissue sQTLs / median-transcript / single-nucleus /
  finemapping+independent-eQTL, Harmonizome_get_gene_set_members, SCXA_get_cluster_marker_genes
- protein-structure (4): InterPro_get_residue_annotations, PDBeSIFTS_get_scop_mapping,
  ELM_get_interaction_domains, ProteinsPlus_protonate_structure
- literature (5): PubTator3_GetEntityRelations, openalex_search_sources/get_source,
  EuropePMC_get_article_datalinks, LitVar_get_variant_details
- sequence-genomics (4): EnsemblPheno_get_by_term, Ensembl_get_transcript_haplotypes,
  RNAcentral_get_xrefs_and_pubs, UCSC_list_tracks
- chem-metabolomics (4): KEGG_find_compound, metabolights_get_reference_compound,
  LipidMaps_get_compound_by_xref, SwissLipids_get_children
- disease-ontology (2): HPO_get_disease_annotations, MonarchV3_phenotype_profile_compare
- cancer-genomics (3): GDC_get_mutation_frequency_by_project,
  cBioPortal_get_copy_number_alterations, Progenetix_get_cnv_frequencies

Deferred: ClinVar_get_submitted_records (needs a new class). Tools-only, no version bump.

* ultracode sweep 2: +33 capability-depth tools across 8 more clusters

Second parallel capability-depth sweep (8 clusters vs each upstream API).
All reuse existing classes (no new registration). 109 unit tests, 23/23
live-smoke verified, code-simplifier clean.

- immunology (4): SAbDab structure summary, TheraSAbDab sequences, IMGT germline
  FASTA, IEDB antigen-processing prediction
- microbiology-pathogen (5): Pathoplexus details/FASTA, MGnify samples/downloads,
  ENAPortal run FASTQ search
- gene-regulation (4): ReMap peaks-in-region, ChIPAtlas colocalization/target-genes/
  experiment-metadata
- cheminformatics-admet (4): PubChem BioAssay concise activity, UniChem connectivity,
  PubChem Tox eco/human toxicity
- systems-biology-enzymes (5): ReactomeAnalysis expression/species/found/not-found,
  BioModels list-all-model-ids
- structural-biology-deep (5): BMRB sequence/shift search + validation, 3D-Beacons
  annotations, PDB-REDO version
- gwas-population (2): PGS Catalog performance metrics, Ensembl LD region
- proteomics (4): PRIDE projects-for-protein, PDC quant matrix, ProteomicsDB peptides,
  MassIVE protein identifications

Deferred: VEuPathDB (needs new POST class). Tools-only, no version bump.

* Add 3 new-class capability-depth tools (ClinVar SCV + VEuPathDB POST)

The deferred gaps that required a brand-new @register_tool class (the
reuse-only sweep could not build these):
- ClinVar_get_submitted_records (ClinVarSubmittedRecordsTool): efetch
  rettype=vcv -> per-submitter SCV records. BRAF V600E returns 44 submissions
  across germline/somatic_clinical_impact/oncogenicity axes (existing ClinVar
  tools only return the aggregate classification).
- VEuPathDB_search_genes_by_organism + VEuPathDB_get_gene_record (VEuPathDBTool):
  WDK POST API across the EuPathDB family (PlasmoDB/ToxoDB/FungiDB/...); the
  shared BaseRESTTool is GET-only. PF3D7_0417200 -> DHFR-TS verified.

Registered both classes in default_config.py + _lazy_registry_static.py.
36 unit tests, all live-verified. Tools-only, no version bump.

* ultracode sweep 3: +48 capability-depth tools across 8 more clusters

Third parallel capability-depth sweep (8 clusters vs each upstream API).
All reuse existing classes (no new registration). 144 unit tests, 26/26
live-smoke verified, code-simplifier clean (usda_plants two-source merge confirmed coherent).

- nutrition-food (4): OpenFoodFacts tag filtering, USDA PLANTS wetland/invasive/wildlife
- environment-toxicology (2): EPA TRI facility chemical releases, AOP-Wiki key event
- neuroscience (6): AllenBrain structure expression, NeuroMorpho literature/persistence,
  NeuroVault atlases
- plant-agriculture (3): USDA PLANTS search, Plant Reactome participants/species-tree
- model-organism (15): SGD regulation/sequence/disease, WormBase orthologs/interactions/
  disease, RGD QTLs/resolve, Alliance orthologs/interactions/alleles, PomBase orthologs/
  interactions/GO, InterMine pathquery
- scientometrics (6): ORCID fundings/peer-reviews, OpenAIRE projects, Crossref members,
  ROR affiliation matching
- drug-safety-clinical (6): NCI Thesaurus maps/parents, RxNorm NDC status/properties,
  DailyMed SPL media/history
- cell-imaging (6): IDR images/annotations, BioImage Archive files, HuBMAP provenance,
  CryoET tiltseries/depositions

Skipped (already-covered or key-gated): USDA search dup, Gramene, NIH RePORTER.
Tools-only, no version bump.

* ultracode sweep 4: +41 capability-depth tools across 8 more clusters

Fourth parallel capability-depth sweep. All reuse existing classes (no new
registration). 131 unit tests + base_rest regression, 21/21 live-smoke,
code-simplifier clean. base_rest_tool.py got a backward-compatible opt-in
fields.parse_csv branch (CSV/TSV downloads -> dict rows); non-parse_csv tools
unchanged (regression-verified, only G2P uses it).
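
For illustration, a CSV/TSV -> dict-rows step of the kind the parse_csv branch performs might look like this. `parse_delimited` and its delimiter sniffing are assumptions, not the base_rest_tool.py implementation.

```python
import csv
import io


def parse_delimited(text, delimiter=None):
    """Parse a CSV/TSV download into a list of dict rows keyed by the header line."""
    if delimiter is None:
        # Assumption: a tab in the header line means TSV, otherwise CSV.
        first_line = text.splitlines()[0] if text else ""
        delimiter = "\t" if "\t" in first_line else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]
```

Keeping this behind an opt-in flag means tools that do not request it still receive the raw response unchanged.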

- rna-noncoding (7): ENCORI RBP-targets/ceRNA/RNA-RNA/degradome/RBP-disease/motif-scan,
  RNAcentral region ncRNAs
- taxonomy-biodiversity (8): WoRMS classification/vernaculars/distribution/synonyms,
  iDigBio summary-facets/media, GBIF name-match/occurrence-stats
- protein-ptm-motifs (2): iPTMnet proteoform PPI, ScanProsite motif->proteins
- variant-prediction-scores (4): MaveDB mapped/clinical/gnomad variants, GenomeNexus dbSNP
- molecular-interactions-deep (1): OmniPath general-protein annotation geneset
- drug-target-deep (12): Pharos target-ligands/ligand-targets/expression, OpenTargets
  target expression/pathways/depmap/prioritisation/cancer-hallmarks + variant effect/
  consequences/pharmacogenomics + credible-set colocalisation
- metabolic-pathways-deep (5): BiGG download-model, KEGG module/reaction/enzyme,
  WikiPathways pathway-metabolites
- clinical-genetics-panels (2): Gene2Phenotype download-panel, HGNC gene-family members

Tools-only, no version bump.

* ultracode sweep 5 (part 1): +20 capability-depth tools across 6 clusters

Fifth sweep: 6 of 8 clusters built (2 hit a transient Anthropic API rate limit and
are being re-run separately). All reuse existing classes (no registration). 77 unit
tests, live-smoke verified, code-simplifier clean, no shared-file edits.

- clinical-trials-registries (3): CTIS filtered search, ISRCTN fielded search +
  enriched get_trial (27 fields vs ~10)
- ontology-mapping-services (2): OLS term xrefs, Bioregistry prefix mappings
- proteins-api-features (5): EBI Proteins RNA-editing, variation by dbSNP/HGVS,
  proteins by genomic location, HPP peptides
- chemical-safety-regulatory (3): FDA GSRS substance relationships, RxClass
  class-hierarchy + disease-relations
- pharmacovigilance-faers (3): FAERS count indications/drug-characterization/
  reporter-qualification (bare-list, matches existing FDADrugAdverseEventTool convention)
- sequence-archives (4): GEO supplementary files, SRA run-file locator,
  BioSamples relationships + facets

Tools-only, no version bump.

* ultracode sweep 5 (part 2): +8 tools — functional-enrichment + metabolomics-spectra

Re-run of the 2 clusters that hit a transient Anthropic API rate-limit in part 1.
All reuse existing classes (no registration). 23 unit tests, 7/7 live-smoke,
code-simplifier clean, no shared-file edits.

- functional-enrichment (1): Enrichr gene->genesets reverse lookup (cluster near-saturated)
- metabolomics-spectra (7): MassBank spectral-similarity/get-record/advanced-search,
  MetabolomicsWorkbench studies-by-phenotype + gene-protein, GNPS library-record +
  NPClassifier-from-SMILES

Tools-only, no version bump.

* ultracode sweep 6 (part 1): +2 epidemiology tools (WHO GHO + CDC aggregate)

The only cluster of sweep 6 that completed before transient Anthropic API
rate-limiting cut off the other 7 (re-running separately):
- WHOGHO_list_dimension_values: enumerate dimension values (countries/regions/
  sex/age) with code->title + country->region rollup (BaseRESTTool config-only)
- cdc_data_aggregate: server-side SoQL aggregation ($select/$group/$having/
  $query) on Socrata /resource/{id}.json; CDCRESTTool error envelope hardened
  (top-level error key). cdc_data_get_dataset unchanged (regression-verified).

Tools-only, no version bump.

* ultracode sweep 6 (part 2): +28 tools across 7 clusters

Re-run of the 7 clusters that hit transient Anthropic API rate-limiting in
part 1 (epidemiology already shipped in 96a31611). All reuse existing classes
(no registration). 95 unit tests, 16/17 live-smoke (1 transient upstream OMA
502, handled gracefully), code-simplifier clean, no shared-file edits.

- compound-bioactivity-screening (3): PharmacoDB drug-targets/molecular-profiling,
  SYNERGxDB biomarker-association
- citation-bibliometrics (5): Semantic Scholar paper citations/references/author-papers,
  DBLP author/venue search
- phylogenetics-orthology (5): OMA xref/genome-pair-orthologs/protein-GO, Ensembl
  Compara CAFE tree, OrthoDB group FASTA
- structure-prediction-modeling (3): SWISS-MODEL download-pdb/get-models/batch
- chemistry-vendors-procurement (2): PubChem substances-by-source, list-substance-sources
- clinical-imaging-archives (6): TCIA patients/SOP-UIDs/manufacturer-values,
  OpenNeuro snapshot-files/validation/advanced-search
- genomic-variation-archives (4): NCBI Variation ALFA-frequencies/SPDI<->rsID/VCF->SPDI,
  EVA clustered-variant-by-rs

Tools-only, no version bump.

* Register GDCMutationFreqByProjectTool in static lazy registry

PR review caught this: the GDC_get_mutation_frequency_by_project class was
registered via @register_tool (loads at runtime since sibling GDC tools import
the module) but was missing from _lazy_registry_static.py, unlike all 9 sibling
GDC classes — so it would not resolve in frozen/lazy environments. Add the
class->module entry for consistency.

* peptide-resource sweep: +10 tools across 8 keyless peptide databases

Closes the non-immune peptide gap (AMP / cell-penetrating / therapeutic /
immunopeptidomics) identified against existing IEDB+ESM coverage. All tools
reuse the standard {status,data,metadata} envelope, never raise, 30s timeout,
oneOf return_schema, real test_examples; all live-verified keyless.

Antimicrobial:   DBAASP_get_peptide, DBAASP_search_peptides (DBAASP MIC/activity)
Therapeutic/CPP: Hemolytik2/CancerPPD2/PEPlife2/TumorHope2 search (IIITD)
Property:        PepCalc_peptide_properties (terminal-mod-aware MW/pI/formula)
Immunopeptide:   HLALigandAtlas benign-peptides + donors, MHCMotifAtlas ligands

9 classes registered in _lazy_registry_static.py + default_config.py.
47 unit tests pass; 10/10 tools live-verified against real APIs.

* peptide sweep round 2: +1 tool (PeptideAtlas observed peptides)

Round-2 audit covered Raghava-lab therapeutic/CPP/toxic DBs (CPPsite2, THPdb,
SATPdb, AVPdb, ParaPep, AntiTbPdb), non-Raghava AMP DBs (DRAMP, APD3, CAMPR4,
dbAMP2, YADAMP, LAMP2), and bioactive/structure resources. All but one were
correctly rejected as web-form-only, key-gated, or offline (no dead tools built).

PeptideAtlas_get_observed_peptides — keyless SBEAMS GetPeptides CGI; MS-observed
peptides per protein/build with n_observations, proteotypic score, SSRCalc
hydrophobicity. Distinct from EBI-Proteins (adds observation frequency/scoring).

1 class registered; 7 unit tests pass; live-verified against the real API.

* peptide sweep round 4a: +2 tools (ConoServer conopeptides)

Hand-built the parked ConoServer lead: cone-snail venom peptides. ConoServer
exposes no per-record REST endpoint, only a bulk protein XML export that is not
well-formed (HTML named entities like &alpha;, control chars, mojibake). Added a
sanitizer (HTML-entity->unicode, control-char strip) + lxml recover-parser that
cleanly parses all 8523 entries.
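
A sketch of the sanitizer idea, using only the standard library. The names are hypothetical, the stdlib parser stands in for the lxml recover-parser used in the tool, and mojibake repair is not covered here.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself defines must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def _replace_entity(match):
    name = match.group(1)
    if name in XML_PREDEFINED:
        return match.group(0)
    codepoint = name2codepoint.get(name)
    # Unknown names are dropped here; a real tool might prefer to log them.
    return chr(codepoint) if codepoint is not None else ""


def sanitize_xml(text):
    """Turn HTML named entities into unicode and strip illegal control chars."""
    return _CONTROL.sub("", _ENTITY.sub(_replace_entity, text))


raw = "<entries><entry><name>&alpha;-conotoxin\x01 SI &amp; co</name></entry></entries>"
root = ET.fromstring(sanitize_xml(raw))
name_text = root[0].find("name").text  # 'α-conotoxin SI & co'
```

Running `html.unescape` over the whole document would also decode `&amp;` and `&lt;`, which would break the markup, hence the explicit skip list.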

ConoServer_get_conopeptide   — full record by ID (P00001 = alpha-conotoxin SI)
ConoServer_search_conopeptides — filter by name/sequence/pharmacological family/
  gene superfamily/cysteine framework/organism/class (e.g. Conus geographus->171)

2 classes registered; 10 unit tests pass; both live-verified against real data.

* peptide sweep round 4b: +1 tool (Norine non-ribosomal peptides)

Norine_get_peptide — keyless lookup in the Norine non-ribosomal peptides
database (Universite de Lille) by peptide name or Norine ID. Returns the full
record (monomer composition, structure type, formula, MW, activity, organism,
references). name/norine_id are mutually exclusive; IDs accept 123/'00123'/
'NOR00123'. Stability-probed 6x (vs the dropped Peptipedia, this host is stable).
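
The ID handling and the mutually exclusive arguments can be sketched as follows. Both function names are hypothetical, and the canonical form 'NOR' + 5 zero-padded digits is inferred from the accepted examples above.

```python
import re


def normalize_norine_id(value):
    """Accept 123, '00123' or 'NOR00123' and return the 'NOR00123' form."""
    text = str(value).strip().upper()
    match = re.fullmatch(r"(?:NOR)?(\d{1,5})", text)
    if not match:
        raise ValueError(f"unrecognized Norine ID: {value!r}")
    return f"NOR{int(match.group(1)):05d}"


def resolve_query(name=None, norine_id=None):
    """Enforce that exactly one of name / norine_id is supplied."""
    if (name is None) == (norine_id is None):
        raise ValueError("provide exactly one of name or norine_id")
    if norine_id is not None:
        return {"norine_id": normalize_norine_id(norine_id)}
    return {"name": name.strip()}
```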

Round-4 immunopeptidome/bacteriocin audit (SysteMHC, BACTIBASE, BAGEL4) found
no keyless APIs (all are web-only SPAs/forms), so nothing was built.

1 class registered; 7 unit tests pass; live-verified (tyrocidine -> 4 records).

* peptide sweep round 5: +6 tools (AMPSphere AMP catalogue)

AMPSphere — Global Microbial smORF Catalogue of antimicrobial peptides (863k
AMPs, 2024). Keyless JSON REST API (ampsphere-api.big-data-biology.org/v1). Six
tools across two modules:
  AMPSphere_get_amp             — full AMP record by accession (seq, family, QC)
  AMPSphere_sequence_match      — exact-sequence lookup (case-normalized)
  AMPSphere_search_amps         — filter by habitat/quality/family + intervals
  AMPSphere_get_family          — family consensus, members, downloads
  AMPSphere_get_amp_distributions — geo/habitat/microbial-source distribution
  AMPSphere_get_amp_features    — physicochemical feature vector

CyBase/MBPDB/Defensins/EROP were audited: none has a keyless structured API, so nothing was built.
6 classes registered; 17 unit tests pass; all 6 live-verified + stability-probed.

* Add peptide target deorphanization skill

New orchestration skill tooluniverse-peptide-target-deorphanization: given a
peptide sequence + observed phenotype, find its likely real protein target(s) —
for the common case where a peptide is phenotypically active but does NOT bind
its hypothesized target, or binds in one species/assay but not another.

Multi-route, mostly-keyless pipeline, every tool call live-validated on the
exendin-4 -> GLP1R control (all four routes independently recover the class-B1
glucagon/secretin receptor family):
  - characterization + motif: PepCalc/ProtParam, ScanProsite->PROSITE, ELM
  - homology: BLAST_protein_search (swissprot), EBI_msa_align
  - receptor-family enumeration: GPCRdb + HGNC family group + GtoPdb
  - phenotype anchor: OpenTargets disease->associated-targets
  - cross-species reconciliation: EnsemblCompara + Alliance orthologs/paralogs
  - optional structural confirmation: NvidiaNIM_boltz2/alphafold2_multimer/
    openfold3 co-folding (requires NVIDIA_API_KEY)

Registered in the tooluniverse router (both skill trees). Documents exact param
names, gotchas, fallback chains, and two worked examples (control + the real
'anti-insulin-resistance peptide that does not bind GLP1R in mouse' scenario).

* Add runnable scripts to peptide deorphanization skill

Turn the prose pipeline into two one-command scripts (both skill trees):

- deorphanize_peptide.py (keyless, Phases 1-4): characterize -> PROSITE/ELM
  signature -> optional BLAST -> receptor-family panel (HGNC group authoritative,
  GPCRdb annotates) -> OpenTargets phenotype anchor -> Alliance cross-species ->
  ranked candidate shortlist with evidence tiers. Validated live on exendin-4:
  promotes GIPR to Tier 1 (family + phenotype 0.674), flags GLP1R as the
  assay-negative hypothesized target, recovers the full class-B panel.
- cofold_screen.py (Phase 5, key-gated): resolve each candidate receptor sequence
  (GPCRdb -> UniProt accession fallback) and co-fold the peptide via
  NvidiaNIM_boltz2/alphafold2_multimer/openfold3, ranked by interface ipTM; runs
  a DRY RUN (inputs + plan) when NVIDIA_API_KEY is unset.

SKILL.md gains an 'Automated pipeline (scripts)' fast-path section; the per-phase
docs remain as the manual/fallback reference.

* Enhance peptide deorphanization script: seedless, ELM, protease, batch

Four capabilities added to deorphanize_peptide.py (both trees), each validated:

- SEEDLESS mode (omit --hypothesized-target): derive candidate receptors from
  the PROSITE family keywords via UniProt ('<kw> receptor' -> receptor genes ->
  family). Wiring offline-validated (recovers the full glucagon-receptor family);
  degrades gracefully to phenotype-only when the resolver is transiently down.
- ELM LIG motif regex auto-match: scan the peptide against ELM ligand-motif
  regexes, rank by rarity (probability), annotate each with the Pfam binding
  domain it engages (ELM_get_interaction_domains) — for peptides without a named
  PROSITE family.
- Protease / degradation liability: DPP4 N-terminal rule (P2 = A/P -> labile,
  e.g. native GLP-1; G -> resistant, e.g. exendin-4) + ELM CLV cleavage-motif
  scan. A labile flag warns that an assay-negative may be DEGRADATION not
  non-binding — a key alternative explanation for 'active in vitro, not in mouse'.
- BATCH mode (--fasta): one record per FASTA entry, shared phenotype/species.

Added a transient-connection retry to the tool wrapper. SKILL.md documents all
four modes/signals. Regression intact: exendin-4 -> GIPR Tier 1, GLP1R flagged.
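
The DPP4 N-terminal rule above reduces to a check on the second residue. A minimal sketch, assuming a hypothetical `dpp4_liability` helper and a 'not_flagged' label for every residue other than A/P (the script's own labels may differ):

```python
def dpp4_liability(sequence):
    """Flag DPP4 lability from the P2 residue (second from the N-terminus)."""
    seq = sequence.strip().upper()
    if len(seq) < 3:
        return "unknown"  # too short for a dipeptide to be cleaved off
    # DPP4 removes an N-terminal dipeptide when P2 is Ala or Pro.
    return "labile" if seq[1] in ("A", "P") else "not_flagged"


GLP1_7_36 = "HAEGTFTSDVSSYLEGQAAKEFIAWLVKGR"            # P2 = A
EXENDIN_4 = "HGEGTFTSDLSKQMEEEAVRLFIEWLKNGGPSSGAPPPS"   # P2 = G

glp1_flag = dpp4_liability(GLP1_7_36)      # 'labile'
exendin_flag = dpp4_liability(EXENDIN_4)   # 'not_flagged'
```

A 'labile' flag is a prompt to consider degradation before concluding that an assay-negative peptide does not bind.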

* Add ProteomicsDB meltome + ClusPro peptide-docking tools

Fill two gaps for peptide target deorphanization (verified buildable after a
free-vs-paid audit of candidate services):

ProteomicsDB_get_protein_meltome (keyless, live-validated): thermal proteome
profiling (TPP/meltome) melting curves for a protein by gene/UniProt -> apparent
Tm + fit quality. Ligand-induced Tm shift = CETSA/TPP target deconvolution.
Validated: MAPK1 Tm~56C, CDK2 Tm~54C. NOTE soluble-proteome only (membrane GPCRs
absent), documented in the tool.

ClusPro_submit_peptide_docking (free for academic; requires CLUSPRO_USERNAME +
CLUSPRO_API_SECRET): native peptide-protein docking submit, replicating the
open-source cluspro-api flow exactly (sorted key+value HMAC-MD5 signature, peptide
mode pepmot/pepseq vs a PDB-code receptor). Returns the ClusPro job id (async).
Signature + form construction + no-key/missing-arg guards unit-tested offline;
needs a live round-trip with a real academic account.
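
As a rough illustration of a 'sorted key+value HMAC-MD5' signature: the exact field set, encoding, and None handling of the cluspro-api flow are assumptions here, and `sign_form` is a hypothetical name.

```python
import hashlib
import hmac


def sign_form(form, secret):
    """Concatenate key+value pairs in sorted key order and HMAC-MD5 the result."""
    message = "".join(
        f"{key}{form[key]}" for key in sorted(form) if form[key] is not None
    )
    return hmac.new(
        secret.encode("utf-8"), message.encode("utf-8"), hashlib.md5
    ).hexdigest()
```

Sorting the keys makes the signature independent of the order in which the form fields were assembled, which is what allows this step to be unit-tested offline.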

2 classes registered; CLUSPRO_* added to .env.template; 9 unit tests pass.

* Wire meltome + ClusPro into peptide deorphanization skill

- Phase 4: ProteomicsDB_get_protein_meltome as target-engagement evidence for
  soluble candidates (CETSA/TPP Tm shift = engagement); noted GPCRs are absent.
- Phase 5: ClusPro_submit_peptide_docking as the academic-free (no NVIDIA key)
  structural-confirmation path for candidates with a PDB structure.
- Phase 0 tool-verification list updated with both.

* Automate cross-species interface alignment + close Phase 5 gaps in peptide deorphanization

Gap-analysis fixes for the peptide target-deorphanization skill, all driven by
the real source-organism-binding / mouse-negative case:

- Implement the headline 'binds in A, not B' step: deorphanize_peptide.py now
  resolves each top candidate's human + assay-species (+ new --source-species)
  ortholog sequences via UniProt and aligns the binding interface with
  EBI_msa_align, reporting per-pair % identity and substitution counts. This step
  was documented in Phase 3, but the script previously reported only ortholog
  present/absent.
- Add --source-species for 3-way human/assay/source interface reconciliation
  (the organism where binding WAS observed), with graceful degradation when the
  source organism is absent from UniProt (e.g. a protist).
- Flag non-canonical / cyclic residues (BLAST/PROSITE assume canonical linear
  L-AA); route NRP/cyclic peptides to Norine + cofold --cyclic.
- Resolve a ClusPro-ready representative PDB per top candidate via
  PDBeSIFTS_get_best_structures, making the academic-free docking path usable.
- Fix cofold_screen.py openfold3 args: co-fold both chains in one input
  (molecules array) instead of two separate monomer inputs.
- Fix cofold_screen.py ortholog_sequence: use the UniProt gene+organism path
  instead of the GPCRdb entry-name guess (species suffix is '_mouse', not the
  'mus_musculus' token), and add boltz2 --cyclic support.
- Fix Alliance methods-list -> count in ortholog_status.

Adds tests/unit/test_peptide_deorphanization_scripts.py (13 offline tests).
SKILL.md + both skill trees updated and kept in sync.
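
The per-pair % identity and substitution count can be derived from two aligned (gapped) sequences roughly as follows. `interface_identity` is a hypothetical helper, and skipping gap columns is an assumption about how the script scores the alignment.

```python
def interface_identity(aligned_a, aligned_b):
    """Compare two equal-length aligned sequences, ignoring gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = substitutions = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: neither a match nor a substitution
        if a == b:
            matches += 1
        else:
            substitutions += 1
    compared = matches + substitutions
    percent = 100.0 * matches / compared if compared else 0.0
    return {
        "percent_identity": round(percent, 1),
        "substitutions": substitutions,
        "compared": compared,
    }
```

For the toy pair "ACDE-F" / "ACNEGF" this yields 80.0% identity with 1 substitution over 5 compared columns.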

* Restructure peptide deorphanization skill per skill-creator rubric

Apply the skill-creator standards to the peptide target-deorphanization skill:

- Progressive disclosure: SKILL.md trimmed 359 -> 105 lines (reasoning +
  fast-path scripts + a phase-at-a-glance table + output spec + pointers). The
  full per-phase manual reference (every tool call, parameter names, gotchas,
  fallback chains, runtime notes, two worked examples) moves to
  references/phases.md (with a table of contents), read on demand.
- Add a reproducible eval set: evals/evals.json with 3 realistic test prompts
  and 16 checkable assertions (the exendin-4 control, the source-organism /
  mouse-negative real case, and a non-GPCR seedless case).
- Tighten the output-format / evidence-tier spec inline in SKILL.md so the
  deliverable contract is in the always-loaded body.

No script or tool changes; both skill trees kept in sync.

* Generalize peptide deorphanization beyond GPCRs to any target class

The skill was over-indexed on the receptor-ligand / GPCR subclass. Make it a
general 'peptide -> any protein target' deorphanizer:

- Target-class router (_classify_target_class): classifies the peptide up front
  (gpcr_ligand / ion_channel_toxin / protease_inhibitor_or_substrate /
  cytokine_or_growth_factor / guanylyl_cyclase_ligand / integrin_ligand /
  antimicrobial / unknown) from motif + homology text + sequence features (RGD
  motif, cysteine density), and selects the candidate-generation strategy and
  seedless search nouns accordingly.
- Generalize seedless: keywords now also come from BLAST homolog names (not only
  PROSITE), and the UniProt resolver searches class-aware nouns (receptor /
  channel / protease / integrin / ...), not 'receptor' only.
- InterPro general family route (interpro_family_members) added to family_panel:
  HGNC gene-family is the general backbone, InterPro is a general cross-check (or
  supplies the panel when there is no curated HGNC group), GPCRdb stays a
  GPCR-only cross-check. The 'two general resources agree' check now holds for
  non-GPCR families too.
- Docs reframed receptor-family -> target-family; added the target-class router
  table (Phase 1e), the general HGNC/InterPro enumeration route, and a non-GPCR
  worked example (omega-conotoxin -> Cav2.2/CACNA1B ion channel).
- evals.json: strengthened the seedless case to check non-GPCR classification and
  added a known-answer non-GPCR control (conotoxin -> CACNA1B).

Adds 8 offline tests (classifier across classes, generalized seedless, InterPro
enumeration + bounded merge, HGNC-authoritative annotation). 30/30 pass. Both
skill trees synced.

* Fix InterPro family route against the live API + bound HGNC supergroups

Live-tested the InterPro general enumeration route on a non-GPCR target
(CACNA1B / Cav2.2) and fixed two real flaws the mock tests could not catch:

- InterPro_get_proteins_by_domain returns UniProt ACCESSIONS (no gene field) and
  MIXES organisms. The previous parser read a non-existent 'gene' field and would
  have returned nothing live. Now filter to human (tax_id '9606') and batch-map
  the accessions -> gene symbols in one UniProt 'accession:A OR accession:B …'
  query. interpro_family_members('Q00975') now returns the full calcium-channel
  alpha-1 family {CACNA1A…CACNA1S}.
- HGNC over-enumeration: a gene can sit in multiple HGNC groups, including a domain
  SUPERGROUP (CACNA1B is also in 'EF-hand domain containing', ~200 genes), which
  flooded the panel with irrelevant calcium-binding proteins. Skip any HGNC group
  larger than _HGNC_GROUP_CAP (80) and record it under meta['skipped_broad_groups'].

After the fix the CACNA1B panel is exactly the 10 calcium-channel genes, each
cross-checked HGNC+InterPro; GLP1R regression unchanged. Updated the 3 InterPro
unit tests to the real response shapes and added a supergroup-cap test (31 pass).
references/phases.md documents the human-filter + accession->symbol map + the
supergroup cap. Both skill trees synced.
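
The supergroup cap can be sketched as below. `_HGNC_GROUP_CAP` and `meta['skipped_broad_groups']` are named in this commit; the function name, input shape, and the record stored for each skipped group are assumptions.

```python
_HGNC_GROUP_CAP = 80


def bounded_family_panel(groups, cap=_HGNC_GROUP_CAP):
    """Union the genes of each HGNC group, skipping groups larger than the cap."""
    panel = set()
    meta = {"skipped_broad_groups": []}
    for group_name, genes in groups.items():
        if len(genes) > cap:
            # Broad domain supergroup: record it instead of flooding the panel.
            meta["skipped_broad_groups"].append(
                {"group": group_name, "size": len(genes)}
            )
            continue
        panel.update(genes)
    return sorted(panel), meta
```

Recording the skipped groups keeps the omission visible in the output instead of silently shrinking the panel.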

* Validate generality across non-GPCR classes; rank cross-checked family core first

Live-tested family enumeration on representative non-GPCR seeds (EGFR RTK, IL6R
cytokine receptor, MMP9 metalloprotease, CTSK cysteine protease) on top of the
CACNA1B channel case. All degrade gracefully; the HGNC supergroup cap is well
calibrated — it skipped junk supergroups (CD molecules 394, Ig-like 101, EF-hand
~200) while keeping legitimate families (ErbB 4, interleukin receptors 41).

Observed: loose HGNC groups give noisier protease panels (35-46 genes), while the
HGNC∩InterPro intersection pinpoints the tight core (11 MMPs, 4 cathepsins). So a
family member corroborated by both sources now ranks above an HGNC-only
loose-group member (_rank_key tiebreaker on family-source count, applied after
phenotype score). Documented the multi-class behavior in references/phases.md.
+1 test (32 pass).
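
A minimal sketch of such a tiebreaker, assuming each candidate is a dict with hypothetical `gene`, `phenotype_score`, and `family_sources` keys (the real `_rank_key` signature and record shape are not shown in this PR text):

```python
def rank_key(candidate):
    """Sort key: phenotype score first, then number of corroborating family sources."""
    return (
        -candidate["phenotype_score"],      # higher score ranks first
        -len(candidate["family_sources"]),  # tiebreak: more sources ranks first
        candidate["gene"],                  # final tiebreak for a stable order
    )


def rank(candidates):
    """Return candidates ordered best-first."""
    return sorted(candidates, key=rank_key)
```

Because the source count sits after the phenotype score in the tuple, it only reorders candidates whose scores are equal, which matches the "tiebreaker" wording above.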

* Refine: wrong-seed robustness, multi-phenotype union, fix seedless organism param

Two robustness refinements + a live-caught bug fix:

- Wrong-seed robustness: candidate generation now ALWAYS unions the
  sequence-derived (motif + homology) candidates, even in seeded mode. A wrong
  hypothesized target can no longer blind the search to the real target's family
  — the deorphanization premise. Extracted _build_panel() so this is unit-tested
  (a deliberately wrong EGFR seed still rescues the real class-B family).
- Multi-phenotype anchor: --phenotype is now repeatable; phenotype_union() unions
  the OpenTargets anchor across all given phenotypes keeping the max score per
  target (verified live: t2d + obesity -> 47 targets, GLP1R 0.77 / GIPR 0.70).
- Fix seedless_seeds: UniProt_search organism was '9606' (a taxid, which errors);
  it takes a common name ('human'). Found by a live run; documented the gotcha +
  the graceful UniProt-down degradation (HGNC family + phenotype still work).
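
The multi-phenotype union can be illustrated with a short sketch. The function name follows the commit message, but the signature and the input shape (one target-to-score mapping per phenotype) are assumptions made for this example, and the scores in the usage note are toy values.

```python
def phenotype_union(anchors):
    """Union several {target: score} mappings, keeping the max score per target."""
    merged = {}
    for anchor in anchors:
        for target, score in anchor.items():
            if target not in merged or score > merged[target]:
                merged[target] = score
    return merged
```

For example, `phenotype_union([{"GENE_A": 0.2, "GENE_B": 0.6}, {"GENE_A": 0.5}])` keeps `GENE_A` at 0.5 and `GENE_B` at 0.6, so a target that is strong for only one phenotype is not diluted by the others.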

+2 tests (34 pass). Both skill trees synced.

Note: UniProt was mid-outage during this session so seedless could not be
re-confirmed end-to-end live; the union logic is unit-tested and the organism
fix matches the UniProt_search docs + earlier successful live calls.
… gateways, and knowledge-base APIs (#260)

* Add canonical VCF variant-statistics tool (bcftools-backed)

Adds VCFStatsTool with three deterministic, local-compute operations over a
user-supplied VCF/BCF:
- VCF_summary_stats: records/SNPs/indels/MNPs, multiallelic sites, ts/tv,
  per-sample het/hom/missing
- VCF_count_variants: counts after PASS/QUAL/region/expression filters
- VCF_normalize: split/join multiallelics (+ optional left-align against a
  reference FASTA), reporting counts before vs after

All 12 existing VCF-related tools are database lookups (query a variant by
ID/region); none compute statistics from a user's own file, so agents rolled
ad-hoc parsers that disagreed on multiallelic splitting and indel
left-alignment. Routing those questions through bcftools stats/norm makes the
SNP/indel counts reproducible. Registered in default_config.py and
_lazy_registry_static.py; flagged requires_local_input; variant-analysis skill
points at the tools as the structured one-call form of its bcftools recipe.

* Add ROC/AUC analysis tool + compute-tool roadmap

ROC_analysis (ROCAnalysisTool): deterministic, local-compute diagnostic-accuracy
analysis for any binary classifier or continuous biomarker. Takes scores + 0/1
labels (inline arrays or a CSV) and returns AUC with a bootstrap 95% CI, the
Youden-optimal cutoff and its sensitivity/specificity, optional metrics at a
fixed cutoff, and a downsampled ROC curve. scikit-learn under the hood; no API
key; never raises. Generic by construction — conventions (positive label,
cutoff) are parameters, output is the standard sklearn result, no task-specific
assumption baked in. Registered in default_config.py and _lazy_registry_static.py;
diagnostic-test-evaluation skill points at it. 9 unit tests + 2 runnable
test_examples pass.

docs/COMPUTE_TOOL_ROADMAP.md: evidence-based map of the remaining gap. ToolUniverse
covers knowledge/DB access comprehensively (568 APIs, ~2534 tools) but only a
handful of tools compute on the user's own data; ~86 analysis scripts across 33
skills are not callable tools. The roadmap prioritizes promoting the genuinely
general ones to tools and lists net-new gaps. Includes a "generality gate":
promote the method, not the script — engine-wrappers that expose conventions as
parameters qualify; benchmark answer-sweepers (emit-every-variant-to-match-GT)
and convention/format-hardcoded scripts are explicitly excluded.

* Fix ROC tool: drop scikit-learn dependency, use pure NumPy

scikit-learn is not a core ToolUniverse dependency (numpy is), so the
sklearn-backed version returned an error on a default install and failed CI.
Reimplemented AUC as the tie-aware Mann-Whitney rank sum (verified identical to
sklearn.metrics.roc_auc_score across random tie-heavy cases) and the ROC sweep
in pure NumPy. Tool now runs with no optional deps; 9 unit tests + 2
test_examples pass, and it works with sklearn import blocked.
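
For readers unfamiliar with the rank-sum formulation, here is a minimal NumPy sketch of a tie-aware Mann-Whitney AUC. It illustrates the technique named above and is not the tool's actual code; the function names and the choice to return `None` for a single-class input are assumptions.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks where tied values share the average of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    # One entry per tie group: first position in the sorted array and group size.
    _, first_idx, counts = np.unique(
        sorted_vals, return_index=True, return_counts=True
    )
    group_rank = first_idx + (counts + 1) / 2.0
    ranks_sorted = np.repeat(group_rank, counts)
    ranks = np.empty_like(ranks_sorted)
    ranks[order] = ranks_sorted  # map ranks back to the input order
    return ranks


def auc_mann_whitney(scores, labels):
    """AUC = (rank sum of positives - n_pos(n_pos+1)/2) / (n_pos * n_neg)."""
    labels = np.asarray(labels).astype(bool)
    n_pos = int(labels.sum())
    n_neg = int(labels.size - n_pos)
    if n_pos == 0 or n_neg == 0:
        return None  # AUC is undefined with only one class present
    ranks = average_ranks(scores)
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))
```

With scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]` this gives 0.75, and a fully tied pair such as scores `[0.5, 0.5]` with labels `[0, 1]` gives 0.5, which is the behavior average ranks are meant to guarantee.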

* Roadmap: document the framework's optional-dependency design (lazy-loader graceful skip + required_packages + pyproject extras)

* Roadmap: correct compute-tool inventory (58 across 17 classes, not 5) and drop already-existing candidates

A find_tools/registry audit found the compute layer is far more populated than
the first draft claimed: 58 local-compute tools across 17 classes, not '5 of
2539' (that count only included requires_local_input file tools, missing every
inline-data compute tool). Survival, dose-response/IC50, enzyme kinetics, PK NCA,
population genetics, meta-analysis, DESeq2, sequence utils, and drug synergy all
already exist as tools and were removed from the build list. Added a
search-before-you-build audit table; narrowed Part A to find_tools-confirmed gaps
(edgeR/limma, expression PCA, FASTQ QC, single-cell compute, dN/dS, methylation
density, network proximity, GATK calling). ROC remains a genuine gap (shipped).

* Add Network_proximity tool (Guney/Barabasi 2016, networkx)

Network_proximity (NetworkProximityTool): deterministic closest-distance network
proximity between two node sets (e.g. drug targets vs a disease module) with a
degree-matched random Z-score and empirical p. Confirmed no existing tool via a
find_tools/registry audit before building.

Pure networkx + numpy (both core deps) — no network call, no API key, never
raises. General by construction: the caller supplies the graph (inline `edges`
or a 2-column `edgelist_path`) plus `targets`/`disease_genes`, so it works for
any interactome/metabolic/custom network and does not bake in a database or
species (unlike the skill's STRING-downloading CLI script). Registered in
default_config.py + _lazy_registry_static.py; network-pharmacology skill points
at it. 8 unit tests + 1 runnable test_example pass.

* Generalize Network_proximity: measure family + domain-neutral node sets

Make the tool cover its method family and decouple it from one use case:
- add `measure`: closest (Guney 2016, default), shortest (all-pairs mean), and
  separation (Menche 2015 s_AB; <0 = overlapping modules) — instead of only the
  closest measure.
- rename params to domain-neutral `set_a`/`set_b`, keeping `targets`/`disease_genes`
  as aliases, so the tool works for any two node sets (two pathways, two marker
  sets, …), not just drug-target vs disease.
Output keys generalized (value/measure, n_set_a_in_network, missing_set_a). Still
pure networkx+numpy, deterministic, never raises. 12 unit tests + 1 test_example
pass.
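
To make the `closest` and `separation` measures concrete, the sketch below computes both on an unweighted graph. It uses a plain-Python BFS to stay dependency-free, whereas the tool itself is described as networkx-based. The function names and the averaging direction for `closest` (mean over `set_a` of the distance to the nearest `set_b` node) are assumptions; the separation formula is the standard s_AB = <d_AB> - (<d_AA> + <d_BB>)/2.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map from [A, B] pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def nearest_distances(adj, sources, targets, exclude_self=False):
    """For each source, the distance to its nearest reachable target."""
    out = []
    for s in sources:
        dist = bfs_distances(adj, s)
        reachable = [dist[t] for t in targets
                     if t in dist and not (exclude_self and t == s)]
        if reachable:
            out.append(min(reachable))
    return out


def _mean(values):
    return sum(values) / len(values) if values else None


def closest(adj, set_a, set_b):
    """Mean over set_a of the distance to the nearest set_b node."""
    a = [n for n in set_a if n in adj]
    b = [n for n in set_b if n in adj]
    return _mean(nearest_distances(adj, a, b))


def separation(adj, set_a, set_b):
    """s_AB; negative values indicate overlapping modules."""
    a = [n for n in set_a if n in adj]
    b = [n for n in set_b if n in adj]
    d_aa = _mean(nearest_distances(adj, a, a, exclude_self=True))
    d_bb = _mean(nearest_distances(adj, b, b, exclude_self=True))
    d_ab = _mean(nearest_distances(adj, a, b) + nearest_distances(adj, b, a))
    if None in (d_aa, d_bb, d_ab):
        return None
    return d_ab - (d_aa + d_bb) / 2
```

On the path graph a-b-c-d, the sets {a, b} and {c, d} give a separation of 0.5, while the overlapping sets {a, b} and {b, c} give -0.5, which shows the sign convention noted above.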

* network-pharmacology skill: wire STRING->Network_proximity (verified end-to-end) + full-interactome caveat

Smoke-tested STRING_get_network -> Network_proximity: edge rows' preferredName_A/B
(gene symbols) map straight to [A,B] pairs and line up with symbol-based gene
sets natively. Documented the chain, the new measure/set_a/set_b params, and the
gotcha the test surfaced: a small query-centered subnetwork yields an
uninformative z>0/p≈1 because the degree-matched null collapses — use a large
interactome (high limit or NDEx_get_network) for a meaningful Z.

* Add Sequence_dn_ds tool (Nei-Gojobori dN/dS, pure Python)

Promotes the comparative-genomics skill's validated Nei-Gojobori (1986) dN/dS
implementation to a callable tool (verified bit-identical). Pure Python, general
(any two coding sequences), no existing tool. 10 unit tests pass.

* Add RNAseq_edger_limma_de tool (edgeR / limma-voom DE)

edgeR/limma-voom DE complementing the DESeq2 tool, following the rnaseq-deseq2
skill + DESeq2Tool precedent; R pipeline ported from the skill's validated
wrapper. Recovers all planted DE genes end-to-end. 5 unit tests.

* Add 5 ML-model / API tools (9 wrappers), live-verified

Discovered + built via api-tool-builder agents, each confirmed against a LIVE
API and exercised through test_new_tools.py (22 example runs, 0 failures):

- gnomad_get_constraint (GnomADConstraintTool) — gnomAD GraphQL gene constraint:
  pLI, LOEUF, mis_z, syn_z, obs/exp LoF. No key. Adds LOEUF/Z fields the existing
  gnomad_get_gene_constraints lacks.
- HFInference_classify_text / _embed_text / _fill_mask (HuggingFaceInferenceTool)
  — HuggingFace serverless inference (router.huggingface.co): classification,
  embeddings, fill-mask (works for protein LMs e.g. ESM2). Optional HF_TOKEN.
- IUPred3_predict_disorder (IUPred3Tool) — IUPred3 protein disorder/ANCHOR
  prediction from a UniProt accession (prediction, not a DB lookup). No key.
- ALOGPS_predict_logp_logs (ALOGPSTool) — ALOGPS 2.1 ASNN model: logP + aqueous
  logS from SMILES. No key.
- VEP_predict_pathogenicity_by_rsid/_by_hgvs/_by_region (VEPPathogenicityTool) —
  Ensembl VEP with AlphaMissense + SIFT + PolyPhen-2 per transcript. No key;
  surfaces AlphaMissense scores no existing tool exposed.

Registered in default_config.py + _lazy_registry_static.py; API-key catalog
regenerated. All return {status,...} envelopes, never raise, 30s timeouts.

* Add 4 more ML-model / API tools (TDC, IBM RXN, Replicate, DTU)

TDC oracles (live), DTU DeepTMHMM/SignalP via biolib (live), IBM RXN + Replicate
(key-gated, built+no-key-path-verified). g:Profiler already existed. All modules
import without optional deps (CI-safe). Central registration + catalog regen.

* Add Cellpose / TDC-datasets / SCREEN-cCRE tools + 5 HuggingFace task wrappers

Third discovery batch: HF extended with 5 task wrappers (live), Cellpose
segmentation (live), TDC dataset retrieval (live), ENCODE SCREEN cCRE (live;
Enformer/Borzoi infeasible). All CI-safe. Central registration.

* Add jPOST / RNAcentral / LIPID MAPS-LMPD / InterPro-memberDB + HF vision wrappers

Fourth discovery batch: 4 novel data-API tools (each pivoted around already-wrapped
APIs) + HF image-classification/object-detection. All live-verified, CI-safe.

* Add Rhea / PanglaoDB / IDR-search / GBIF-taxonomy / UniBind tools

Fifth discovery batch: 5 novel data-API tools (each pivoted around already-wrapped
APIs), owning skills updated. All live-verified, CI-safe.

* Add ZOOMA / RGD-strains / SGD-protein / HuBMAP-biospecimen tools

Sixth discovery batch (EBI/NAR + model-organism + spatial omics): 4 novel tools,
owning skills updated. All live-verified, CI-safe.

* Fix dtu_protein test_examples: flatten + success-only (validation pass)

The DTU agent wrote test_examples as a {description,arguments} wrapper plus an
intentional error case; the harness reads top-level keys as args and counts
error-status returns as failures. Flattened to a single success-only example.
The tool itself was already correct (verified: DeepTMHMM returns real topology).

* Fix correctness issues found in pre-merge adversarial audit

Independent correctness/faithfulness audit of the new tools surfaced real
defects (beyond "test_examples don't error"); fixed:

Real bugs:
- rnacentral_genome: RNAcentral_get_sequence/_get_publications called with only
  urs_id silently routed to genome_locations (operation default not injected;
  hardcoded fallback). Now each wrapper fixes its op via fields.operation.
- idr_searchengine: IDR_search_images mapped `study` to non-existent keys
  (study_title/study) -> always null. Now reads screen_name/project_name (+ adds
  dataset_name).

Method fix:
- network_proximity: the degree-matched null behind the Z-score oversimplified
  Guney 2016 (exact-degree bins, so hubs could only map to themselves, plus
  with-replacement intra-set sampling). Rewrote: degree bins are merged until
  each holds >= 100 nodes, sampling is WITHOUT replacement, and the empirical p
  uses (k+1)/(n+1).
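
A rough sketch of the three ingredients of that null model follows. It is illustrative only: the helper names are hypothetical, folding a leftover small bin into the last full one is one possible merging policy, and counting null values `<=` the observed value for `k` assumes that smaller distances mean stronger proximity.

```python
import random
from collections import defaultdict


def degree_bins(degrees, min_size=100):
    """Merge adjacent degrees into bins holding at least min_size nodes each."""
    by_degree = defaultdict(list)
    for node, deg in degrees.items():
        by_degree[deg].append(node)
    bins, current = [], []
    for deg in sorted(by_degree):
        current.extend(by_degree[deg])
        if len(current) >= min_size:
            bins.append(current)
            current = []
    if current:  # leftover high-degree nodes: fold into the last bin
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins


def sample_matched(nodes, bins, rng):
    """Draw a degree-matched random set WITHOUT replacement within each bin."""
    node_bin = {n: i for i, members in enumerate(bins) for n in members}
    needed = defaultdict(int)
    for n in nodes:
        needed[node_bin[n]] += 1
    picked = []
    for i, k in needed.items():
        picked.extend(rng.sample(bins[i], k))
    return picked


def empirical_p(observed, null_values):
    """(k+1)/(n+1), with k = null draws at least as extreme as observed."""
    k = sum(1 for v in null_values if v <= observed)
    return (k + 1) / (len(null_values) + 1)
```

Merging bins gives hubs a pool of similar-degree substitutes, and the +1 in numerator and denominator keeps the empirical p strictly above zero for a finite number of permutations.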

return_schema correctness:
- Normalized 14 tools whose oneOf success-branch described the data array
  directly instead of the {status,data} envelope (panglaodb actually failed
  harness schema validation; others passed only by luck). Now all are proper
  envelope schemas.

Honesty/labels:
- TDC LogP oracle relabeled (it is TDC penalized-logP, not octanol-water logP).
- IUPred3 academic/non-commercial license note; PanglaoDB 2020-freeze note;
  screen_ccre staging-endpoint reliability note.

* Trim redundancy from the audit: drop ALOGPS + RNAcentral_get_publications, collapse VEP 3->1

Acting on the value-audit's clearest redundancy findings:
- Removed ALOGPS (2005-era logP/logS model; ToolUniverse already has RDKit,
  SwissADME, and ADMET-AI for lipophilicity/solubility). Deleted tool + config +
  registration.
- Removed RNAcentral_get_publications (duplicated the existing
  RNAcentral_get_xrefs_and_pubs). Kept the genuinely-novel
  RNAcentral_get_genome_locations + _get_sequence.
- Collapsed VEP_predict_pathogenicity_{by_rsid,by_hgvs,by_region} into a single
  VEP_predict_pathogenicity that auto-routes by whichever input is given
  (rsid | hgvs_notation | chrom+pos+alt) — 3 tools -> 1.
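
The auto-routing idea might look something like the sketch below. The parameter names come from the commit message; the function name, the precedence order (rsid, then HGVS, then region), and the error-envelope wording are assumptions for illustration.

```python
def route_vep_input(rsid=None, hgvs_notation=None, chrom=None, pos=None, alt=None):
    """Pick the lookup mode from whichever input the caller supplied."""
    if rsid:
        return {"status": "success", "mode": "rsid", "query": rsid}
    if hgvs_notation:
        return {"status": "success", "mode": "hgvs", "query": hgvs_notation}
    if chrom is not None and pos is not None and alt:
        return {
            "status": "success",
            "mode": "region",
            "query": {"chrom": str(chrom), "pos": int(pos), "alt": alt},
        }
    # Return an error envelope instead of raising, in line with the
    # "never raise" convention described for these tools.
    return {
        "status": "error",
        "message": "provide rsid, hgvs_notation, or chrom+pos+alt",
    }
```

A single entry point with this kind of dispatch lets callers pass whatever identifier they have without first choosing among three near-identical tools.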

Kept gnomad_get_constraint (verified the existing gnomad_get_gene_constraints
does NOT expose LOEUF/mis_z/syn_z/pli, so it adds real value) and jPOST
(jPOST-native metadata not surfaced via ProteomeXchange). Net -4 tools (2756->2752).

* Complete audit trim: remove ALOGPS registration, RNAcentral pubs wrapper, collapse VEP wrappers

Follow-up to 0eb8605 (which only captured the ALOGPS file deletions because the
git add errored on the already-staged path). Commits the rest: drop alogps from
default_config + lazy registry; remove RNAcentral_get_publications wrapper + dead
code; collapse the 3 VEP wrappers into one auto-routing VEP_predict_pathogenicity.

* Remove jPOST tool (redundant with ProteomeXchange)

The pre-merge audit flagged jPOST as marginal: ProteomeXchange (already wrapped)
indexes jPOST submissions, so the only delta was jPOST-native JPST metadata.
Dropped the tool + config + registration. Kept gnomad_get_constraint, which was
also flagged but live-verified to add LOEUF/mis_z/syn_z/pli that the existing
gnomad_get_gene_constraints does not expose.

* Remove compute-tool roadmap doc from PR (planning backlog, not shipped tooling)

* Add keyless ESM-2 masked-marginal variant scorer + verified biomedical HF model_ids
@taferh taferh merged commit 8551d3a into squirro:main Jun 19, 2026
2 checks passed