Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.1] - 2026-03-03

### Changed

- **BLAST E-value raised to Primer-BLAST default** (`config.py`): `blast_evalue` 1000 → 30000 to match the Primer-BLAST default and ensure short, gapped primer alignments are not silently dropped.
- **BLAST word size now explicit** (`config.py`, `specificity.py`, `pipeline.py`): New `blast_word_size` field (default 7) passed to `blastn -word_size`, matching the Primer-BLAST / blastn-short default. Previously relied on the implicit blastn-short default; now configurable.
- **Tightened poly-X default** (`config.py`): `primer_max_poly_x` default lowered from 5 to 4.
- **Increased BLAST `max_mismatches` default** (`config.py`): 2 → 3 so that 3-mismatch paralog alignments (e.g. DCAF12L1) are classified as "predicted bound".
- **BLAST mismatch penalty** (`config.py`): New `blast_penalty` default changed from the blastn-short default of -3 to -1. With reward=1/penalty=-3, a 21bp primer with 3 mismatches scores only 9 (E ≈ 190,000) and is silently dropped by BLAST. With penalty=-1, the same alignment scores 15 (E ≈ 70), making paralog off-targets like DCAF12L1 detectable.
- Updated both `designer_default_config.json` and `designer_lenient_config.json` to match the new defaults.

### Added

- **G/C-specific poly-run filter** (`utils.py`): New `find_max_poly_gc()` function rejects primers with homopolymer runs of G or C exceeding `primer_max_poly_gc` (default 3). This catches primers like the DCAF12L2 `GGGGCCGGATGTTCTGCTG` forward primer that passed the old poly-X filter. Wired through `check_kmer()`, `generate_kmers()`, `filter_kmers()`, and `design.py`.
- **BLAST `-evalue` parameter** (`config.py`, `blast_runner.py`, `specificity.py`, `pipeline.py`): New `blast_evalue` field (default 30000.0) passed to `blastn -evalue`, ensuring weak but real off-target alignments are not silently dropped before the annotator can evaluate them.
- **BLAST `-reward` / `-penalty` parameters** (`config.py`, `blast_runner.py`, `specificity.py`, `pipeline.py`): New `blast_reward` (default 1) and `blast_penalty` (default -1) fields override the blastn-short scoring defaults, improving sensitivity for mismatched paralog alignments.
- **BLAST `-max_hsps` and `-dust` flags** (`config.py`, `blast_runner.py`, `specificity.py`, `pipeline.py`): New `blast_max_hsps` (default 100) limits HSPs per query-subject pair, and `blast_dust` (default `"yes"`) enables low-complexity masking. Together these cut BLAST runtime significantly after the evalue/penalty sensitivity increase, without affecting off-target detection (the annotator already discards low-quality HSPs).
- **Direct tabular BLAST output** (`blast_runner.py`, `specificity.py`): `BlastRunner.run()` now accepts `output_table=` to write tabular results (`-outfmt 6`) directly, eliminating the intermediate archive (`-outfmt 11`) write/read and the `blast_formatter` subprocess. The specificity check uses this new path by default, removing the `try/finally` archive cleanup. The legacy `output_archive=` parameter is deprecated but still functional.
- **3'-end tolerance for BLAST annotation** (`annotator.py`, `config.py`, `specificity.py`, `pipeline.py`): New `three_prime_tolerance` parameter (default 3) relaxes the `from_3prime` check from `qend == qlen` to `qlen - qend <= tolerance`. BLAST's local alignment clips terminal mismatches, causing hits like the DCAF12L1 reverse primer (19/21bp aligned, 2bp clipped at 3' end) to be wrongly discarded as "not 3'-anchored" even though Primer-BLAST detects them via semi-global alignment.
- Tests for `find_max_poly_gc`, `check_kmer` poly-GC integration, BLAST evalue/reward/penalty/word_size parameter forwarding, and specificity check threading.

## [1.0.0] - 2026-03-03

First stable release. All v1.0 roadmap items complete — see `docs/ROADMAP.md` for the full list.
Expand Down
12 changes: 10 additions & 2 deletions config/designer_default_config.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
"primer_min_length": 15,
"primer_max_length": 30,
"PRIMER_OPT_BOUND": 98.0,
"PRIMER_MIN_BOUND": -10.0,
"PRIMER_MIN_BOUND": 30.0,
"PRIMER_MAX_BOUND": 120.0,
"junction_padding_bases": 3,
"forward_tail": "GGACACTCTTTCCCTACACGACGCTCTTCCGATCTAAAAAAAAAAAAAAAAAAAATGGGAAAGAGTGTCC",
Expand All @@ -21,6 +21,7 @@
"primer_max_gc": 70,
"primer_gc_clamp": 1,
"primer_max_poly_x": 5,
"primer_max_poly_gc": 3,
"primer_max_n": 0,
"PRIMER_MAX_SELF_ANY_TH": 45.0,
"PRIMER_MAX_SELF_END_TH": 35.0,
Expand Down Expand Up @@ -73,7 +74,14 @@
"blast_parameters": {
"length_threshold": 15,
"evalue_threshold": 10.0,
"max_mismatches": 2,
"max_mismatches": 3,
"three_prime_tolerance": 3,
"blast_evalue": 30000.0,
"blast_word_size": 7,
"blast_reward": 1,
"blast_penalty": -1,
"blast_max_hsps": 100,
"blast_dust": "yes",
"max_amplicon_size": 2000,
"ontarget_tolerance": 5
},
Expand Down
12 changes: 10 additions & 2 deletions config/designer_lenient_config.json
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@
"primer_min_gc": 30,
"primer_max_gc": 70,
"primer_gc_clamp": 0,
"primer_max_poly_x": 5,
"primer_max_poly_x": 4,
"primer_max_poly_gc": 3,
"primer_max_n": 0,
"PRIMER_MAX_SELF_ANY_TH": 45.0,
"PRIMER_MAX_SELF_END_TH": 35.0,
Expand Down Expand Up @@ -73,7 +74,14 @@
"blast_parameters": {
"length_threshold": 15,
"evalue_threshold": 10.0,
"max_mismatches": 2,
"max_mismatches": 3,
"three_prime_tolerance": 3,
"blast_evalue": 30000.0,
"blast_word_size": 7,
"blast_reward": 1,
"blast_penalty": -1,
"blast_max_hsps": 100,
"blast_dust": "yes",
"max_amplicon_size": 2000,
"ontarget_tolerance": 5
},
Expand Down
1 change: 1 addition & 0 deletions docs/ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -639,6 +639,7 @@ Add a `max_candidates_per_junction` parameter to `MultiplexPickerParameters` (de
**File: `src/plexus/resources.py`**

Add presets for:

- `hg19` / `GRCh37` (legacy clinical datasets)
- `mm39` (mouse, common model organism)
- `GRCh38_no_alt` (analysis-set FASTA without alt contigs, preferred for clinical pipelines)
Expand Down
21 changes: 17 additions & 4 deletions src/plexus/blast/annotator.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,14 +20,27 @@ def __init__(self, blast_df, target_map=None):
self.target_map = target_map or {}

def build_annotation_dict(
self, length_threshold=12, evalue_threshold=4, max_mismatches=2
self,
length_threshold=12,
evalue_threshold=4,
max_mismatches=2,
three_prime_tolerance=0,
):
"""
Build a dictionary specifying conditions for annotation

Build a dictionary specifying conditions for annotation.

Parameters
----------
three_prime_tolerance : int
Maximum number of unaligned bases at the 3' end of the primer
for a hit to still be considered "3'-anchored". BLAST's local
alignment clips terminal mismatches, so a primer with a mismatch
at or near the 3' end will have ``qend < qlen``. A tolerance
of 2-3 catches these cases without accepting 5'-only hits.
"""
self.annotations = {
"from_3prime": lambda row: row["qend"] == row["qlen"],
"from_3prime": lambda row: (row["qlen"] - row["qend"])
<= three_prime_tolerance,
"length_pass_3prime": lambda row: (
row["length"] >= length_threshold
and row["mismatch"] <= max_mismatches
Expand Down
63 changes: 52 additions & 11 deletions src/plexus/blast/blast_runner.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import os
import warnings

import pandas as pd
from loguru import logger
Expand Down Expand Up @@ -30,8 +31,7 @@ def __init__(self, input_fasta, reference_fasta):
"""
Interface for running BLAST:
- Creates BLAST database
- Runs BLAST in archive mode
- Reformats to a table
- Runs BLAST (direct tabular output or legacy archive mode)
- Converts table to a pandas data frame

params
Expand Down Expand Up @@ -91,19 +91,48 @@ def create_database(self):
return self

def run(
self, output_archive, word_size=None, task="blastn-short", num_threads: int = 1
self,
output_archive=None,
*,
output_table=None,
word_size=None,
task="blastn-short",
num_threads: int = 1,
evalue: float | None = None,
reward: int | None = None,
penalty: int | None = None,
max_hsps: int | None = None,
dust: str | None = None,
):
"""
Run blast, writing a BLAST archive to `output_archive`.
Run blastn and write results.

Uses ``-task blastn-short`` by default, which is tuned for
primer-length queries (<30 bp) with word_size=7, reward 1,
penalty −3, and gap costs 5/2.

Note that we run with output format 11 ``-outfmt 11`` to
produce the archive; from this format you can convert to
all other formats.
Parameters
----------
output_table : str
Path for direct tabular output (``-outfmt 6``). Preferred.
output_archive : str
*Deprecated.* Path for a BLAST archive (``-outfmt 11``).
Use ``output_table`` instead to skip the archive/reformat step.
"""
if output_table is not None:
outfmt = f"6 {self.BLAST_COLS}"
out_path = output_table
elif output_archive is not None:
warnings.warn(
"output_archive is deprecated; use output_table for direct "
"tabular output and avoid the archive/reformat step.",
DeprecationWarning,
stacklevel=2,
)
outfmt = "11"
out_path = output_archive
else:
raise ValueError("Either output_table or output_archive must be provided.")

cmd = [
"blastn",
Expand All @@ -114,19 +143,31 @@ def run(
"-task",
task,
"-outfmt",
"11",
outfmt,
"-out",
output_archive,
out_path,
]
if word_size is not None:
cmd.extend(["-word_size", str(word_size)])
if evalue is not None:
cmd.extend(["-evalue", str(evalue)])
if reward is not None:
cmd.extend(["-reward", str(reward)])
if penalty is not None:
cmd.extend(["-penalty", str(penalty)])
if max_hsps is not None:
cmd.extend(["-max_hsps", str(max_hsps)])
if dust is not None:
cmd.extend(["-dust", dust])
if num_threads > 1:
cmd.extend(["-num_threads", str(num_threads)])

run_command(cmd, check=True, retries=2)

# Save as instance variable, for reformattings
self.output_archive = output_archive
if output_table is not None:
self.output_table = output_table
else:
self.output_archive = output_archive

return self

Expand Down
35 changes: 25 additions & 10 deletions src/plexus/blast/specificity.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,16 @@ def run_specificity_check(
*,
length_threshold: int = 15,
evalue_threshold: float = 10.0,
max_mismatches: int = 2,
max_mismatches: int = 3,
three_prime_tolerance: int = 3,
max_amplicon_size: int = 2000,
ontarget_tolerance: int = 5,
blast_evalue: float = 30000.0,
blast_word_size: int = 7,
blast_reward: int = 1,
blast_penalty: int = -1,
blast_max_hsps: int = 100,
blast_dust: str = "yes",
):
"""
Run BLAST on all candidate primers in the panel to check for specificity
Expand All @@ -34,8 +41,14 @@ def run_specificity_check(
length_threshold: Minimum 3'-anchored alignment length (bp) to predict binding.
evalue_threshold: E-value cutoff for predicted binding.
max_mismatches: Maximum mismatches in a 3'-anchored alignment.
three_prime_tolerance: Max unaligned bases at the 3' end for a hit to be
considered 3'-anchored. Compensates for BLAST clipping terminal mismatches.
max_amplicon_size: Maximum amplicon size (bp) to consider.
ontarget_tolerance: Coordinate tolerance (bp) for on-target classification.
blast_evalue: E-value passed to blastn via -evalue to control search sensitivity.
blast_word_size: Word size passed to blastn via -word_size.
blast_reward: Match reward passed to blastn via -reward.
blast_penalty: Mismatch penalty passed to blastn via -penalty.
"""
logger.info("Starting specificity check (BLAST)...")
os.makedirs(work_dir, exist_ok=True)
Expand All @@ -53,19 +66,20 @@ def run_specificity_check(
panel.save_candidate_primers_to_fasta(input_fasta)

# 3. Run BLAST
blast_archive = os.path.join(work_dir, "blast_archive")
blast_table = os.path.join(work_dir, "blast_table.txt")

runner = BlastRunner(input_fasta, genome_fasta)
runner.create_database()
try:
runner.run(output_archive=blast_archive, num_threads=num_threads)
runner.reformat_output_as_table(blast_table)
finally:
if os.path.exists(blast_archive):
os.remove(blast_archive)
logger.debug(f"Removed BLAST archive: {blast_archive}")

runner.run(
output_table=blast_table,
num_threads=num_threads,
evalue=blast_evalue,
word_size=blast_word_size,
reward=blast_reward,
penalty=blast_penalty,
max_hsps=blast_max_hsps,
dust=blast_dust,
)
blast_df = runner.get_dataframe()

if blast_df.empty:
Expand All @@ -79,6 +93,7 @@ def run_specificity_check(
length_threshold=length_threshold,
evalue_threshold=evalue_threshold,
max_mismatches=max_mismatches,
three_prime_tolerance=three_prime_tolerance,
)
annotator.add_annotations()

Expand Down
Loading