Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 23 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,12 +83,31 @@ leviosam2 lift -C source_to_target.clft -a aligned_to_source.bam -p lifted_from_

The levioSAM2 workflow includes lift-over using the `leviosam2-lift` kernel and a selective re-mapping strategy. This approach can improve accuracy.

Example:
We highly recommend using the **Python workflow (`leviosam2.py`)** as it is the newer, standard interface. The older Bash workflow (`leviosam2.sh`) is supported but not recommended.

**Primary Example (Python script):**

```shell
# You may skip the indexing step if you've already run it
# Index the chain file first (if not already done)
leviosam2 index -c source_to_target.chain -p source_to_target -F target.fai
sh leviosam2.sh \

# Run the Python workflow script
python workflow/leviosam2.py \
-i aligned_to_source.bam \
-o aligned_to_source-lifted \
-C source_to_target.clft \
-f target.fna \
-fi bt2/target \
-a bowtie2 \
-s ilmn_pe \
-t 16 \
--use_preset
```

**Alternative Example (Bash script - not recommended):**

```shell
sh workflow/leviosam2.sh \
-a bowtie2 -A -10 -q 10 -H 5 \
-i aligned_to_source.bam \
-o aligned_to_source-lifted \
Expand All @@ -98,7 +117,7 @@ sh leviosam2.sh \
-t 16
```

See [this README](https://github.com/milkschen/leviosam2/blob/main/workflow/README.md) to learn more about running the full levioSAM2 workflow.
See [workflow README](https://github.com/milkschen/leviosam2/blob/main/workflow/README.md) to learn more about running the full levioSAM2 workflow and how the parameter options differ between scripts and the binary.

## Publication

Expand Down
24 changes: 19 additions & 5 deletions src/leviosam.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,8 @@ void print_lift_help_msg() {
"to be lifted. \n";
std::cerr << " "
"Leave empty or set to \"-\" to read from stdin.\n";
std::cerr << " -p string Prefix of the output files. [stdout]\n";
std::cerr << " -O string Format of the output file: sam or bam. [sam]\n";
std::cerr << " -t INT Number of threads used.\n"
" "
"If -t is not set, the value would be the sum of\n"
Expand All @@ -509,11 +511,21 @@ void print_lift_help_msg() {
"Add MD and NM to output alignment records (requires -f)\n";
std::cerr << " -f path "
"Path to the FASTA file of the target reference. \n";
std::cerr << " -F path "
"Path to the FAI (FASTA index) file of the target reference. \n";
std::cerr << " -x path Re-alignment preset. [] \n";
std::cerr << " -G INT "
"Maximum allowed gap size (in base pairs) between chain "
"intervals for an alignment to be considered liftable. A "
"value of 0 requires perfect chain interval continuity. [0]\n";
std::cerr << " -g INT "
"Haplotype (0 or 1) for variant-aware (VCF) mode. \n"
" "
"NOTE: Do not confuse with workflow script's `-g` (gap size).\n";
std::cerr << " -s string "
"Sample name for variant-aware (VCF) mode.\n";
std::cerr << " -v path "
"Path to the VCF file for variant-aware mode.\n";
std::cerr << " -T INT "
"Chunk size for each thread. [256] \n"
" " // align
Expand All @@ -522,12 +534,14 @@ void print_lift_help_msg() {
"Setting a larger -T uses slightly more "
"memory but might benefit thread scaling.\n";
std::cerr << "\n";
std::cerr << " Commit/defer rule options:\n";
std::cerr << " Commit/defer/suppress rule options:\n";
std::cerr << " -S string<:int/float> Key-value pair of "
"a split rule. We allow appending multiple `-S` options.\n";
std::cerr << " Options: "
"mapq:<int>, aln_score:<int>, isize:<int>, hdist:<int>, "
"clipped_frac:<float>. lifted. [none]\n";
"lifted, mapq:<int>, aln_score:<int>, isize:<int>, hdist:<int>, "
"clipped_frac:<float> [none]\n";
std::cerr << " * lifted "
"Defer alignments that are unlifted (unmapped) after liftover.\n";
std::cerr << " * mapq INT "
"Min MAPQ (pre-liftover) accepted for a committed read.\n";
std::cerr << " * aln_score INT "
Expand All @@ -549,10 +563,10 @@ void print_lift_help_msg() {
std::cerr << " Example: `-S mapq:20 -S aln_score:20` commits "
"MQ>=20 and AS>=20 alignments.\n";
std::cerr << " -r string Path to a BED file (source "
"coordinates). Reads overlap with the regions are always "
"coordinates). Reads overlapping with the regions are always "
"committed. [none]\n";
std::cerr << " -D string Path to a BED file (dest coordinates). "
"Reads overlap with the regions are always deferred. [none]\n";
"Reads overlapping with the regions are always deferred. [none]\n";
std::cerr << " -B float Threshold for BED record intersection. "
"[0]\n"
" If <= 0: consider any overlap (>0 bp)\n"
Expand Down
17 changes: 13 additions & 4 deletions workflow/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
# LevioSAM2 workflow

> [!WARNING]
> **CLI Flag Differences between Workflows and the C++ Binary:**
> The flag names in these workflow scripts (`leviosam2.py` / `leviosam2.sh`) differ from those of the underlying C++ binary (`leviosam2 lift`). If you run the C++ binary directly, verify the option mappings:
> * **Gap size threshold:** Use `-g` in Python/Bash scripts, but **`-G`** in the C++ binary. (The C++ binary's `-g` is reserved for VCF haplotype).
> * **Sequence / Aligner / Sample flags:** `-s` specifies sequence type in Python, source index path in Bash, and VCF sample name in the C++ binary.
> * **Commit / Suppress annotations:** Use `--lift_bed_commit_source` (Python) or `-R` (Bash), but **`-r`** in the C++ binary.
> * **Defer rules (`-m` and `-p`):** In Bash, `-m` is the template size cutoff and `-p` is the clipped fraction cutoff. In the C++ binary, `-m` enables MD/NM tags, and `-p` is the **output prefix**.
> Check `leviosam2 lift -h` for details.

## Dependencies

All dependent software is included in our Docker/Singularity container.
Expand Down Expand Up @@ -30,7 +39,7 @@ python leviosam2.py \
--lift_bed_defer_target defer_annotations.bed # optional
```

With the bash worklow (older workflow to be deprecated):
Alternative with the Bash workflow (not recommended):

```shell
bash leviosam2.sh \
Expand Down Expand Up @@ -62,7 +71,7 @@ python leviosam2.py \
--lift_bed_defer_target defer_annotations.bed # optional
```

With the bash worklow (older workflow to be deprecated):
Alternative with the Bash workflow (not recommended):

```shell
bash leviosam2.sh \
Expand Down Expand Up @@ -102,7 +111,7 @@ python leviosam2.py \
--lift_bed_commit_source suppress_annotations.bed # optional
```

With the bash worklow (older workflow to be deprecated):
Alternative with the Bash workflow (not recommended):

```shell
bash leviosam2.sh \
Expand Down Expand Up @@ -132,7 +141,7 @@ python leviosam2.py \
--lift_bed_commit_source suppress_annotations.bed # optional
```

With the bash worklow (older workflow to be deprecated):
Alternative with the Bash workflow (not recommended):

```shell
bash leviosam2.sh \
Expand Down
2 changes: 1 addition & 1 deletion workflow/leviosam2.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ def parse_args() -> argparse.Namespace:
"--lift_bed_defer_target",
type=str,
help=(
"[lift] Path to a BED (target cooridnates)"
"[lift] Path to a BED (target coordinates) "
"where reads in the regions are always "
"deferred"
),
Expand Down
2 changes: 1 addition & 1 deletion workflow/leviosam2.sh
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ function print_usage_and_exit {
echo " -w path Path to the input FASTQ (read 1) []"
echo " -W path Path to the input FASTQ (read 2, optional) []"
echo " LevioSAM2-lift:"
echo " -g INT Number of gaps allowed during leviosam2-lift [0]"
echo " -g INT Max chain gap size allowed (in base pairs) during leviosam2-lift [0]"
echo " -x path Path to the levioSAM2 re-alignment config YAML []"
echo " Commit/Defer/Suppress rules:"
echo " -A INT Alignment score cutoff for the defer rule []"
Expand Down
Loading