hillerlab/make_lastz_chains

Make Lastz Chains

Pairwise genome alignment chains. Inputs to TOGA and multiz.


Before you run

  • Softmask both genomes (lowercase, do NOT hardmask). RepeatModeler 2 per genome is recommended; add WindowMasker if you see runaway LASTZ runtimes.
  • Scaffold names: no spaces; avoid dots (rename NC_00000.1 to NC_00000). Restore originals with standalone_scripts/rename_chromosomes_back.py.
  • Inputs accepted: .fasta or .2bit.
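
The checks above can be sketched as a few shell helpers. This is a minimal sketch, not part of the repository: check_scaffold_names, strip_header_versions, and softmask_fraction are hypothetical names, and the renaming rule only handles a numeric version suffix such as ".1" on the first word of a header.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with make_lastz_chains) for vetting a FASTA
# file before a run.

# Print scaffold names that contain a dot, space, or tab; exit status 1 if any.
check_scaffold_names() {
    local fasta="$1"
    awk '/^>/ { name = substr($0, 2)
                if (name ~ /[. \t]/) { print name; bad = 1 } }
         END  { exit bad }' "$fasta"
}

# Strip a numeric version suffix and any description from headers,
# e.g. ">NC_00000.1 some text" becomes ">NC_00000". Other headers are unchanged.
strip_header_versions() {
    local fasta="$1"
    sed -E '/^>/ s/^(>[^[:space:].]+)\.[0-9]+([[:space:]].*)?$/\1/' "$fasta"
}

# Report the fraction of lowercase (softmasked) bases, or NA for an empty file.
softmask_fraction() {
    local fasta="$1"
    awk '!/^>/ { total += length($0); lower += gsub(/[a-z]/, "") }
         END   { if (total > 0) printf "%.3f\n", lower / total; else print "NA" }' "$fasta"
}
```

A fraction near zero suggests the genome was never softmasked. A hardmasked genome shows up as long runs of N rather than lowercase, which this sketch does not detect.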

1. Original Python pipeline (make_chains.py)

git clone https://github.com/hillerlab/make_lastz_chains.git
cd make_lastz_chains
mamba env create -f environment.yml
mamba activate make_lastz_chains

python make_chains.py \
    --project_dir    /path/to/output \
    --target_genome  /path/to/target.fa \
    --query_genome   /path/to/query.fa \
    --target_name    hg38 \
    --query_name     mm39

Or pass all parameters from a file:

python make_chains.py --params_from_file my_params.yaml

2. nf-core pipeline — local (Docker / Apptainer)

Requirements: Nextflow ≥ 25.04.6, Docker or Apptainer, Java.

git clone https://github.com/hillerlab/make_lastz_chains.git
cd make_lastz_chains

Edit params.json (set target_name, query_name, target_genome, query_genome), then:
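
As an illustration, the four keys named above could be written out like this. This is a sketch under assumptions: the values are placeholders, write_params_json is a hypothetical helper, and the params.json shipped with the repository may carry further alignment settings that this minimal file omits, so it writes to a separate file by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal parameter file with only the four keys
# named in this README. All values are placeholders to be replaced.
write_params_json() {
    local out="${1:-my_params.json}"
    cat > "$out" <<'EOF'
{
    "target_name": "hg38",
    "query_name": "mm39",
    "target_genome": "/path/to/target.fa",
    "query_genome": "/path/to/query.fa"
}
EOF
}
```

Pass the resulting file with -params-file in place of params.json, or copy the four values into the shipped params.json to keep its other settings.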

# Docker
nextflow run main.nf -params-file params.json -profile docker

# Apptainer / Singularity
nextflow run main.nf -params-file params.json -profile apptainer

Build the image locally (optional):

docker buildx build --platform linux/amd64 -t nilablueshirt/make_lastz_chains:latest-amd64 .

Use a pre-built Apptainer SIF (optional):

apptainer build make_lastz_chains.sif docker://nilablueshirt/make_lastz_chains:latest-amd64
export NXF_CONTAINER_IMAGE=/path/to/make_lastz_chains.sif

Smoke test:

nextflow run main.nf -profile test,apptainer

3. nf-core pipeline — HPC (SLURM)

Requirements: Nextflow ≥ 25.04.6, Apptainer, Java, SLURM cluster.

git clone https://github.com/hillerlab/make_lastz_chains.git
cd make_lastz_chains

Edit the path variables at the top of run_nf_slurm_example.sh (cache dir, container image, manifest path), then submit:

sbatch --array=1-<N> run_nf_slurm_example.sh

Each array task spawns one Nextflow head job that submits all compute as child SLURM jobs.
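
One way to derive <N> is to count the entries in the manifest. The sketch below assumes the manifest lists one genome pair per line, with blank lines and '#' comments ignored. That layout is an assumption, not something this README specifies, so check run_nf_slurm_example.sh for the actual format. count_pairs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count manifest entries, assuming one genome pair per
# line. Blank lines and lines starting with '#' are skipped.
count_pairs() {
    local manifest="$1"
    grep -cvE '^[[:space:]]*(#|$)' "$manifest"
}

# Example submission (needs SLURM, so it is left commented out):
# sbatch --array=1-"$(count_pairs /path/to/manifest)" run_nf_slurm_example.sh
```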

LASTZ, AXT_CHAIN, and REPEAT_FILLER run as SLURM job arrays. Partition routing, array sizes, and resource tiers are documented inline in nextflow.config — edit there to match your cluster.


Checkpoint resumes

# Resume from failure
nextflow run main.nf -params-file params.json -profile apptainer -resume

# Restart from *.all.chain.gz
nextflow run main.nf -entry FROM_FILL_CHAINS -params-file params.json \
    --merged_chain       results/chain_merge/hg38.mm39.all.chain.gz \
    --target_twobit      results/genome_prep/target.2bit \
    --query_twobit       results/genome_prep/query.2bit \
    --target_chrom_sizes results/genome_prep/target.chrom.sizes \
    --query_chrom_sizes  results/genome_prep/query.chrom.sizes \
    -profile apptainer

# Restart from *.filled.chain.gz
nextflow run main.nf -entry FROM_CLEAN_CHAINS -params-file params.json \
    --filled_chain       results/fill_chains/hg38.mm39.filled.chain.gz \
    --target_twobit      results/genome_prep/target.2bit \
    --query_twobit       results/genome_prep/query.2bit \
    --target_chrom_sizes results/genome_prep/target.chrom.sizes \
    --query_chrom_sizes  results/genome_prep/query.chrom.sizes \
    -profile apptainer

For SLURM, append ,slurm to the profile list, for example -profile apptainer,slurm.


Output

results/
├── genome_prep/      target.2bit, query.2bit, *.chrom.sizes
├── partition/        *_partitions.txt
├── chain_merge/      *.all.chain.gz        ← checkpoint for FROM_FILL_CHAINS
├── fill_chains/      *.filled.chain.gz     ← checkpoint for FROM_CLEAN_CHAINS
├── final/            *.final.chain.gz      ← final output
└── pipeline_info/    timeline, trace, DAG
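
Given this layout, the furthest checkpoint on disk indicates which entry point to restart from. The helper below is a sketch: pick_entry and has_match are hypothetical names, and DONE and FULL_RUN are labels invented for this example. Only FROM_FILL_CHAINS and FROM_CLEAN_CHAINS are entry names taken from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if directory $1 holds a file matching glob $2.
has_match() {
    local f
    for f in "$1"/$2; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print which restart point the results directory supports.
# DONE and FULL_RUN are labels made up for this sketch.
pick_entry() {
    local results="${1:-results}"
    if has_match "$results/final" '*.final.chain.gz'; then
        echo "DONE"
    elif has_match "$results/fill_chains" '*.filled.chain.gz'; then
        echo "FROM_CLEAN_CHAINS"
    elif has_match "$results/chain_merge" '*.all.chain.gz'; then
        echo "FROM_FILL_CHAINS"
    else
        echo "FULL_RUN"
    fi
}
```

The printed entry name still needs the matching --merged_chain or --filled_chain argument and the genome_prep paths shown under Checkpoint resumes.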

Where to edit

File                      What
params.json               Genome paths, alignment settings (edit per run)
nextflow.config           Compute resources, profiles, container, SLURM (edit rarely)
run_nf_slurm_example.sh   SLURM submission wrapper for multi-pair runs

Design rationale and root-cause writeups: CHANGES_nfcore_refactor.md.


Citation

About

Portable solution to generate genome alignment chains using lastz
