MiXeR-SV: Structural Variant Enrichment Analysis with PyTorch Optimization

Version 0.0.1: univariate SV enrichment analysis. Integration with older MiXeR features (e.g., bivariate analysis) is planned for a future release.

1. Overview

MiXeR-SV is a statistical genetics tool for quantifying the enrichment of structural variants (SVs) in trait heritability. It builds on the MiXeR framework and uses PyTorch for efficient gradient-based optimization. Key features include:

Univariate SV enrichment analysis via LD-score regression and a Gaussian mixture heritability model.
Annotation-based enrichment estimation with fold-enrichment statistics and standard errors.
QC of GWAS summary statistics (MAF filtering, MHC exclusion, ambiguous SNP removal, allele alignment).
Chromosome-level data processing for memory efficiency.
Bivariate (cross-trait) analysis planned for a future release.
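
To make the QC step in the list above concrete, here is a minimal sketch of three of the filters (MAF filtering, MHC exclusion, ambiguous SNP removal). It is an illustration only, not MiXeR-SV's implementation: the record fields (`chrom`, `pos`, `freq`, `a1`, `a2`), the function name `passes_qc`, and the MHC window on chromosome 6 are all assumptions. Only the 0.05 MAF default comes from the `--maf-threshold` argument. Allele alignment is not shown.

```python
# Illustrative QC filter. Field names and the MHC window are assumptions,
# not values taken from MiXeR-SV itself.
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
MHC_CHROM, MHC_START, MHC_END = 6, 25_000_000, 35_000_000  # assumed window


def passes_qc(variant, maf_threshold=0.05):
    """Return True if a variant record survives all three filters."""
    # Minor allele frequency is the smaller of the two allele frequencies.
    maf = min(variant["freq"], 1.0 - variant["freq"])
    if maf < maf_threshold:
        return False
    # Exclude variants inside the (assumed) MHC window.
    if variant["chrom"] == MHC_CHROM and MHC_START <= variant["pos"] <= MHC_END:
        return False
    # Strand-ambiguous SNPs are A/T and C/G pairs.
    if (variant["a1"].upper(), variant["a2"].upper()) in AMBIGUOUS_PAIRS:
        return False
    return True
```

A list of records could then be filtered with `[v for v in variants if passes_qc(v)]`.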

2. Installation

The recommended way to set up the environment is via conda:

# Create a new conda environment with Python 3.13
conda create -n mixer2 python=3.13 -y

# Activate the environment
conda activate mixer2

# Install all required packages
pip install -r requirements.txt

Key dependencies (see requirements.txt for pinned versions):

Package	Purpose
`torch`	PyTorch optimization backend
`numpy` / `scipy`	Numerical computing
`pandas`	Data handling
`scikit-learn`	Utility functions
`mat73`	Loading MATLAB v7.3 LD matrix files
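
After installation, a quick way to confirm that the packages in the table are visible to the active environment is to look them up by import name. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and note that `scikit-learn` is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages listed in the table above.
REQUIRED_MODULES = ["torch", "numpy", "scipy", "pandas", "sklearn", "mat73"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that each module can be found; it does not verify the pinned versions in requirements.txt.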

3. LD Reference Data

MiXeR-SV requires pre-computed LD matrices. We provide LD reference data derived from the 1000 Genomes Project (GRCh38), using both 30X WGS and ONT long-read sequencing.

Download link:

https://drive.google.com/drive/folders/15sg6P0rmHxRBQNglpFkDjoUSf1C6S-Q8?usp=sharing

You can download the two tar.gz files for the EAS and EUR populations, then extract them:

## EAS
tar -xzf 1kg_combined_plink_eas_AC3.tar.gz
## EUR
tar -xzf 1kg_combined_plink_eur_AC3.tar.gz

The directory structure should look like:

1kg_combined_plink_eur_AC3/
├── SV_variants.txt          # List of SV variant IDs
├── annot_mat.txt            # Annotation matrix (tab-separated, SNP-indexed)
├── chr1.ldmat/
├── chr2.ldmat/
│   ...
└── chr22.ldmat/
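
Before starting a long run, it can help to confirm that the extracted directory matches the layout above. The following is a minimal sketch, assuming only the entries shown in the tree (two text files and `chr1.ldmat/` through `chr22.ldmat/`); the function name `check_ld_reference` is hypothetical, and the contents of each `.ldmat` directory are not inspected.

```python
from pathlib import Path


def check_ld_reference(ld_dir):
    """Return the expected entries that are missing from an LD reference directory."""
    root = Path(ld_dir)
    # The two annotation files sit at the top level of the directory.
    missing = [
        name
        for name in ("SV_variants.txt", "annot_mat.txt")
        if not (root / name).is_file()
    ]
    # One .ldmat subdirectory is expected per autosome.
    missing += [
        f"chr{chrom}.ldmat"
        for chrom in range(1, 23)
        if not (root / f"chr{chrom}.ldmat").is_dir()
    ]
    return missing
```

An empty list from `check_ld_reference("1kg_combined_plink_eur_AC3")` means every expected entry was found.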

A permanent Zenodo DOI will be provided upon completion of peer review.

4. Summary Statistics

MiXeR-SV reads GWAS summary statistics in the standard format used by LDSC. A curated collection of 107 independent GWAS summary statistics is available from:

https://zenodo.org/records/10515792/files/sumstats_indep107.tgz?download=1

A few example files are included in the sumstats/ directory for quick testing.
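
To check a summary statistics file before analysis, one option is to read only its header. The sketch below assumes the column set typically written by LDSC's `munge_sumstats.py` (`SNP`, `A1`, `A2`, `Z`, `N`) in a gzipped, tab-separated file; whether MiXeR-SV requires exactly these columns is an assumption, and the function name is hypothetical.

```python
import csv
import gzip

# Columns typically present in LDSC-munged files. Treat this set as an
# assumption about what MiXeR-SV expects, not a documented requirement.
EXPECTED_COLUMNS = {"SNP", "A1", "A2", "Z", "N"}


def missing_sumstats_columns(path):
    """Read only the header line and return the expected columns it lacks."""
    with gzip.open(path, "rt", newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    return sorted(EXPECTED_COLUMNS - set(header))
```

An empty list indicates that all of the assumed columns are present.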

5. Usage

Computational requirements

Hardware	Estimated runtime per trait
Apple M4 Max (36 GB RAM)	< 10 minutes
Intel CPU (32 GB+ RAM)	~30 minutes to 1 hour

At least 32 GB of RAM is recommended for genome-wide analysis.

Running analysis

# Activate the conda environment
conda activate mixer2

# Create output directory
mkdir -p test_results

# Set paths to LD reference data (replace with your actual path)
LD_DIR="1kg_combined_plink_eur_AC3"

# Other annotation files; these should be in the same directory as the provided LD matrices
SNP_FILE="$LD_DIR/SV_variants.txt"
ANNOT="$LD_DIR/annot_mat.txt"

# Run analysis for all traits in sumstats/
for TRAIT in sumstats/*.sumstats.gz; do
    BASENAME=$(basename "$TRAIT" .sumstats.gz)
    OUT="test_results/univar_${BASENAME}.txt"
    OUT_LIST="test_results/univar_${BASENAME}_list.txt"

    # Skip if output already exists
    if [[ -f "$OUT_LIST" && -s "$OUT_LIST" ]]; then
        echo "Skipping (already exists): $OUT_LIST"
        continue
    fi

    echo "Processing: $BASENAME"

    python mixer2.py univar \
        --annot    "$ANNOT"    \
        --ld-mat1  "$LD_DIR"   \
        --trait1   "$TRAIT"    \
        --snp-file "$SNP_FILE" \
        --output   "$OUT"      \
        --seed 42
done

Running the constrained model (null SV enrichment)

The constrained model fixes the within-region heritability coefficient to zero, effectively testing the null hypothesis that SVs contribute no additional heritability beyond the genome-wide baseline. It is useful as a null model for comparison against the unconstrained fit.

Add --constrain-roi-estimates-to-zero to any univar call:

# Activate the conda environment
conda activate mixer2

# Create output directory
mkdir -p test_results_constrained

# Set paths to LD reference data
LD_DIR="1kg_combined_plink_eur_AC3"
SNP_FILE="$LD_DIR/SV_variants.txt"
ANNOT="$LD_DIR/annot_mat.txt"

# Run constrained analysis for all traits in sumstats/
for TRAIT in sumstats/*.sumstats.gz; do
    BASENAME=$(basename "$TRAIT" .sumstats.gz)
    OUT="test_results_constrained/univar_${BASENAME}.txt"
    OUT_LIST="test_results_constrained/univar_${BASENAME}_list.txt"

    # Skip if output already exists
    if [[ -f "$OUT_LIST" && -s "$OUT_LIST" ]]; then
        echo "Skipping (already exists): $OUT_LIST"
        continue
    fi

    echo "Processing (constrained): $BASENAME"

    python mixer2.py univar \
        --annot    "$ANNOT"    \
        --ld-mat1  "$LD_DIR"   \
        --trait1   "$TRAIT"    \
        --snp-file "$SNP_FILE" \
        --output   "$OUT"      \
        --constrain-roi-estimates-to-zero \
        --seed 42
done
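
The two shell loops above differ only in the output directory and the `--constrain-roi-estimates-to-zero` flag. For users who prefer to drive runs from Python, the command line can be assembled as follows. This is a sketch, not part of the repository: the function name `build_univar_command` is hypothetical, and it uses only the flags and file names shown in the shell examples.

```python
from pathlib import Path

SUFFIX = ".sumstats.gz"


def build_univar_command(trait, ld_dir, out_dir, constrained=False, seed=42):
    """Return (argv, list_path) for one trait, mirroring the shell loops above."""
    trait, ld_dir = Path(trait), Path(ld_dir)
    name = trait.name
    # Equivalent of: basename "$TRAIT" .sumstats.gz
    basename = name[: -len(SUFFIX)] if name.endswith(SUFFIX) else name
    out = Path(out_dir) / f"univar_{basename}.txt"
    argv = [
        "python", "mixer2.py", "univar",
        "--annot", str(ld_dir / "annot_mat.txt"),
        "--ld-mat1", str(ld_dir),
        "--trait1", str(trait),
        "--snp-file", str(ld_dir / "SV_variants.txt"),
        "--output", str(out),
        "--seed", str(seed),
    ]
    if constrained:
        argv.append("--constrain-roi-estimates-to-zero")
    # The _list.txt file is what the shell loops test to skip finished traits.
    list_path = out.with_name(f"univar_{basename}_list.txt")
    return argv, list_path
```

The returned `argv` can be passed to `subprocess.run(argv, check=True)`, and a run can be skipped when `list_path` already exists and is non-empty, as in the shell version.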

6. CLI Arguments

All arguments below apply to the univar subcommand. Run python mixer2.py univar --help for the full help message.

Required arguments

Argument	Description
`--annot`	Path to the annotation matrix file (tab-separated, SNP-indexed)
`--ld-mat1`	Path to the LD reference directory containing `chr1.ldmat/` through `chr22.ldmat/` subdirectories
`--trait1`	Path to the GWAS summary statistics file (`.sumstats.gz`)

Optional arguments

Argument	Default	Description
`--snp-file`	`None`	File listing SNP IDs that define the genomic region of interest (e.g., SV loci)
`--output`, `-o`	`output.txt`	Base path for output files (`_list.txt`, `_table.txt`, `.log` suffixes are added automatically)
`--maf-threshold`	`0.05`	Minor allele frequency threshold for variant filtering
`--seed`	`None`	Random seed for reproducibility
`--s-value`	`-0.25`	Heritability model parameter S (variants contribute via H^S)
`--pytorch-epochs`	`500`	Number of PyTorch optimization epochs
`--pytorch-lr`	`0.001`	PyTorch optimizer learning rate
`--only-base`	`False`	Use only the "base" (intercept) annotation column
`--disable-inverse-ld-score-weights`	`False`	Disable inverse-LD-score weighting
`--save-null-model`	`False`	Cache the null (baseline) model to a `.sig2_beta_i.mat` file for reuse across runs
`--constrain-roi-estimates-to-zero`	`False`	Constrain the within-region (SV) heritability coefficient to zero; useful as a null/constrained model for comparison against the unconstrained fit
`--verbose`, `-v`	`False`	Enable verbose logging
`--debug`	`False`	Enable debug-level logging
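
As a worked illustration of the `--s-value` parameter, the sketch below computes the relative per-variant weight H^S for a few allele frequencies. It assumes that H denotes heterozygosity, H = 2f(1 - f) for minor allele frequency f; the table above does not define H, so treat this as an assumption rather than a description of the MiXeR-SV code. The function name is hypothetical.

```python
def relative_variance_weight(maf, s_value=-0.25):
    """Return H**S, with H assumed to be heterozygosity 2 * maf * (1 - maf)."""
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in the interval (0, 0.5]")
    heterozygosity = 2.0 * maf * (1.0 - maf)
    return heterozygosity ** s_value


if __name__ == "__main__":
    for maf in (0.05, 0.2, 0.5):
        print(f"maf={maf:.2f}  weight={relative_variance_weight(maf):.3f}")
```

Under this assumption, a negative S gives rarer variants a larger weight than common ones, and S = 0 gives every variant the same weight.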

7. Output Files

For a given --output results/univar_trait.txt, MiXeR-SV produces three files:

File	Description
`results/univar_trait_list.txt`	Key-value pairs of all result metrics, one per line
`results/univar_trait_table.txt`	Tab-separated table with column headers (suitable for downstream aggregation across traits)
`results/univar_trait.log`	Full run log
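
Because each `_table.txt` file is tab-separated with a header row, results from several traits can be stacked with the standard library alone. The sketch below makes no assumption about the column names inside the tables; the function name `collect_tables` and the added `source` column are hypothetical.

```python
import csv
from pathlib import Path

TABLE_SUFFIX = "_table.txt"


def collect_tables(results_dir):
    """Read every *_table.txt file in a directory into one list of row dicts."""
    rows = []
    for path in sorted(Path(results_dir).glob(f"*{TABLE_SUFFIX}")):
        # Recover the output base name, e.g. univar_trait from univar_trait_table.txt.
        source = path.name[: -len(TABLE_SUFFIX)]
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                rows.append({"source": source, **row})
    return rows
```

The resulting list can be written back out with `csv.DictWriter` or passed to `pandas.DataFrame(rows)` for further analysis.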

Result metrics include fold-enrichment estimates, standard errors, statistics, and per-annotation heritability contributions.
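
For readers unfamiliar with the fold-enrichment metric, partitioned-heritability analyses commonly define it as the share of heritability attributed to an annotation divided by the share of variants in that annotation. The sketch below implements that common definition; the exact estimator used by MiXeR-SV is not documented here, so the function and its argument names are assumptions for illustration.

```python
def fold_enrichment(h2_annot, h2_total, n_annot, n_total):
    """Return (h2_annot / h2_total) / (n_annot / n_total)."""
    if h2_total <= 0 or n_total <= 0 or n_annot <= 0:
        raise ValueError("h2_total, n_total and n_annot must be positive")
    heritability_share = h2_annot / h2_total
    variant_share = n_annot / n_total
    return heritability_share / variant_share
```

For example, an annotation holding 10% of variants and explaining 20% of heritability has a fold enrichment of 2, and a value of 1 means no enrichment relative to the genome-wide average.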

8. Citation

If you use MiXeR-SV in your research, please cite:

Citation details will be added upon publication.
