PfeiferLab/rhesus_SV

Evolutionary genomics based on PacBio HiFi long-read sequencing data reveals the importance of structural variants in shaping population-specific differences between Chinese and Indian rhesus macaques (Macaca mulatta)

Set-up: Make the Python scripts executable and compile the C++ programs

chmod u+x count_genotypes_ver_4.py
chmod u+x count_genotypes_ver_5.py
chmod u+x find_private_sv_ver_2.py
g++ -o bonferroni_vaf_bin_ver_2 bonferroni_vaf_bin_ver_2.cpp -lm
g++ -o find_significant_private_sv_ver_2 find_significant_private_sv_ver_2.cpp

Calculate population-specific allele frequencies

  1. Prepare a tab-delimited .txt file from the .vcf that contains basic information for each SV (chromosome, position, ID, SV type, SV length, and the genotype of each individual at the site)

    bcftools query -H -f '%CHROM\t%POS\t%ID\t%SVTYPE\t%SVLEN[\t%GT]\n' rhesus.SVs.vcf.gz -o rhesus.SVs.txt
    

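    and then separate the table by population (columns 1-5 hold the SV information; in this dataset, columns 6-15 hold the genotypes of the Chinese individuals and columns 16 onward those of the Indian individuals)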

    sed 's/#//' rhesus.SVs.txt | cut -f 1-15 > rhesus.SVs.Chi.txt
    sed 's/#//' rhesus.SVs.txt | cut -f 1-5,16- > rhesus.SVs.Ind.txt
    
  2. Calculate the reference, alternative, and minor allele frequencies in both populations

    python3 ./count_genotypes_ver_4.py rhesus.SVs.Chi.txt rhesus.SVs.Chi.AF.txt
    python3 ./count_genotypes_ver_4.py rhesus.SVs.Ind.txt rhesus.SVs.Ind.AF.txt
    
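For orientation, the sketch below shows one way reference, alternative, and minor allele frequencies can be derived from the genotype columns of a per-population table. It is not the code in count_genotypes_ver_4.py; the function names, the handling of missing genotypes (skipped), and the example row are assumptions made for illustration.

```python
import re

def allele_frequencies(genotypes):
    """Return (ref_af, alt_af, maf) from GT strings such as '0/1' or '1|1'.

    Assumption: missing alleles ('.') are ignored, and any non-zero allele
    is counted as an alternative allele.
    """
    ref = alt = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue          # missing call: not counted
            if allele == "0":
                ref += 1
            else:
                alt += 1
    called = ref + alt
    if called == 0:
        return None               # no called alleles at this site
    ref_af = ref / called
    alt_af = alt / called
    return ref_af, alt_af, min(ref_af, alt_af)

def parse_line(line):
    """Split a table row into the five SV fields and the genotype fields."""
    fields = line.rstrip("\n").split("\t")
    return fields[:5], fields[5:]

# Hypothetical row in the layout produced by the bcftools query step
info, gts = parse_line("chr1\t10500\tsv1\tDEL\t-320\t0/0\t0/1\t0/0\t./.\n")
print(info[2], allele_frequencies(gts))
```

In this example, five of the six called alleles are reference alleles, so the alternative allele is also the minor allele.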

Identify significant population-private SVs

  1. Calculate the alternative allele count and frequency of each SV in each population

    python3 count_genotypes_ver_5.py rhesus.SVs.Chi.txt rhesus.SVs.Chi.AC.AF.txt
    python3 count_genotypes_ver_5.py rhesus.SVs.Ind.txt rhesus.SVs.Ind.AC.AF.txt
    
  2. Merge the population-specific alternative allele counts and frequencies of each SV and add header information

    paste <(cut -f 1-8 rhesus.SVs.Chi.AC.AF.txt) <(cut -f 6-8 rhesus.SVs.Ind.AC.AF.txt) | sed '1d' > rhesus.SVs.AC.AF.txt
    { printf 'CHROM\tPOS\tID\tSVTYPE\tSVLEN\tChi_GENOS\tChi_VAC\tChi_VAF\tInd_GENOS\tInd_VAC\tInd_VAF\n'; cat rhesus.SVs.AC.AF.txt; } > tmp && mv tmp rhesus.SVs.AC.AF.txt
    
  3. Identify population-private SVs

    python3 find_private_sv_ver_2.py rhesus.SVs.AC.AF.txt rhesus.SVs.PopPriv.txt
    sed '1d' rhesus.SVs.PopPriv.txt | awk '{if ($6 == "Chi") print}' > rhesus.SVs.PopPriv.Chi.txt
    sed '1d' rhesus.SVs.PopPriv.txt | awk '{if ($6 == "Ind") print}' > rhesus.SVs.PopPriv.Ind.txt
    
  4. Apply a Bonferroni correction

    ./bonferroni_vaf_bin_ver_2 -in rhesus.SVs.PopPriv.Chi.txt -out rhesus.SVs.PopPriv.Chi.Bonferroni.txt
    ./bonferroni_vaf_bin_ver_2 -in rhesus.SVs.PopPriv.Ind.txt -out rhesus.SVs.PopPriv.Ind.Bonferroni.txt
    
  5. Identify significant population-private SVs

    ./find_significant_private_sv_ver_2 -in rhesus.SVs.PopPriv.Chi.txt -cv rhesus.SVs.PopPriv.Chi.Bonferroni.txt -out rhesus.SVs.PopPriv.Chi.Bonferroni.significant.txt
    ./find_significant_private_sv_ver_2 -in rhesus.SVs.PopPriv.Ind.txt -cv rhesus.SVs.PopPriv.Ind.Bonferroni.txt -out rhesus.SVs.PopPriv.Ind.Bonferroni.significant.txt
    
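The logic behind steps 3 and 4 can be illustrated with a small sketch. It assumes that an SV counts as population-private when its alternative allele count (the Chi_VAC and Ind_VAC columns) is non-zero in one population and zero in the other, and it shows only the generic Bonferroni adjustment (significance level divided by the number of tests). The exact criteria, the VAF binning, and the test used by find_private_sv_ver_2.py and bonferroni_vaf_bin_ver_2 are not described here, so the function names and example values are hypothetical.

```python
def classify_private(chi_vac, ind_vac):
    """Return 'Chi' or 'Ind' if the SV is seen in only one population, else None.

    Assumption: 'private' means a non-zero alternative allele count in one
    population and a zero count in the other.
    """
    if chi_vac > 0 and ind_vac == 0:
        return "Chi"
    if ind_vac > 0 and chi_vac == 0:
        return "Ind"
    return None

def bonferroni_threshold(alpha, n_tests):
    """Generic Bonferroni-corrected per-test significance threshold."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Hypothetical (ID, Chi_VAC, Ind_VAC) rows
rows = [("sv1", 4, 0), ("sv2", 0, 7), ("sv3", 2, 3), ("sv4", 0, 0)]

private = {}
for sv_id, chi_vac, ind_vac in rows:
    label = classify_private(chi_vac, ind_vac)
    if label is not None:
        private[sv_id] = label

print(private)
print(bonferroni_threshold(0.05, len(private)))
```

Here sv3 is shared between populations and sv4 is not variable in either, so only sv1 and sv2 would be carried forward and the threshold is divided by two.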
