Allow vcf-expression-annotator to work on multiple samples or work on a VCF that has previously been annotated #82

@wanqiangdehuoguo

For example, given a gene expression table with one column per sample:

$ head genetpm.tsv

  GeneID          N190533 T190533
ENSG00000000003  0.743   12.1  
ENSG00000000005  0.0232   0.115
ENSG00000000419 46.4     43.4  
ENSG00000000457  5.22     6.26 
ENSG00000000460  9.80     4.45 
ENSG00000000938  2.22    31.5  
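
Since the table holds one expression column per sample, both samples could in principle be read in one pass. A minimal sketch, assuming whitespace-delimited columns as in the `head` output and no empty cells; `load_expression` is a hypothetical helper, not part of VAtools:

```python
import io

def load_expression(handle, id_column, sample_columns):
    """Read {sample: {gene_id: expression}} from a whitespace-delimited table."""
    header = handle.readline().split()
    col = {name: i for i, name in enumerate(header)}
    table = {sample: {} for sample in sample_columns}
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        gene_id = fields[col[id_column]]
        for sample in sample_columns:
            table[sample][gene_id] = float(fields[col[sample]])
    return table

# Usage with the first two rows of genetpm.tsv shown above.
demo = io.StringIO(
    "GeneID N190533 T190533\n"
    "ENSG00000000003 0.743 12.1\n"
    "ENSG00000000005 0.0232 0.115\n"
)
expression = load_expression(demo, "GeneID", ["N190533", "T190533"])
print(expression["T190533"]["ENSG00000000003"])  # 12.1
```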

190533.vep.vcf is a multi-sample VCF of somatic mutations:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  N190533 T190533
chr1    1041944 .       C       G       .       PASS AS_FilterStatus=SITE;AS_SB_TABLE=209,110|41,22;DP=397;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=229,220;MMQ=60,60;MPOS=37;NALOD=2.01;NLOD=30.01;POPAF=6.00;ROQ=93;TLOD=136.09;CSQ=G|splice_polypyrimidine_tract_variant&intron_variant|LOW|AGRN|ENSG00000188157|Transcript|ENST00000379370.7|protein_coding||6/35|ENST00000379370.7:c.1178-12C>G|||||||||1||HGNC|HGNC:329|1|||MAGR......VVVGRHPLHLLEDAVTKPELRPCPTP      GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:142,0:9.708e-03:142:48,0:47,0:100,0:90,52,0,0       0/1:177,63:0.262:240:61,21:67,23:134,47:119,58,41,22

Annotating the two samples in two successive runs fails on the second run:

$ vcf-expression-annotator --sample-name N190533 --id-column GeneID --expression-column N190533 --output-vcf 190533.vep1.vcf 190533.vep.vcf genetpm.tsv custom gene
WARNING:root:69 of 1300 genes did not have an expression entry for their gene id.
$ vcf-expression-annotator --sample-name T190533 --id-column GeneID --expression-column T190533 --output-vcf 190533.vep2.vcf 190533.vep1.vcf genetpm.tsv custom gene
Traceback (most recent call last):
  File "/home/sym/.conda/envs/pVACtools/bin/vcf-expression-annotator", line 8, in <module>
    sys.exit(main())
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 191, in main
    (vcf_reader, is_multi_sample) = create_vcf_reader(args)
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 90, in create_vcf_reader
    raise Exception("ERROR: VCF {} is already gene expression annotated. GX format header already exists.".format(args.input_vcf))
Exception: ERROR: VCF 190533.vep1.vcf is already gene expression annotated. GX format header already exists.
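
The message indicates that the input is rejected as soon as the header defines GX, whichever sample is requested. One possible alternative is to check per sample whether a GX value is already present. The sketch below assumes that an unannotated sample carries `.` (or nothing) in the GX slot; `samples_with_gx` is a hypothetical helper, not VAtools code, and the `gene|value` payload in the usage lines is an illustrative placeholder:

```python
def samples_with_gx(record_line, sample_names):
    """Return the set of samples whose column already holds a GX value."""
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column
    if "GX" not in keys:
        return set()
    pos = keys.index("GX")
    annotated = set()
    for name, column in zip(sample_names, fields[9:]):
        values = column.split(":")
        # Trailing FORMAT fields may be dropped, so guard the index.
        if pos < len(values) and values[pos] not in ("", "."):
            annotated.add(name)
    return annotated

# Usage: a shortened record in which only the normal sample has GX set.
record = "chr1\t1041944\t.\tC\tG\t.\tPASS\tDP=397\tGT:GX\t0/0:ENSG00000188157|1.0\t0/1:."
done = samples_with_gx(record, ["N190533", "T190533"])
print(sorted(done))  # ['N190533']
```

With a check like this, a second run for T190533 could proceed while a repeated run for N190533 would still be refused.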

Passing both samples as a comma-separated list in a single run also fails, with a different error:

$ vcf-expression-annotator --sample-name N190533,T190533 --id-column GeneID --expression-column N190533,T190533 --output-vcf 190533.vep1.vcf 190533.vep.vcf genetpm.tsv custom gene
Traceback (most recent call last):
  File "/home/sym/.conda/envs/pVACtools/bin/vcf-expression-annotator", line 8, in <module>
    sys.exit(main())
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 191, in main
    (vcf_reader, is_multi_sample) = create_vcf_reader(args)
  File "/home/sym/.conda/envs/pVACtools/lib/python3.10/site-packages/vatools/vcf_expression_annotator.py", line 84, in create_vcf_reader
    raise Exception("ERROR: VCF {} does not contain a sample column for sample {}.".format(args.input_vcf, args.sample_name))
Exception: ERROR: VCF 190533.vep.vcf does not contain a sample column for sample N190533,T190533.
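
Here the whole string `N190533,T190533` is treated as one sample name. A minimal sketch of the requested behaviour, assuming the argument is split on commas and each listed sample receives its own GX value. Both functions are hypothetical, the `gene|value` GX layout is an assumption about the tool's output, and the expression numbers in the usage lines are placeholders rather than values from genetpm.tsv:

```python
def parse_sample_names(arg, vcf_samples):
    """Split a comma-separated sample argument and validate each name."""
    names = [n.strip() for n in arg.split(",") if n.strip()]
    missing = [n for n in names if n not in vcf_samples]
    if missing:
        raise ValueError("VCF has no sample column for: " + ", ".join(missing))
    return names

def annotate_record(record_line, vcf_samples, wanted, expression, gene_ids):
    """Append a GX value to every sample column of one VCF record."""
    fields = record_line.rstrip("\n").split("\t")
    fields[8] += ":GX"  # extend the FORMAT column
    for i, name in enumerate(vcf_samples):
        value = "."  # samples that were not requested stay missing
        if name in wanted:
            items = [
                "{}|{}".format(gene, expression[name][gene])
                for gene in gene_ids
                if gene in expression[name]
            ]
            value = ",".join(items) or "."
        fields[9 + i] += ":" + value
    return "\t".join(fields)

# Usage on a shortened record; 1.5 and 2.5 are placeholder expression values.
vcf_samples = ["N190533", "T190533"]
wanted = parse_sample_names("N190533,T190533", vcf_samples)
placeholder = {
    "N190533": {"ENSG00000188157": 1.5},
    "T190533": {"ENSG00000188157": 2.5},
}
record = "chr1\t1041944\t.\tC\tG\t.\tPASS\tDP=397\tGT\t0/0\t0/1"
annotated = annotate_record(record, vcf_samples, wanted, placeholder, ["ENSG00000188157"])
print(annotated.split("\t")[8:])
```

A real implementation would also have to add the GX definition to the header once and take the gene IDs from the CSQ field of each record.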
