AssignGenes.py igblast --format airr: problem of translation #49

@gael-millot

In our hands, running AssignGenes.py igblast --format airr on this FASTA sequence

O1C7_VH_PBel118_n2
NNNNGGTGNNNNNNNNTGANNTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGCNNNGCNCNNNNNNN

returns:
sequence_id sequence sequence_aa locus stop_codon vj_in_frame v_frameshift productive rev_comp complete_vdj d_frame v_call d_call j_call c_call sequence_alignment germline_alignment sequence_alignment_aa germline_alignment_aa v_alignment_start v_alignment_end d_alignment_start d_alignment_end j_alignment_start j_alignment_end c_alignment_start c_alignment_end v_sequence_alignment v_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_sequence_alignment d_sequence_alignment_aa d_germline_alignment d_germline_alignment_aa j_sequence_alignment j_sequence_alignment_aa j_germline_alignment j_germline_alignment_aa c_sequence_alignment c_sequence_alignment_aa c_germline_alignment c_germline_alignment_aa fwr1 fwr1_aa cdr1 cdr1_aa fwr2 fwr2_aa cdr2 cdr2_aa fwr3 fwr3_aa fwr4 fwr4_aa cdr3 cdr3_aa junction junction_length junction_aa junction_aa_length v_score d_score j_score c_score v_cigar d_cigar j_cigar c_cigar v_support d_support j_support c_support v_identity d_identity j_identity c_identity v_sequence_start v_sequence_end v_germline_start v_germline_end d_sequence_start d_sequence_end d_germline_start d_germline_end j_sequence_start j_sequence_end j_germline_start j_germline_end c_sequence_start c_sequence_end c_germline_start c_germline_end fwr1_start fwr1_end cdr1_start cdr1_end fwr2_start fwr2_end cdr2_start cdr2_end fwr3_start fwr3_end fwr4_start fwr4_end cdr3_start cdr3_end np1 np1_length np2 np2_length
O1C7_VH_PBel118_n2 NNNNGGTGNNNNNNNNTGANNTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGCNNNGCNCNNNNNNN QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSSASTKGPSVFPLAPCXQEHXREXXXXX IGH FALSE TRUE FALSE TRUE TRUE FALSE NA IGHV3-1101 IGHD6-1901 IGHJ402,IGHJ502 IGHG201,IGHG202 TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGNNNNGTATAGCAGTGGCTGGNNNNNNTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSS QLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARXXYSSGWXXWGQGTLVTVSS 1 291 296 311 318 351 351 414 TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAG 
QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAI TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAG QLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR GTATAACGGTGGCTGG YNGGW GTATAGCAGTGGCTGG YSSGW TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG WGQGTLVTVSS TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG WGQGTLVTVSS GCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGC ASTKGPSVFPLAPCXQEHXRE GCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGC-TCCAGGAGCACCTCCGAGAGC ASTKGPSVFPLAPCSRSTSES TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCT QLVESGGGLVAPGGSLRLSCAAS GGAATCACCATCAGTGGCCGCTAC GITISGRY ATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATAC MSWFRQAPGKGLEWVSY ATTGATAGTAGTGTTAGAACCATA IDSSVRTI TACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGT YYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHC TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCA WGQGTLVTVSS GCGATAGGGATGTATAACGGTGGCTGGGACTTT AIGMYNGGWDF TGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGG 39 CAIGMYNGGWDFW 13 399.155 19.914 66.059 99.611 21S4N291M137S1N 316S2N16M117S3N 338S14N34M77S 371S42M1I21M14S915N 5.34e-113 0.307 6.657e-15 2.464e-23 93.814 87.5 100 95.312 22 312 5 295 317 332 3 18 339 372 15 48 372 435 1 63 22 92 93 116 117 167 168 191 192 305 339 371 306 338 GGAT 4 GACTTT 6

The problem is that the sequence in the sequence_alignment_aa column
QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSS
is not the translation of the sequence in the sequence_alignment column
TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG

The first two nucleotides are removed before translation.
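
To make the frame shift concrete, here is a minimal Python sketch (not part of Change-O or IgBLAST) that translates the start of the sequence_alignment value with and without skipping the first two nucleotides. The translate and frame_offset helpers are hypothetical names introduced for illustration. frame_offset assumes that the germline V gene starts in frame at position 1, which would be consistent with v_germline_start = 5 in the row above, but that is an assumption and not something confirmed here.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt, offset=0):
    """Translate complete codons of nt starting at a 0-based offset.

    Codons containing ambiguous bases (e.g. N) are reported as X.
    """
    nt = nt.upper()
    codons = (nt[i:i + 3] for i in range(offset, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def frame_offset(v_germline_start):
    """Number of leading nucleotides to skip to reach the first full codon.

    ASSUMPTION: the germline V gene is in frame at its 1-based position 1.
    """
    return (1 - v_germline_start) % 3


# First 26 nucleotides of the sequence_alignment value from the issue.
prefix = "TGCAGCTGGTGGAGTCTGGGGGAGGC"

print(translate(prefix, offset=0))  # CSWWSLGE: naive translation from nucleotide 1
print(translate(prefix, offset=2))  # QLVESGGG: matches the start of sequence_alignment_aa
print(frame_offset(5))              # 2, using v_germline_start = 5 from the row above
```

Under that assumption, sequence_alignment_aa would be the translation of sequence_alignment in the germline reading frame, with the leading partial codon dropped, and not a translation from its first nucleotide.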

This is misleading.

Thanks for your help.
