In our hands, running AssignGenes.py igblast --format airr on this FASTA sequence
O1C7_VH_PBel118_n2
NNNNGGTGNNNNNNNNTGANNTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGCNNNGCNCNNNNNNN
returns:
sequence_id sequence sequence_aa locus stop_codon vj_in_frame v_frameshift productive rev_comp complete_vdj d_frame v_call d_call j_call c_call sequence_alignment germline_alignment sequence_alignment_aa germline_alignment_aa v_alignment_start v_alignment_end d_alignment_start d_alignment_end j_alignment_start j_alignment_end c_alignment_start c_alignment_end v_sequence_alignment v_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_sequence_alignment d_sequence_alignment_aa d_germline_alignment d_germline_alignment_aa j_sequence_alignment j_sequence_alignment_aa j_germline_alignment j_germline_alignment_aa c_sequence_alignment c_sequence_alignment_aa c_germline_alignment c_germline_alignment_aa fwr1 fwr1_aa cdr1 cdr1_aa fwr2 fwr2_aa cdr2 cdr2_aa fwr3 fwr3_aa fwr4 fwr4_aa cdr3 cdr3_aa junction junction_length junction_aa junction_aa_length v_score d_score j_score c_score v_cigar d_cigar j_cigar c_cigar v_support d_support j_support c_support v_identity d_identity j_identity c_identity v_sequence_start v_sequence_end v_germline_start v_germline_end d_sequence_start d_sequence_end d_germline_start d_germline_end j_sequence_start j_sequence_end j_germline_start j_germline_end c_sequence_start c_sequence_end c_germline_start c_germline_end fwr1_start fwr1_end cdr1_start cdr1_end fwr2_start fwr2_end cdr2_start cdr2_end fwr3_start fwr3_end fwr4_start fwr4_end cdr3_start cdr3_end np1 np1_length np2 np2_length
O1C7_VH_PBel118_n2 NNNNGGTGNNNNNNNNTGANNTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGCNNNGCNCNNNNNNN QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSSASTKGPSVFPLAPCXQEHXREXXXXX IGH FALSE TRUE FALSE TRUE TRUE FALSE NA IGHV3-1101 IGHD6-1901 IGHJ402,IGHJ502 IGHG201,IGHG202 TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAGNNNNGTATAGCAGTGGCTGGNNNNNNTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSS QLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARXXYSSGWXXWGQGTLVTVSS 1 291 296 311 318 351 351 414 TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAG 
QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAI TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCAAGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTGACTACTACATGAGCTGGATCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTAGTAGTAGTGGTAGTACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACGGCCGTGTATTACTGTGCGAGAG QLVESGGGLVKPGGSLRLSCAASGFTFSDYYMSWIRQAPGKGLEWVSYISSSGSTIYYADSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAR GTATAACGGTGGCTGG YNGGW GTATAGCAGTGGCTGG YSSGW TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG WGQGTLVTVSS TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG WGQGTLVTVSS GCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCNNCCAGGAGCACNTCCGAGAGC ASTKGPSVFPLAPCXQEHXRE GCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGC-TCCAGGAGCACCTCCGAGAGC ASTKGPSVFPLAPCSRSTSES TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCT QLVESGGGLVAPGGSLRLSCAAS GGAATCACCATCAGTGGCCGCTAC GITISGRY ATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATAC MSWFRQAPGKGLEWVSY ATTGATAGTAGTGTTAGAACCATA IDSSVRTI TACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGT YYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHC TGGGGCCAGGGAACCCTGGTCACCGTCTCCTCA WGQGTLVTVSS GCGATAGGGATGTATAACGGTGGCTGGGACTTT AIGMYNGGWDF TGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGG 39 CAIGMYNGGWDFW 13 399.155 19.914 66.059 99.611 21S4N291M137S1N 316S2N16M117S3N 338S14N34M77S 371S42M1I21M14S915N 5.34e-113 0.307 6.657e-15 2.464e-23 93.814 87.5 100 95.312 22 312 5 295 317 332 3 18 339 372 15 48 372 435 1 63 22 92 93 116 117 167 168 191 192 305 339 371 306 338 GGAT 4 GACTTT 6
The problem is that the sequence in the sequence_alignment_aa column
QLVESGGGLVAPGGSLRLSCAASGITISGRYMSWFRQAPGKGLEWVSYIDSSVRTIYYADSVKGRFTISRDNAENSLYLQMNGLRAEDTAVYHCAIGMYNGGWDFWGQGTLVTVSS
is not the translation of the sequence in the sequence_alignment column
TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCTGGAGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGAATCACCATCAGTGGCCGCTACATGAGCTGGTTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTTTCATACATTGATAGTAGTGTTAGAACCATATACTACGCAGACTCTGTGAAGGGCCGATTCACCATCTCCAGGGACAACGCCGAGAACTCACTGTATCTGCAGATGAACGGCCTGAGAGCCGAAGACACGGCCGTGTATCACTGTGCGATAGGGATGTATAACGGTGGCTGGGACTTTTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
The first two nucleotides are removed before translation.
This is misleading.
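To make the mismatch easy to reproduce, here is a minimal Python sketch that translates the start of the sequence_alignment value at each of the three possible offsets and reports which one reproduces the start of sequence_alignment_aa. The helper names (translate, matching_offsets, offset_from_germline_start) are our own and are not part of Change-O or IgBLAST. The last helper assumes that the germline V reference is in frame from its position 1, which we have not verified for every reference set; under that assumption the reported v_germline_start of 5 would explain a skip of two nucleotides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt, offset=0):
    """Translate nt starting at the given 0-based offset; unknown codons give X."""
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(offset, len(nt) - 2, 3)
    )


def matching_offsets(nt, aa):
    """Return every offset (0, 1 or 2) at which translating nt yields exactly aa."""
    return [o for o in range(3) if translate(nt, o) == aa]


def offset_from_germline_start(v_germline_start):
    """Nucleotides to skip before the first complete codon.

    Assumption (ours): germline position 1 is the first base of a codon.
    """
    return (1 - v_germline_start) % 3


# First 38 nt of sequence_alignment and first 12 aa of sequence_alignment_aa
# from the row reported above.
nt_prefix = "TGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTCGCGCCT"
aa_prefix = "QLVESGGGLVAP"

for o in range(3):
    print(o, translate(nt_prefix, o))
print("matching offsets:", matching_offsets(nt_prefix, aa_prefix))
print("offset implied by v_germline_start=5:", offset_from_germline_start(5))
```

Only the offset of 2 gives QLVESGGGLVAP; translating from the first nucleotide gives a sequence starting with CSWW instead.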
Thanks for your help.