Hello, I have been reading your paper and using it as a baseline to develop editing algorithms for specific tasks. Fantastic paper, btw!
I noticed that #22 raised an inconsistency between the paper's description and the implementation of mutual self-attention, specifically regarding the value vectors:
`InfEdit/app_infedit.py`, lines 246 to 251 in `eaac91e`:

```python
qu=torch.cat([qu[:num_heads],qu[:num_heads],qu[:num_heads]])
qc=torch.cat([qc[:num_heads],qc[:num_heads],qc[:num_heads]])
ku=torch.cat([ku[:num_heads],ku[:num_heads],ku[:num_heads]])
kc=torch.cat([kc[:num_heads],kc[:num_heads],kc[:num_heads]])
vu=torch.cat([vu[:num_heads*2],vu[:num_heads]])
vc=torch.cat([vc[:num_heads*2],vc[:num_heads]])
```
However, unless I'm mistaken, this has not been addressed or updated.
Assuming the attention vectors are split into [src, tgt, layout] chunks, wouldn't a more correct implementation of Algorithm 3, during the self-edit step, be:

```python
qu=torch.cat([qu[:num_heads],qu[:num_heads],qu[:num_heads]])
qc=torch.cat([qc[:num_heads],qc[:num_heads],qc[:num_heads]])
ku=torch.cat([ku[:num_heads],ku[:num_heads],ku[:num_heads]])
kc=torch.cat([kc[:num_heads],kc[:num_heads],kc[:num_heads]])
vu=torch.cat([vu[:num_heads],vu[:num_heads],vu[:num_heads]])
vc=torch.cat([vc[:num_heads],vc[:num_heads],vc[:num_heads]])
```
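To make the slicing concrete, here is a toy sketch under the same assumption: the batch dimension is ordered [src, tgt, layout] with `num_heads` rows per branch. Strings stand in for per-head tensors and list concatenation stands in for `torch.cat` along dim 0. The helpers `make_branch` and `branch_of` are hypothetical illustrations, not part of InfEdit.

```python
# Toy model of the head layout, ASSUMING the batch is ordered
# [src, tgt, layout] with num_heads rows per branch.
# Strings stand in for per-head tensors; list "+" mirrors torch.cat on dim 0.


def make_branch(kind: str, num_heads: int) -> list[str]:
    """Hypothetical helper: label every head row, e.g. 'u:src:0'."""
    return [
        f"{kind}:{branch}:{h}"
        for branch in ("src", "tgt", "layout")
        for h in range(num_heads)
    ]


def branch_of(rows: list[str], num_heads: int) -> list[str]:
    """Hypothetical helper: report which branch each num_heads-sized block came from."""
    return [rows[i].split(":")[1] for i in range(0, len(rows), num_heads)]


num_heads = 2
vu = make_branch("u", num_heads)  # unconditional values
vc = make_branch("c", num_heads)  # conditional values

# Current code: values from [src, tgt], then src again for the layout branch.
vu_current = vu[: num_heads * 2] + vu[:num_heads]
vc_current = vc[: num_heads * 2] + vc[:num_heads]

# Proposed code: source values replicated for all three branches.
vu_proposed = vu[:num_heads] * 3
vc_proposed = vc[:num_heads] * 3

print(branch_of(vu_current, num_heads))   # ['src', 'tgt', 'src']
print(branch_of(vu_proposed, num_heads))  # ['src', 'src', 'src']
```

Under this assumption, the current code lets the target branch keep its own values while its queries and keys come from the source, whereas the proposed version feeds source values to every branch.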
Could you please help clear up my confusion?
Thank you