optimize the generation of attention mask #331

Open
imh966 wants to merge 1 commit into deepspeedai:main from imh966:fix_attention_mask

Conversation

@imh966 imh966 commented Jan 13, 2024

Hi, I found that the attention mask tensor is created on the CPU, leading to inefficient operations on the attention mask and an extra H2D (host-to-device) copy.
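A minimal sketch of the fix being described: allocate the mask directly on the target device instead of building it on the CPU and transferring it. The function name `build_causal_mask` and the causal-mask shape are illustrative assumptions, not the PR's actual diff.

```python
import torch

def build_causal_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    # Allocating with device=... avoids creating the tensor on the CPU
    # and paying an extra host-to-device copy later.
    # (Hypothetical helper; the real change lives in the DeepSpeed code.)
    return torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device)
    )

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mask = build_causal_mask(4, device)
```

The key point is passing `device=` at construction time, so every subsequent mask operation runs on the accelerator rather than on the host.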
