issue/394 - feat: support flash-attn via MooreThreads/mate for moore gpu#395

Open
spike-zhu wants to merge 1 commit into main from issue/394

@spike-zhu spike-zhu commented May 21, 2026

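Adds flash-attn support for Moore Threads GPUs. It depends on v0.1.3 of the open-source Moore Threads mate library (https://github.com/MooreThreads/mate). This PR also improves the warmup logic.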

Taking 9g8b inference with bs=4, input_len=256, output_len=256 as an example:
prefill: 7000 tokens/s -> 12500 tokens/s
decode: 115 tokens/s -> 165 tokens/s
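
The relative gains implied by these figures can be worked out with a short calculation. This is a sketch that uses only the four throughput numbers quoted above; the helper name `speedup` is illustrative.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of new throughput to old throughput."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return after / before


# Throughput in tokens/s, copied from the figures reported above.
results = {
    "prefill": (7000.0, 12500.0),
    "decode": (115.0, 165.0),
}

for phase, (before, after) in results.items():
    print(f"{phase}: {before:g} -> {after:g} tokens/s "
          f"({speedup(before, after):.2f}x)")
```

This works out to roughly 1.79x for prefill and 1.43x for decode.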

image

@spike-zhu spike-zhu requested a review from a team May 21, 2026 08:25
@spike-zhu spike-zhu self-assigned this May 21, 2026
max_position_embeddings_,
max_position_embeddings_,
actual_max_q,
actual_max_k,

@pengcheng888 pengcheng888 May 22, 2026

After this change, does flash-attn still generate normal output in graph mode? Please add the test command and test screenshots for --enable-graph.

Reply from the PR author (@spike-zhu):

> After this change, does flash-attn still generate normal output in graph mode? Please add the test command and test screenshots for --enable-graph.

The relevant implementation has been moved into the Moore Threads code in InfiniCore, so it does not affect InfiniLM.

Comment thread examples/bench.py

Is this warmup support for paged mode something you added along the way?

This change needs an explanation: what happens without it?
Also, there is indeed a bug here; we can discuss it with Lao Ma later.
@ma-hang

@wooway777 wooway777 requested a review from ma-hang May 22, 2026 10:33
3 participants