[TLE] [ENFLAME] support TLE features by baoqiliu · Pull Request #602 · flagos-ai/FlagTree

baoqiliu · 2026-05-20T10:22:23Z

[ENFLAME] Support TLE features for GCU backend
Enable TLE (Triton Language Extension) framework support on GCU backend,
including TLE-Raw MLIR EDSL, GCU Dialect EDSL (TOPS), compiler pipeline
integration, GCU300/GCU400 C++ backend upgrades, and comprehensive
test coverage.

TLE-Lite status

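| Python DSL | Status |
| --- | --- |
| tle.load(is_async=True) | ✅ Supported |
| tle.cumsum | ✅ Supported |
| tle.extract_tile | ✅ Supported |
| tle.insert_tile | ✅ Supported |
| tle.device_mesh | ❌ Not supported |
| tle.sharding, tle.S, tle.P, tle.B | ❌ Not supported |
| tle.make_sharded_tensor, tle.ShardedTensor | ❌ Not supported |
| tle.shard_id | ❌ Not supported |
| tle.distributed_barrier | ❌ Not supported |
| tle.remote | ❌ Not supported |
| tle.reshard | ❌ Not supported |
| tle.distributed_dot | ❌ Not supported |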

TLE-Struct (GPU) status

| Python DSL | Status |
| --- | --- |
| tle.gpu.alloc | ✅ Supported |
| tle.gpu.local_ptr | ✅ Supported |
| tle.gpu.copy (TMA) | ✅ Supported |
| tle.gpu.memory_space | ✅ Supported |
| tle.gpu.pipeline | ❌ Not supported |

Test Result:
====================================== 4708 passed, 3482 skipped, 58 warnings in 2920.80s (0:48:40) =======================================
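For readers who want to sanity-check the summary line above, here is a minimal sketch that parses it and derives the totals. The helper name `parse_pytest_summary` is hypothetical and not part of this PR; it only assumes the usual pytest summary wording.

```python
import re

# Summary text copied from the test result above (without the "=" padding).
SUMMARY = "4708 passed, 3482 skipped, 58 warnings in 2920.80s (0:48:40)"


def parse_pytest_summary(line):
    """Return (counts, seconds) parsed from a pytest summary line."""
    pairs = re.findall(r"(\d+) (passed|failed|skipped|warnings|errors?)", line)
    counts = {name: int(num) for num, name in pairs}
    match = re.search(r"in ([\d.]+)s", line)
    seconds = float(match.group(1)) if match else None
    return counts, seconds


counts, seconds = parse_pytest_summary(SUMMARY)
collected = counts["passed"] + counts["skipped"]   # tests that were collected
skip_ratio = counts["skipped"] / collected         # share of skipped tests
minutes, rest = divmod(seconds, 60)                # wall time as min + sec

print(f"collected={collected}, skipped={skip_ratio:.1%}, "
      f"time={int(minutes)}m{rest:.1f}s")
```

Roughly 42.5% of the collected tests are skipped, so the pass count alone does not show how much of the suite actually runs on the GCU backend.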

1. TLE Framework & GCU Dialect EDSL

1.1 TLE-Raw TOPS Runtime (new module: tle/raw/tops/) and Pass Support

- Add gcu_dialect.py: GCU MLIR dialect EDSL with native !gcu.ptr<T>
  type support, gcu.ptr2memref, vector.maskedload patterns
- Add mlir_runtime.py: MLIR-based runtime for GCU dialect IR execution
- Add runtime.py: TOPS C++ kernel integration runtime
- Extend codegen.py and runtime.py to support GCU backend dispatch
- Add gcu300.py, gcu400.py, mlir.py: typed Python wrappers for
  C++ MLIR passes via libtriton_gcu{300,400}_core
- Integrate TLE-raw pre-passes (TleToTritonGCU, ConvertArgToMemDesc,
  RemoveRedundantCopy, DSLRegionInline) with has_tle_raw detection
- Add gcu_intrinsics.py for GCU intrinsic function mapping, with support
  for IntrinsicsGCU/IntrinsicsTCLE data files on GCU400
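The conditional pre-pass integration described in this list could look roughly like the sketch below. Only the four pre-pass names and the `libtriton_gcu{300,400}_core` naming come from this description; the detection marker, the function names, and the base pass list are assumptions for illustration, not the code in this PR.

```python
# Pre-pass names as listed in the PR description.
TLE_RAW_PRE_PASSES = [
    "TleToTritonGCU",
    "ConvertArgToMemDesc",
    "RemoveRedundantCopy",
    "DSLRegionInline",
]


def has_tle_raw(ir_text, marker="tle.dsl_region"):
    """Hypothetical detector: the marker op name is an assumption."""
    return marker in ir_text


def build_pipeline(ir_text, base_passes):
    """Prepend the TLE-raw pre-passes only when the module uses TLE-raw."""
    if has_tle_raw(ir_text):
        return TLE_RAW_PRE_PASSES + list(base_passes)
    return list(base_passes)


def core_library_name(arch):
    """Expand libtriton_gcu{300,400}_core for one architecture."""
    if arch not in (300, 400):
        raise ValueError(f"unsupported GCU arch: {arch}")
    return f"libtriton_gcu{arch}_core"


# Placeholder base pipeline; the real pass order is not given in the PR text.
base = ["TritonToGCU", "TritonGCULocalMemOptimize"]
print(build_pipeline("func @k() { tle.dsl_region {} }", base))
print(build_pipeline("func @k() { tt.load }", base))
print(core_library_name(400))
```

Keeping detection separate from pipeline construction means kernels that do not use TLE-raw see an unchanged pass list.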

1.2 TLE-Struct/Lite Core Pass Support

- Support TleToTritonGCU.cpp: TLE-to-TritonGCU conversion
  Phase 1 - Barrier Insertion
  Phase 2 - Lower TLE ops to TritonGCU ops (tle.local_pointers, ttg.tma_copy, tle.extract_ptr)
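To illustrate the two-phase structure, the toy sketch below runs over a flat list of op names. The three source op names come from the description above; the target op names, the barrier name, and the barrier placement rule (one barrier after each copy) are invented for illustration and say nothing about how TleToTritonGCU.cpp actually decides these things.

```python
# Source op names are from the PR description; target names are placeholders.
LOWERING = {
    "tle.local_pointers": "triton_gcu.placeholder_local_pointers",
    "ttg.tma_copy": "triton_gcu.placeholder_copy",
    "tle.extract_ptr": "triton_gcu.placeholder_extract_ptr",
}
BARRIER = "placeholder.barrier"  # hypothetical barrier op name


def insert_barriers(ops):
    """Phase 1 (toy rule): add a barrier after every copy op."""
    out = []
    for op in ops:
        out.append(op)
        if op == "ttg.tma_copy":
            out.append(BARRIER)
    return out


def lower_ops(ops):
    """Phase 2: rewrite ops that have a lowering, keep all others as-is."""
    return [LOWERING.get(op, op) for op in ops]


def run_two_phase(ops):
    """Run barrier insertion first, then lowering, as in the phase order above."""
    return lower_ops(insert_barriers(ops))


print(run_two_phase(["tle.local_pointers", "ttg.tma_copy", "arith.addf"]))
```

Running barrier insertion before lowering lets the first phase reason about the original TLE-level ops, which is one plausible motivation for the ordering.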

2. GCU Backend

2.1 Conversion Passes (Optimized)

- Rewrite ReduceOpToGCU.cpp: improved reduce lowering
- Rewrite ScanOpToGCU.cpp: expanded scan support
- Enhance TTSmemLowerToGCU.cpp: shared memory lowering
- Rework TritonToGCU.cpp: main conversion pipeline
- Add TritonGCULocalMemOptimize.cpp: local memory optimization pass
- Add ReduceScanCommon: shared reduce/scan utilities
- Add Vectorize.cpp/h: vectorization support
- Add TritonToTritonGPU conversion for GCU400
- Enhance PtrAnalysis, MaskAnalysis, FirstLastUserAnalysis
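One reason reduce and scan lowering can share utilities is that both are driven by the same combine function: a reduction equals the last element of an inclusive scan. The pure-Python sketch below shows that relationship only. It is not the logic in ReduceScanCommon, and the function names are hypothetical.

```python
from itertools import accumulate
import operator


def inclusive_scan(values, combine=operator.add):
    """Inclusive scan: out[i] = combine(values[0], ..., values[i])."""
    return list(accumulate(values, combine))


def reduce_via_scan(values, combine=operator.add):
    """Reduction expressed as the last element of the inclusive scan."""
    scanned = inclusive_scan(values, combine)
    if not scanned:
        raise ValueError("cannot reduce an empty sequence")
    return scanned[-1]


data = [3, 1, 4, 1, 5]
print(inclusive_scan(data))        # running sums, the reference for a cumsum
print(reduce_via_scan(data))       # sum of all elements
print(reduce_via_scan(data, max))  # same machinery with a different combine
```

A host-side reference like `inclusive_scan` is also a convenient oracle when checking cumulative-sum kernels, assuming the kernel implements an inclusive scan.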

2.2 Python Bindings (new)

- Add triton_gcu400_core.cpp/h: C++ pass registration module
- Add triton_gcu400_module.cpp: Python pybind11 bindings
- Export pass functions via triton_gcu400_core.map

3. Tests

3.1 TLE-Raw Tests (Enflame, 13 new files)

- MLIR EDSL: vector-add, softmax, gcu_opt_lowering, ptr2memref
- TOPS EDSL: vector-add, softmax, matrix-multiplication, dialect IR

3.2 TLE Unit Tests (Enflame, 4 new files, 7 files updated)

- test_tle_raw_tops, test_extract/insert_tile_static_index
- DeepSeek V32 sparse-MLA tutorial
- Update test_tle_cumsum, test_tle_gpu_local_ptr for GCU compatibility
- Update tutorials (01-fft, 03-topk, deepseek_v32/01-topk_selector)
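A static-index tile test typically checks a round-trip property: extracting a tile and inserting it back at the same index leaves the tensor unchanged. The sketch below models that on nested lists. It assumes the index counts whole tiles rather than elements, which may not match the real tle.extract_tile and tle.insert_tile semantics, and both helpers are hypothetical host-side references.

```python
def extract_tile(matrix, tile_shape, index):
    """Return the tile at static tile coordinates index = (row_tile, col_tile)."""
    th, tw = tile_shape
    r0, c0 = index[0] * th, index[1] * tw
    return [row[c0:c0 + tw] for row in matrix[r0:r0 + th]]


def insert_tile(matrix, tile, index):
    """Return a copy of matrix with tile written at tile coordinates index."""
    th, tw = len(tile), len(tile[0])
    r0, c0 = index[0] * th, index[1] * tw
    out = [list(row) for row in matrix]  # copy so the input stays untouched
    for i, tile_row in enumerate(tile):
        out[r0 + i][c0:c0 + tw] = tile_row
    return out


matrix = [[4 * r + c for c in range(4)] for r in range(4)]
tile = extract_tile(matrix, (2, 2), (1, 0))
print(tile)

# Round trip: writing the extracted tile back must reproduce the input.
assert insert_tile(matrix, tile, (1, 0)) == matrix
```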

[ENFLAME] support TLE features

08a2da3

baoqiliu requested review from Galaxy1458, sunnycase and zhzhcookie as code owners May 20, 2026 10:22

github-actions Bot added enflame triton_v3.6.x labels May 20, 2026

flagtree-bot and others added 13 commits May 20, 2026 10:24

Apply code-format changes

1a2b557

Apply code-format changes

bff57fa

fix code format issue

93a52d3

Apply code-format changes

8a6ecb1

fix pre commit issue

726e6aa

remove ws pass

f458c43

Apply code-format changes

8547dbd

update test case

55083b7

add pytest

fd3d375

add clear cache for cache test

125f7ef

remove readme

7870a22

update cache test

d47da02

Apply code-format changes

cf73e54

zhzhcookie reviewed May 21, 2026

Comment thread python/tutorials/tle/01-fft.py Outdated

baoqiliu and others added 10 commits May 21, 2026 21:20

remove import torch_gcu

636e5d2

Apply code-format changes

615d2ff

add test cases in yml

f794fc3

Apply code-format changes

23e9949

fix import issue

cef2553

update driver.py

b640726

Apply code-format changes

c9a7e6a

update local ptr case

7cd310a

update test case

22ae2bb

Apply code-format changes

387c1f3

baoqiliu and others added 4 commits May 22, 2026 18:31

fix setup

de366a7

Apply code-format changes

4ba89c6

Update setup.py

ff59f86

[CodeFormat] Fix code format

88a1166

zhzhcookie changed the title ~~[ENFLAME] support TLE features~~ [TLE] [ENFLAME] support TLE features May 22, 2026

zhzhcookie merged commit 5742ad4 into triton_v3.6.x May 22, 2026
10 checks passed

zhzhcookie deleted the enflame/update_0520 branch May 22, 2026 16:11

zhzhcookie merged 28 commits into triton_v3.6.x from enflame/update_0520

3 participants
