Learning notes for deep learning framework internals.
This repository collects resources and notes about PyTorch, OneFlow, TorchScript, distributed training, autograd, memory management, operator development, and framework-level performance optimization.
- PyTorch internals: autograd, CUDA extensions, data loading, memory management, AMP, TorchScript, Dynamo, AOTAutograd, and performance tuning.
- OneFlow internals: execution model, operators, distributed tensors, runtime, VM, and CUDA kernels.
- ML systems engineering: framework architecture, operator implementation, and training/runtime optimization.
- CUDA and GPU optimization: https://github.com/BBuf/how-to-optim-algorithm-in-cuda
- Deep learning compiler notes: https://github.com/BBuf/tvm_mlir_learn
This is a legacy learning archive. The repository remains public for reference, and its public-facing documentation will be written in English going forward.