Learning CUDA and contributing code every day
- Programming Massively Parallel Processors
- CUDA by Example: An Introduction to General-Purpose GPU Programming

| Day | Description |
|---|---|
| Day 1 | Basic Vector Addition in CUDA |
| Day 2 | Implemented Matrix Addition in CUDA |
| Day 3 | Implemented Matrix Multiplication in CUDA |
| Day 4 | ReLU implementation in CUDA |
| Day 5 | Leaky ReLU implementation in CUDA |
| Day 6 | Softmax implementation in CUDA |
| Day 7 | Dot Product implementation in CUDA |
| Day 8 | Reduce Sum implementation in CUDA |
| Day 9 | Layer Normalization implementation in CUDA |
| Day 10 | Matrix Transpose implementation in CUDA |
| Day 11 | 1D convolution implementation in CUDA |
| Day 12 | 2D convolution implementation in CUDA |
| Day 13 | Optimised Reduce Sum with Sequential Addressing in CUDA |
| Day 14 | Tiled matrix multiplication in CUDA |
| Day 15 | Array Reversal in CUDA |
| Day 16 | Optimised Reduce Sum implementation in CUDA |
| Day 17 | Simple Attention implementation in CUDA |
| Day 18 | Layer Norm implementation using shared memory in CUDA |
| Day 19 | Matrix Transpose implementation using shared memory in CUDA |
| Day 20 | Flash attention forward pass implementation in CUDA |
| Day 21 | Binary cross entropy loss implementation in CUDA |
| Day 22 | Binary cross entropy loss with softmax implementation in CUDA |
| Day 23 | Naive Bayes implementation in CUDA |
| Day 24 | BFS implementation in CUDA |
| Day 25 | Mini-batch Stochastic Gradient Descent implementation in CUDA |
| Day 26 | Batch Normalization implementation in CUDA |
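
For a sense of what the earliest entries involve, here is a minimal sketch of vector addition in the spirit of Day 1. It is not the code from this repository: the names `vectorAddKernel` and `vectorAdd` and the block size of 256 are assumptions chosen for illustration. Error checking is left out for brevity, and both inputs are assumed to have the same length.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; the bounds check guards the
// last block, which may have more threads than remaining elements.
__global__ void vectorAddKernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper (illustrative name): copies the inputs to the
// device, launches the kernel, and copies the result back.
// Assumes a.size() == b.size().
std::vector<float> vectorAdd(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) {
        return out;
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dA = nullptr;
    float* dB = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, bytes);

    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Assumed launch configuration: 256 threads per block, with enough
    // blocks to cover all n elements (rounded up).
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAddKernel<<<blocks, threads>>>(dA, dB, dOut, n);

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Calling `vectorAdd({1, 2, 3}, {10, 20, 30})` from a `main` function and compiling with `nvcc` should exercise the whole path. A fuller version would check the return value of every CUDA call.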