ptx-kernels

This repo documents my understanding of PTX: how to write, compile and execute PTX kernels, and how to compare their performance.

Setup

The benchmarks were run on an Intel 12th Gen i7 CPU and an NVIDIA RTX 3050 mobile GPU (3.5 GB GDDR6).
The task is the matrix multiplication C = A*B, where A, B and C are all 8192x8192 matrices.
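For a sense of scale, the sketch below computes the memory footprint and operation count of this problem. It assumes float32 elements (4 bytes each) and the conventional 2N³ FLOP count for a square matmul. The helpers `matrix_bytes` and `matmul_flops` are illustrative and not part of the repo.

```python
N = 8192
BYTES_PER_ELEMENT = 4  # assumption: matrices are stored as float32


def matrix_bytes(n, itemsize=BYTES_PER_ELEMENT):
    """Bytes needed to store one dense n x n matrix."""
    return n * n * itemsize


def matmul_flops(n):
    """Conventional FLOP count for an n x n matmul: n*n outputs,
    each an inner product of length n (one multiply + one add per term)."""
    return 2 * n ** 3


if __name__ == "__main__":
    per_matrix = matrix_bytes(N)
    print(f"one matrix: {per_matrix / 2**20:.0f} MiB")
    print(f"A + B + C:  {3 * per_matrix / 2**20:.0f} MiB")
    print(f"total work: {matmul_flops(N) / 1e12:.2f} TFLOP")
```

Under these assumptions the three matrices need 768 MiB in total, which fits comfortably in the GPU's memory.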

Accuracy is measured as the maximum absolute error between each kernel's result and a baseline NumPy result. It should be within 1e-3.
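A minimal NumPy sketch of this check follows. The helper names (`max_abs_error`, `check_result`) are illustrative and not taken from matmul.py, and the demo uses a small 256x256 problem with an artificially perturbed result standing in for a kernel's output.

```python
import numpy as np

TOLERANCE = 1e-3  # accuracy threshold stated above


def max_abs_error(result, reference):
    """Largest element-wise |result - reference|."""
    return float(np.max(np.abs(result - reference)))


def check_result(result, reference, tol=TOLERANCE):
    """Return (passed, error) for a kernel result against a reference."""
    err = max_abs_error(result, reference)
    return err <= tol, err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 256  # small size for a quick demo; the benchmark itself uses 8192
    a = rng.random((n, n), dtype=np.float32)
    b = rng.random((n, n), dtype=np.float32)
    reference = a @ b  # NumPy baseline
    kernel_output = reference + np.float32(1e-4)  # stand-in for a GPU result
    ok, err = check_result(kernel_output, reference)
    print(f"max abs error = {err:.2e}, within tolerance: {ok}")
```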

Requirements

A GPU and CUDA driver that support PTX ISA version 8.0, and a Python environment with the following packages installed:

pycuda
numpy
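Before running, a quick environment check can confirm that the packages are importable and that the CUDA driver is new enough. This sketch assumes PTX ISA 8.0 first shipped with CUDA 12.0, so it treats a driver version of 12000 or higher (as reported by `pycuda.driver.get_driver_version()`) as sufficient. The helper names are hypothetical.

```python
import importlib.util

REQUIRED_PACKAGES = ("pycuda", "numpy")
# Assumption: PTX ISA 8.0 first shipped with CUDA 12.0, and CUDA encodes
# driver versions as 1000 * major + 10 * minor (12.0 -> 12000).
MIN_DRIVER_VERSION = 12000


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def driver_supports_ptx8(driver_version):
    """True if a CUDA driver version integer is at least 12.0."""
    return driver_version >= MIN_DRIVER_VERSION


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing packages:", ", ".join(missing))
    if "pycuda" not in missing:
        try:
            import pycuda.driver as cuda

            cuda.init()
            version = cuda.get_driver_version()
            device = cuda.Device(0)
            print(f"{device.name()} (compute capability {device.compute_capability()})")
            print(f"driver {version}, PTX 8.0 supported: {driver_supports_ptx8(version)}")
        except Exception as exc:  # e.g. no GPU or no CUDA driver installed
            print("could not query the CUDA driver:", exc)
```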

To run the benchmark, run the following command:

python3 matmul.py

TODO: implement command line options to run different kernels.
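One way this TODO could be approached is with `argparse` from the standard library. This is only a sketch: the kernel identifiers and the `--kernel` and `--size` flags are hypothetical and do not exist in matmul.py.

```python
import argparse

# Hypothetical kernel identifiers; the real names in matmul.py may differ.
KERNELS = ("naive", "coalescing", "smem_blocking", "smem_1d_tiling")


def parse_args(argv=None):
    """Parse command-line options (argv=None reads sys.argv[1:])."""
    parser = argparse.ArgumentParser(description="Benchmark PTX matmul kernels.")
    parser.add_argument(
        "--kernel",
        choices=KERNELS + ("all",),
        default="all",
        help="which kernel to benchmark (default: all)",
    )
    parser.add_argument(
        "--size", type=int, default=8192, help="side length N of the square matrices"
    )
    return parser.parse_args(argv)


def selected_kernels(args):
    """Expand the --kernel choice into a list of kernel identifiers."""
    return list(KERNELS) if args.kernel == "all" else [args.kernel]


if __name__ == "__main__":
    args = parse_args()
    for name in selected_kernels(args):
        print(f"would run {name} with N={args.size}")
```

If wired into the benchmark script, `--kernel naive` would select a single kernel, while the default `all` would run every kernel.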

Performance comparison

Speedup is measured relative to the naive kernel.

Kernel	Time (s)	Speedup
Naive Kernel	15.64	1.00
Mem coalescing	8.62	1.81
sh_mem blocking	2.44	6.40
sh_mem 1D tile blocking	1.24	12.62
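The speedup of each kernel is the naive kernel's time divided by that kernel's time. The sketch below recomputes it from the table's times and also converts each time into throughput, assuming the conventional 2N³ FLOP count with N = 8192. The short kernel keys are illustrative labels for the table rows, and small last-digit differences from the table's speedups are expected if the times shown were rounded.

```python
N = 8192
FLOPS = 2 * N ** 3  # conventional matmul operation count (assumption)

# Wall-clock times in seconds, taken from the performance table.
TIMES = {
    "naive": 15.64,
    "coalescing": 8.62,
    "smem_blocking": 2.44,
    "smem_1d_tiling": 1.24,
}


def summarize(times, baseline="naive"):
    """Return (name, speedup vs. baseline, GFLOP/s) for each kernel."""
    base = times[baseline]
    return [(name, base / t, FLOPS / t / 1e9) for name, t in times.items()]


if __name__ == "__main__":
    for name, speedup, gflops in summarize(TIMES):
        print(f"{name:<16} {speedup:6.2f}x {gflops:8.1f} GFLOP/s")
```

Under that FLOP-count assumption, the naive kernel reaches roughly 70 GFLOP/s and the 1D-tiled kernel roughly 887 GFLOP/s.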

References

This repo's skeleton code was inspired by the project linked here.
