AOCL-BLAS 5.3

rohrayan released this 20 May 05:15

Performance improvements in S/D/ZGEMM on Zen3/4/5
SGEMM Optimizations for tiny matrices
New Thread Control APIs with global and thread-local variants
Support for OpenMP 2.5 and earlier versions
Optional support for reproducibility using compiler options
Updates to aocl-gemm add-on module:
  Column Major support for BF16 and FP32
  FP32 RD kernels for AVX512 and AVX2 ISA
  GEMV kernel for m=1 case using AVX2 and AVX512 YMM registers
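
The aocl-gemm items above mention column-major storage and the m=1 case. As a rough illustration of what those terms mean, here is a minimal, unoptimized reference sketch in plain C. It does not use or reproduce any AOCL-BLAS or aocl-gemm API; the function name `ref_sgemm_colmajor` and its parameter order are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not part of AOCL-BLAS):
   computes C = A * B for matrices stored in column-major order.
   A is m x k, B is k x n, C is m x n.
   Element (i, j) of a column-major matrix X lives at X[i + j * ldx],
   where ldx is the leading dimension (stride between columns). */
static void ref_sgemm_colmajor(size_t m, size_t n, size_t k,
                               const float *a, size_t lda,
                               const float *b, size_t ldb,
                               float *c, size_t ldc)
{
    /* Leading dimensions must cover at least one full column. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {          /* columns of C */
        for (size_t i = 0; i < m; ++i) {      /* rows of C */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {  /* inner (reduction) dimension */
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = acc;
        }
    }
}
```

When m = 1, A is a single row vector and the loop over `i` runs once, so the product reduces to a vector-matrix multiply. That is the shape a dedicated GEMV kernel can target instead of a general GEMM kernel.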