Written by admin, May 14, 2018

Single and double precision 4×4 and 8×8 block matrix products using SSE2, AVX and AVX512 intrinsics

Dear Reader, in this post I would like to summarize the matrix multiplication algorithms I am using in my neural network library; hopefully they will come in handy for some of you. The data types: Every SIMD instruction works on a vector of data in parallel. This vector is a row in our matrix. __m128: […]
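To illustrate the idea that one SIMD vector holds one matrix row, here is a minimal sketch of a 4×4 single precision product. It is not the library's actual code: the function name `mul4x4_rows`, the row-major layout and the unaligned loads are all assumptions made for this example. A scalar fallback is included so that the sketch also compiles on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* __m128 and the _mm_*_ps intrinsics */
#endif

/* Hypothetical helper: c = a * b for row-major 4x4 float matrices.
   c must not overlap a or b. */
void mul4x4_rows(const float *a, const float *b, float *c)
{
    assert(a && b && c);
#if defined(__SSE__)
    /* Each row of b fills exactly one 128-bit register (4 floats). */
    __m128 b0 = _mm_loadu_ps(b + 0);
    __m128 b1 = _mm_loadu_ps(b + 4);
    __m128 b2 = _mm_loadu_ps(b + 8);
    __m128 b3 = _mm_loadu_ps(b + 12);

    for (size_t i = 0; i < 4; ++i) {
        /* Row i of c is a linear combination of the rows of b,
           weighted by the four scalars in row i of a. */
        __m128 r = _mm_mul_ps(_mm_set1_ps(a[4 * i + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[4 * i + 3]), b3));
        _mm_storeu_ps(c + 4 * i, r);
    }
#else
    /* Scalar fallback with the same result, one element at a time. */
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; ++k)
                sum += a[4 * i + k] * b[4 * k + j];
            c[4 * i + j] = sum;
        }
    }
#endif
}
```

Broadcasting one scalar of `a` and multiplying it with a whole row of `b` avoids any horizontal additions, which is why the row-as-vector layout suits this kind of block product. The AVX variant of the same idea would use `__m256` and eight floats per row.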

Written by admin, May 13, 2018

Artificial Intelligence Fight IV. – Introducing 8×8 matrix partitions and Advanced Vector Extensions (AVX) intrinsics

I woke up at midnight and thought – what if I tried to further optimise the SSE block multiplier? I started browsing the intrinsics documentation from Intel, and I quickly came to the conclusion that it would not be too hard to implement AVX intrinsics instead of SSE, which allows blocks of 8×8 single precision […]

Written by admin, May 13, 2018

Artificial Intelligence Fight III. – Introducing matrix partitioning and SSE intrinsics

Dear Reader, I have had a very productive weekend: I managed to implement a simple matrix partitioning algorithm, and also a matrix multiplier function that multiplies 4×4 matrices using SSE intrinsics. Partitioning is not yet as efficient as I know it could be, because it creates 4×4 blocks no matter what. Even in […]
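One way to picture "4×4 blocks no matter what" is the sketch below. It assumes a row-major source matrix and zero padding for blocks that hang over the edge; the post excerpt does not say how edge blocks are filled, so the padding rule and the names `blocks_needed` and `extract_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK = 4 };

/* Number of 4-wide blocks needed to cover n rows or columns. */
size_t blocks_needed(size_t n)
{
    return (n + BLOCK - 1) / BLOCK;
}

/* Copy block (bi, bj) of a row-major rows x cols matrix into out[16].
   Elements that fall outside the matrix are left as zero (assumed rule). */
void extract_block(const float *m, size_t rows, size_t cols,
                   size_t bi, size_t bj, float out[16])
{
    assert(bi < blocks_needed(rows) && bj < blocks_needed(cols));
    memset(out, 0, 16 * sizeof(float));

    for (size_t r = 0; r < BLOCK; ++r) {
        size_t src_r = bi * BLOCK + r;
        if (src_r >= rows)
            break; /* remaining rows are padding */
        for (size_t c = 0; c < BLOCK; ++c) {
            size_t src_c = bj * BLOCK + c;
            if (src_c >= cols)
                break; /* remaining columns are padding */
            out[r * BLOCK + c] = m[src_r * cols + src_c];
        }
    }
}
```

With this scheme a 5×6 matrix is covered by 2×2 blocks, three of which are mostly padding. That wasted work on small or oddly sized matrices is the kind of inefficiency a fixed block size brings with it.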

DeepTrainer

Deep Learning algorithm R&D for Artificial Neural Networks

Month: May 2018
