Runtime Code Generation for Convolutions


Summary: Our work, MARLIN (Matrix Multiplication through Reduced Load Instructions), is now available on GitHub. MARLIN is a runtime code generation library for convolution kernels. The paper won first place in the graduate student category at CGO 2021 (International Symposium on Code Generation and Optimization).

Assembly Function using AT&T Syntax

In this blog post, I explain in detail how to write a C++ program that calls an external assembly function written for the x86-64 architecture. There are three types of assembly syntax in use today: Intel syntax, AT&T syntax and GAS syntax (GNU Assembler). Although Intel syntax is much more readable and is widely used

Blocked Matrix Multiplication

After reading this article, you will be able to implement blocked matrix multiplication and understand how many memory accesses it incurs. In my next article, I will explain how these ideas can be extended to SIMD (Single Instruction Multiple Data) vector instructions. Why is Matrix Multiplication Important? Matrix multiplication

Dinero Trace Generator using JavaScript


Surprisingly, around the time this blog post was written, I couldn't find a Dinero trace generator online. Dinero is a cache simulator that was developed around 20 years ago. To generate traces, you just have to do simple bit manipulations. Ironically, today, our modern browsers have the power to