Blocked Matrix Multiplication

After reading this article, you will be able to implement blocked matrix multiplication and reason about the number of memory accesses it performs. In my next article, I will explain how these ideas can be extended to SIMD (Single Instruction, Multiple Data) vector instructions. Why is Matrix Multiplication Important? Matrix multiplication … Continue reading Blocked Matrix Multiplication
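As a rough illustration of the idea the excerpt names, here is a minimal sketch of blocked (tiled) matrix multiplication in C. It is not taken from the article: the function name `blocked_matmul`, the helper `min_size`, the row-major layout, the square `n x n` shape, and the block size parameter `bs` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the article): smaller of two sizes. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Illustrative sketch: computes C += A * B for n x n row-major matrices,
 * working on bs x bs tiles so that each tile of A, B and C is reused
 * while it is still likely to be in cache. C must be initialised by the
 * caller (for example to all zeros).
 */
void blocked_matmul(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    assert(bs > 0); /* a zero block size would loop forever */

    for (size_t ii = 0; ii < n; ii += bs) {
        for (size_t kk = 0; kk < n; kk += bs) {
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile edges so n need not be a multiple of bs. */
                size_t i_end = min_size(ii + bs, n);
                size_t k_end = min_size(kk + bs, n);
                size_t j_end = min_size(jj + bs, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The block size would normally be tuned so that three tiles fit in the cache level being targeted; the right value depends on the machine.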

How to Write a Makefile with Ease

Makefiles provide a way to organize the build steps involved in compiling a C / C++ project. This article explains how to set up your own makefile for a C / C++ project. Why Use a Makefile? Usual compilation with g++ will involve a command … Continue reading How to Write a Makefile with Ease

Compilation and Linking Cuda with C

Managing complexity and modularity becomes important as your project's scope increases, so being able to compile CUDA code separately and link it with C is a must-have. Learn how to compile your CUDA code separately and link it with your C object code. Example Files As an example, we will look at a stencil computation (nearest neighbor computation). Let's … Continue reading Compilation and Linking Cuda with C

Parallel Merge Sort with Pthreads

Most implementations of parallel merge sort on the web do not consider how elements should be divided between threads when the total number of elements is not evenly divisible by the number of threads. Also, the final merge (after all threads have been joined) should happen in a recursive manner. But first, let's go through a … Continue reading Parallel Merge Sort with Pthreads
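To make the uneven-division problem concrete, here is a minimal sketch of one common way to split `n` elements across threads when `n` is not a multiple of the thread count: the first `n % num_threads` threads each take one extra element. This is an assumed scheme for illustration, not necessarily the one the article uses, and the names `chunk_t` and `chunk_for_thread` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: the half-open slice [start, start + len) of the
 * array that a single thread is responsible for sorting. */
typedef struct {
    size_t start;
    size_t len;
} chunk_t;

/*
 * Illustrative sketch: returns the slice for thread t (0-based) when n
 * elements are shared among num_threads threads. Every thread gets
 * n / num_threads elements, and the first n % num_threads threads get
 * one more, so chunk sizes differ by at most one and cover the array
 * with no gaps or overlaps.
 */
chunk_t chunk_for_thread(size_t n, size_t num_threads, size_t t)
{
    assert(num_threads > 0 && t < num_threads);

    size_t base = n / num_threads; /* minimum elements per thread */
    size_t rem  = n % num_threads; /* threads that take one extra */

    chunk_t c;
    /* Threads before t contribute base each, plus one extra for each
     * of them that falls among the first rem threads. */
    c.start = t * base + (t < rem ? t : rem);
    c.len   = base + (t < rem ? 1 : 0);
    return c;
}
```

With 10 elements and 4 threads, for example, this gives chunk lengths of 3, 3, 2 and 2. Each Pthread would sort its own slice, and the sorted slices would then be merged after the threads are joined.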