In this blog post, I explain how to write a C++ program that calls an external function written in assembly for the x86-64 architecture. Two assembly syntaxes are in common use today: Intel syntax and AT&T syntax. GAS, the GNU Assembler, is an assembler rather than a third syntax; it uses AT&T syntax by default and also accepts Intel syntax. Although Intel syntax is easier to read and is the one used in Intel’s programming manuals, I will use AT&T syntax because it’s the default for the GNU toolchain (GCC and GAS).
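If you would rather follow the same assembly program in Intel syntax, you can switch by adding the following directive. The first line is how it appears in an assembly file, and the second is the same directive issued from C or C++ source.
.intel_syntax noprefix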
asm(".intel_syntax noprefix");
1. Define the Main and Assembly Function
Let’s write a function that accepts a 64-bit integer and increments it by 1. Along the way, we will see which registers carry integer arguments and how the return value comes back to the caller.
#include <iostream>
#include <stdint.h>
extern "C" uint64_t inc_func(uint64_t x);
int main(int argc, char** argv) {
    for (uint64_t i = 0; i < 10; ++i) {
        std::cout << "iteration: " << i << " inc value: "
                  << inc_func(i) << std::endl;
    }
    return 0;
}
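Notice that we are using unsigned 64-bit integers, specified by uint64_t. If you’re wondering, the _t suffix denotes a type name. Our function is defined in an assembly file. We tell the compiler this by declaring the function with extern "C": extern says the definition lives outside this source file, and the "C" linkage specification turns off C++ name mangling, so the symbol is called exactly inc_func and matches the label in the assembly file. The regular function declaration follows the specifier.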
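To make the effect of extern "C" concrete, here is a minimal sketch with two otherwise identical functions. The names inc_ref and inc_mangled are made up for this illustration and are not part of the example project, and the mangled symbol shown in the comment is what GCC typically produces on Linux, so treat it as an assumption for other platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, for illustration only.

// C linkage: the symbol is emitted as plain "inc_ref", which is the kind of
// name an assembly file can define or reference directly.
extern "C" uint64_t inc_ref(uint64_t x) { return x + 1; }

// C++ linkage: the compiler encodes the parameter types into the symbol,
// typically something like "_Z11inc_mangledm" with GCC on Linux.
uint64_t inc_mangled(uint64_t x) { return x + 1; }

// Both functions behave identically; only their symbol names differ.
inline void check_same_behaviour() {
    assert(inc_ref(41) == 42);
    assert(inc_mangled(41) == 42);
}
```

If you compile this to an object file and list its symbols with nm, you should see inc_ref unchanged next to the mangled form of the second function.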
We’ll go through the assembly function line by line. It lives in a separate new file with a .S extension. The first step is to export the function’s symbol: the .global directive tells the assembler to make inc_func visible to the linker, so that the call in main can be resolved to the definition in this file. The .text directive places everything that follows in the code section.
.global inc_func
.text
2. Define the Function Prologue & Epilogue
Now we can define the function itself. Notice that it begins with a function prologue. On x86-64, a function’s local variables and its return address live on the call stack, and each call gets its own region of the stack, called a stack frame. We therefore need a stack pointer to know where the current frame is. The stack pointer is kept in the register rsp (esp in 32-bit mode) and points to the top of the stack. In 64-bit mode, esp names the lower 32 bits of rsp; writing to esp zeroes the upper half of the 64-bit register, so 64-bit code should always use rsp.
To make programming easier (especially when using local variables), we use another register, rbp, to hold the value rsp had on entry to the function. The function then refers to variables stored on the stack relative to rbp. For example, -0x8(%rbp) gives you access to the first local variable. The offset is negative because the stack grows towards lower addresses, and it is 8 because each 64-bit value occupies 8 bytes.
inc_func:
    push %rbp          # function prologue: save the caller's rbp
    movq %rsp, %rbp    # function prologue: set up our own base pointer
    # ... function body ...
    pop %rbp           # function epilogue: restore the caller's rbp
    ret                # return
We push rbp onto the stack because we are about to overwrite it with our current stack pointer, and the caller’s rbp must be intact when the function returns. rbp is a callee-saved register: the callee is responsible for preserving it, which is why the callee pushes it. The function then copies the current stack pointer (rsp) into the base pointer (rbp). Together, these two instructions form the function prologue.
Before the function returns, the caller’s rbp is restored by popping it off the stack. We refer to this as the function epilogue.
3. Define the Assembly Function Body
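On Linux and other Unix-like systems, x86-64 code follows the System V AMD64 calling convention (Windows uses a different one). Under this convention, integer and pointer arguments are passed in the following registers, in order. If more than 6 such arguments are passed, the rest are stored on the stack.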
rdi, rsi, rdx, rcx, r8, r9
Floating point arguments use a separate set of registers: the SSE registers xmm0 to xmm7. Every x86-64 CPU has these registers, so no AVX, AVX2 or AVX512 support is required. If more than 8 floating point arguments are passed, the rest are stored on the stack.
xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7
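The two register lists above can be read off a C++ signature directly. The following is a minimal sketch, assuming the System V AMD64 convention; the function sum_scaled is hypothetical and exists only to annotate where each argument is expected to arrive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical function, for illustration only. The comments show where each
// argument is expected to arrive under the System V AMD64 convention.
extern "C" double sum_scaled(
    uint64_t a,   // rdi
    uint64_t b,   // rsi
    uint64_t c,   // rdx
    uint64_t d,   // rcx
    uint64_t e,   // r8
    uint64_t f,   // r9
    uint64_t g,   // 7th integer argument: passed on the stack
    double   w,   // xmm0 (floating point arguments are counted separately)
    double   v)   // xmm1
{
    // A double return value goes back to the caller in xmm0.
    return w * static_cast<double>(a + b + c + d + e + f + g) + v;
}

// Usage: the caller writes an ordinary call; the compiler places the arguments.
inline void example_call() {
    assert(sum_scaled(1, 2, 3, 4, 5, 6, 7, 2.0, 0.5) == 56.5);
}
```

Integer and floating point arguments are counted independently, which is why w lands in xmm0 even though it is the eighth parameter in the list.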
By this convention, our argument arrives in rdi. To add 1 to it we use the ADD instruction; because the operand is a quadword, the AT&T mnemonic is addq. Don’t reach for ADC (adcq) or ADCX here: those add the carry flag as well, so the result would be off by one whenever an earlier instruction left the carry flag set. ADD works for both signed and unsigned integers, because two’s complement addition is identical for the two. In Intel’s terminology, a word is 16 bits long, so a doubleword is 32 bits long and a quadword is 64 bits long. A plain int in C++ is 32 bits (4 bytes) on this platform, but in this case we are explicitly using 64-bit (8-byte) integers.
inc_func:
    push %rbp          # function prologue
    movq %rsp, %rbp    #
    addq $0x1, %rdi    # add 1 to uint64_t
    # ... return value handling comes next ...
    pop %rbp           # function epilogue
    ret                # return
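If you want to experiment with these instructions without a separate assembly file, GCC and Clang extended inline assembly is one option. The sketch below is an assumption-laden illustration rather than part of the example project: both helper names are made up, and the code falls back to plain C++ on non-x86-64 targets. It shows why the choice between addq and adcq matters: adcq adds the carry flag on top of the operand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes x + 1 with ADD.
inline uint64_t inc_with_add(uint64_t x) {
#if defined(__x86_64__) && defined(__GNUC__)
    asm("addq $1, %0"      // AT&T order: source first, destination last
        : "+r"(x)          // %0 is read and written, in any general register
        :                  // no separate inputs
        : "cc");           // the flags register is modified
#else
    x += 1;                // portable fallback for other targets
#endif
    return x;
}

// Hypothetical helper: forces the carry flag on with STC, then uses ADC.
// ADC computes destination + source + carry, so the result is x + 2.
inline uint64_t inc_with_adc_after_stc(uint64_t x) {
#if defined(__x86_64__) && defined(__GNUC__)
    asm("stc\n\t"          // set the carry flag
        "adcq $1, %0"      // x = x + 1 + carry
        : "+r"(x)
        :
        : "cc");
#else
    x += 2;                // portable fallback mirroring the same result
#endif
    return x;
}

inline void example_add_vs_adc() {
    assert(inc_with_add(41) == 42);
    assert(inc_with_adc_after_stc(41) == 43);
}
```

The second helper sets the carry flag on purpose; in real code the flag is whatever the previous arithmetic instruction left behind, which is what makes an accidental adcq hard to spot.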
4. Define Return
By convention, integer return values are passed back to the caller in the rax register (eax for 32-bit values). Therefore, we have to move the result from rdi to rax. Notice that in AT&T syntax the destination is always the rightmost operand.
In contrast, Intel syntax puts the destination first, as the leftmost operand. The AT&T instruction movq %rdi, %rax is written mov rax, rdi in Intel syntax. As a rule of thumb, the operand order in AT&T is the exact reverse of the order in Intel syntax, so even when an instruction has three operands, reversing their order will let you adjust to the other syntax quickly.
inc_func:
    push %rbp          # function prologue
    movq %rsp, %rbp    #
    addq $0x1, %rdi    # add 1 to uint64_t (plain add, no carry-in)
    movq %rdi, %rax    # move result to rax
    pop %rbp           # function epilogue
    ret                # return
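Once the function is complete, it’s worth checking more than the ten values printed by main, in particular the wrap-around at the top of the uint64_t range. Here is a minimal sketch of such a check. The helper behaves_like_inc and the two stand-in functions are hypothetical and written in C++ so that the snippet compiles on its own; in the real program you would pass inc_func instead.

```cpp
#include <cassert>
#include <cstdint>

// Signature shared by inc_func and any stand-in used for testing.
using inc_fn = uint64_t (*)(uint64_t);

// Hypothetical helper: returns true if f behaves like "x + 1" on a few
// values, including wrap-around at the top of the uint64_t range.
inline bool behaves_like_inc(inc_fn f) {
    const uint64_t inputs[] = {0, 1, 41, UINT32_MAX, UINT64_MAX - 1, UINT64_MAX};
    for (uint64_t x : inputs) {
        if (f(x) != x + 1) return false;   // unsigned arithmetic wraps to 0
    }
    return true;
}

// C++ stand-ins: a correct increment, and one that is off by one in the way
// an add-with-carry would be if the carry flag happened to be set.
inline uint64_t plus_one(uint64_t x) { return x + 1; }
inline uint64_t plus_two(uint64_t x) { return x + 2; }

inline void example_check() {
    assert(behaves_like_inc(plus_one));
    assert(!behaves_like_inc(plus_two));
}
```

With the assembly version linked in, a call such as behaves_like_inc(inc_func) would exercise the same inputs against the real function.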
5. Compile Your Program
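We will use a three-step process to compile: first we create the object file func.o from the assembly source func.S, then main.o from the C++ source main.cc, and finally we create our binary by linking the two object files with the libraries. Let’s create a Makefile with the rules to make our life easy. Remember that the recipe lines under each rule must be indented with a tab character. Please look at this article if you need to learn more about Makefiles.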
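CC=g++
FLAGS=-std=c++11
TARGET=main
all: $(TARGET)
# assemble the external function
func.o: func.S
	$(CC) -o $@ -c $^
# compile the C++ caller
main.o: main.cc
	$(CC) $(FLAGS) -o $@ -c $^
# link both object files into the binary
main: func.o main.o
	$(CC) -o $@ $^
.PHONY: all clean
clean:
	rm -rf *.o $(TARGET)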
That’s it. You’ve successfully written your first Assembly function. The full example is available at https://github.com/malithj/blog-examples/tree/master/asm-intro.
Don’t forget to leave a comment or subscribe if this was a life saver.