Assembly language: X86 Architecture

In this article we explore the assembly language for the X86 CISC computer architecture.

Table of contents.

  1. Introduction to assembly language.
  2. The X86 assembly language.
  3. Summary.
  4. References.

Introduction to assembly language.

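Assembly languages are processor specific, and they are fundamental to compiler design, since compilers typically translate source code into assembly before it is assembled into machine code.
In this article we shall use the gcc compiler and the GNU assembler for our examples.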

Hello World

#include <stdio.h>

int main(int argc, char *argv[]){
    printf("hello %s\n", "world");
    return 0;
}

Compilation

gcc -S test.c -o test.s

#view the compiled assembly code
cat test.s

Output

        .file   "test.c"
        .text
        .section        .rodata
.LC0:
        .string "world"
.LC1:
        .string "hello %s\n"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movq    %rsi, -16(%rbp)
        leaq    .LC0(%rip), %rsi
        leaq    .LC1(%rip), %rdi
        movl    $0, %eax
        call    printf@PLT
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Debian 10.2.1-6) 10.2.1 20210110"
        .section        .note.GNU-stack,"",@progbits

The output of your compiler may be different.

Assembly code elements.

Regardless of the CPU architecture, assembly code has the following elements:

  1. Directives.
    They begin with a . (dot) and are used to indicate structural information that is useful for the assembler, linker or debugger.
    .data indicates the start of the data segment.
    .text indicates the start of the program (code) segment.
    .string emits a null-terminated string constant; in the output above it is placed in the read-only .rodata section.
    .globl main indicates that the label main is a global symbol that can be accessed by other code modules.

  2. Labels.
    These end with a colon, and their position indicates the relationship between names and locations.
    For example:
    The label .LC0: indicates that the string that follows should be called .LC0.
    The label main: indicates that the instruction pushq %rbp is the first instruction of the main function.
    Labels beginning with a dot, such as .LC0, are temporary local labels generated by the compiler; other labels, such as main, name user-visible functions and global variables.
    Labels do not become part of the machine code themselves, but user-visible labels are kept in the object code for the purposes of linking and in the executable for the purposes of debugging.

  3. Instructions.
    These are lines such as pushq %rbp, and they are indented to set them apart visually from labels.
    Instruction mnemonics are not case sensitive in the GNU assembler: gcc emits them in lowercase, while many of the hand-written examples in this article use uppercase to make them stand out.

We can take the assembly code in test.s and assemble it into a runnable program.

Compiling assembly to an executable

gcc test.s -o test
#run executable
./test

Output

hello world

Compiling to object code.

gcc test.s -c -o test.o

We use the nm utility to display the symbols (names) present in the object file.

nm test.o

Output

                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 T main
                 U printf

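The above information from the object code is available to the linker.
main is present in the text (T) section of the object, at location zero.
printf is undefined (U), since it will be obtained from the standard C library when the program is linked.
.LC0 does not appear at all: labels beginning with .L are local to the assembler and are not written to the object file's symbol table.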

The X86 assembly language.

Registers and data types

x86-64 has 16 almost general purpose 64-bit integer registers:
%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, %r8, %r9, %r10, %r11, %r12, %r13, %r14, %r15.

They are only almost general purpose because earlier versions of the processor intended each register for a specific purpose, so not every instruction could be applied to every register.

AT&T syntax and Intel Syntax

GNU tools use the traditional AT&T syntax, which is used across many processors on Unix-like operating systems.
Intel syntax is typically used on DOS and Windows systems.

An example of AT&T syntax

MOVQ %rsp, %rbp

Here MOVQ is the instruction, and %rsp and %rbp are registers. The source is given first and the destination second, so this copies %rsp into %rbp.

An example of Intel Syntax

MOV RBP, RSP

In this case the destination comes first and the source second, the % sign is not used, and the operand size is implied by the registers rather than by a suffix on the instruction.

The names of the first eight registers indicate their original purpose; for example, %rax is the accumulator.
As the design developed, new instructions and addressing modes were added to make the registers general purpose. However, some instructions, specifically those related to string processing, still require the use of %rsi and %rdi.
Also, two registers are reserved for use as the stack pointer and the base pointer: %rsp and %rbp respectively.

As the architecture expanded, the registers gained some internal structure. Taking %rax as an example:
The lowest 8 bits of the %rax register are the 8-bit register %al, and the next 8 bits are known as %ah.
The low 16 bits are known as %ax, and the low 32 bits as %eax.
The whole 64 bits are known as %rax.


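The registers %r8 through %r15 follow a similar scheme: the low 32, 16 and 8 bits of %r8 are named %r8d, %r8w and %r8b, and likewise for %r9 to %r15. These registers have no equivalent of %ah.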
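To make the overlap concrete, here is a small C sketch that models the sub-register names with masks and shifts. The function names (reg_al, write_eax and so on) are made up for illustration; the sketch only imitates which bits each name refers to, including the x86-64 rule that writing an 8-bit register such as %al leaves the other bits alone, while writing a 32-bit register such as %eax clears the upper 32 bits of %rax.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that a sub-register name
   refers to when the full 64-bit register holds rax. */
uint8_t  reg_al(uint64_t rax)  { return (uint8_t)(rax & 0xFF); }          /* bits 0-7  */
uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFF); }   /* bits 8-15 */
uint16_t reg_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFF); }       /* bits 0-15 */
uint32_t reg_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }  /* bits 0-31 */

/* Writing %al replaces only the low byte; bits 8-63 are preserved. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xFF) | value;
}

/* Writing %eax zero-extends: the old upper 32 bits of %rax are discarded. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;
    return (uint64_t)value;
}
```

For example, if %rax holds 0x1122334455667788, then %eax is 0x55667788, %ax is 0x7788, %ah is 0x77 and %al is 0x88.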

Addressing Modes.

The MOV instruction moves data between registers, and between registers and memory, using a variety of addressing modes.
A single-letter suffix determines the size of the data being moved.

Suffix  Name      Size
B       BYTE      1 byte (8 bits)
W       WORD      2 bytes (16 bits)
L       LONG      4 bytes (32 bits)
Q       QUADWORD  8 bytes (64 bits)

Therefore, MOVB moves a byte, MOVW a word, MOVL a long and MOVQ a quad word.
The size of the source and destination locations must match the suffix.

Addressing modes for the MOV arguments.

  • A global value is referred to by an unadorned name, such as x or printf, which the assembler translates into an absolute address or an address computation.
  • An immediate value is a constant marked by a $ sign, such as $89. Its range is limited and depends on the instruction used.
  • A register value is the name of a register, such as %rbx.
  • An indirect value is the value at the address contained in a register, written in parentheses: (%rsp) refers to the value pointed to by %rsp.
  • A base-relative value is obtained by adding a constant to the address in a register: -16(%rcx) refers to the value at the memory location 16 bytes below the address pointed to by %rcx. This mode is useful for manipulating stacks, local values and function parameters, where the start of an object is given by a register.
  • A complex address has the form D(RA,RB,C) and refers to the value at address RA + RB * C + D, where:
    RA and RB are general purpose registers,
    C can have the value 1, 2, 4 or 8, and
    D is a signed integer displacement (up to 32 bits).
    When selecting an item within an array using this mode, RA holds the base address of the array, RB the index into the array, C the size of each item, and D an offset relative to the item.
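The address arithmetic of the complex mode can be checked in C. In this sketch, valid_scale, effective_address and matches_array_indexing are hypothetical helper names; the point is that RA + RB * C + D, with RA as the array base, RB as the index and C as the element size, lands exactly on &base[i], which is why compilers use this mode for array accesses.

```c
#include <stdint.h>

/* Only these scale factors can be encoded in a complex address. */
int valid_scale(unsigned c) {
    return c == 1 || c == 2 || c == 4 || c == 8;
}

/* Address referred to by D(RA,RB,C): RA + RB * C + D, wrapping modulo 2^64
   as the CPU does. */
uint64_t effective_address(uint64_t ra, uint64_t rb, unsigned c, int64_t d) {
    return ra + rb * c + (uint64_t)d;
}

/* For an array of 8-byte integers, 0(RA,RB,8) with RA = base and RB = i
   is the address of base[i]. */
int matches_array_indexing(const int64_t *base, uint64_t i) {
    uint64_t computed = effective_address((uint64_t)(uintptr_t)base, i,
                                          sizeof *base, 0);
    return computed == (uint64_t)(uintptr_t)&base[i];
}
```

For instance, with RA = 0x1000, RB = 3, C = 8 and D = -16 the address is 0x1000 + 24 - 16 = 0x1008.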

An example of loading a 64-bit value into %rax in each mode

Mode            Example
Global Symbol   MOVQ x, %rax
Immediate       MOVQ $56, %rax
Register        MOVQ %rbx, %rax
Indirect        MOVQ (%rsp), %rax
Base-Relative   MOVQ -8(%rbp), %rax
Complex         MOVQ -16(%rbx,%rcx,8), %rax

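The same addressing modes may generally be used to store data into registers and memory, but not every combination is allowed. MOV, like most x86 instructions, cannot have two memory operands, so base-relative addressing cannot be used for both arguments: MOVQ -8(%rbx), -16(%rbx) is invalid, and the value must be moved through a register instead.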

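To load the address of a variable rather than its value, use LEA (load effective address). It performs the same address computations as MOV, but places the computed address in the destination instead of reading memory.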

Mode            Example
Global Symbol   LEAQ x, %rax
Base-Relative   LEAQ -8(%rbp), %rax
Complex         LEAQ -16(%rbx,%rcx,8), %rax

It is useful when working with strings and arrays.

Basic Arithmetic.

Arithmetic in compilers involves integer addition, subtraction, multiplication and division.

ADD and SUB take two operands: a source and a destructive target, which is overwritten with the result.

An example

ADDQ %rbx, %rax

The above instruction adds %rbx to %rax and stores the result in %rax, overwriting whatever was stored there. This should be done carefully, as one could lose a value that might be needed later.

Another Example
c = a + b + b translates to

MOVQ a, %rax
MOVQ b, %rbx
ADDQ %rbx, %rax
ADDQ %rbx, %rax
MOVQ %rax, c

Multiplication
We use the IMUL instruction.
Multiplying two 64-bit integers can produce a 128-bit result, so the one-operand form of IMUL multiplies its operand by the value in %rax, then places the low 64 bits of the result in %rax and the high 64 bits in %rdx. Note that this overwrites %rdx.

An example
Given c = b * (b + a) where a, b and c are global integers.

The translation would be

MOVQ  a, %rax
MOVQ  b, %rbx
ADDQ  %rbx, %rax
IMULQ %rbx
MOVQ  %rax, c
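The register split can be modelled in C with the __int128 type, which is a GCC and Clang extension rather than standard C. The name imul_model and the RdxRax struct are hypothetical; the sketch holds the full signed product and then separates it into the high half (%rdx) and the low half (%rax), as the one-operand IMULQ does.

```c
#include <stdint.h>

/* The pair of registers that receives the 128-bit product. */
typedef struct {
    int64_t rdx;   /* high 64 bits */
    int64_t rax;   /* low 64 bits  */
} RdxRax;

/* Model of IMULQ src: %rdx:%rax = %rax * src (signed). */
RdxRax imul_model(int64_t rax, int64_t src) {
    __int128 product = (__int128)rax * src;          /* full signed product */
    RdxRax r;
    r.rax = (int64_t)(uint64_t)product;              /* low half  */
    r.rdx = (int64_t)(uint64_t)(product >> 64);      /* high half */
    return r;
}
```

With %rax = 2^62 and an operand of 4, the product 2^64 no longer fits in 64 bits: %rax ends up as 0 and %rdx as 1.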

Division
We use the IDIV instruction.
It does the same thing as the IMUL instruction, but backwards: it starts with a 128-bit dividend whose low 64 bits are in %rax and high 64 bits are in %rdx, and divides it by the operand given in the instruction.
The quotient is placed in %rax and the remainder in %rdx.
For the modulus operation, we use the value left in %rdx.

Before dividing, %rdx must hold the correct upper half of the dividend. If the dividend fits in the lower 64 bits, %rdx must be all zeros when it is non-negative and all ones when it is negative, to complete the two's complement representation.
The CQO instruction does exactly this: it sign-extends %rax into %rdx.

An example (dividing a by 5)

MOVQ a,  %rax    # set the low 64 bits of the dividend
CQO              # sign-extend %rax into %rdx
MOVQ $5, %rbx    # IDIV cannot take an immediate operand
IDIVQ %rbx       # divide %rdx:%rax by 5, leaving the
                 # quotient in %rax and the remainder in %rdx
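C's / and % operators truncate toward zero, just as IDIV does, so a short C sketch can show what ends up in each register. cqo_model and idiv_model are hypothetical names, and the sketch assumes the dividend fits in %rax (so %rdx is simply its sign extension); it does not model the divide error the CPU raises for a zero divisor or for INT64_MIN / -1.

```c
#include <stdint.h>

typedef struct {
    int64_t rax;   /* quotient  */
    int64_t rdx;   /* remainder */
} DivResult;

/* CQO: %rdx becomes all ones if %rax is negative, otherwise all zeros. */
int64_t cqo_model(int64_t rax) {
    return rax < 0 ? -1 : 0;
}

/* IDIVQ divisor, assuming %rdx == cqo_model(%rax). The quotient is truncated
   toward zero and the remainder takes the sign of the dividend.
   The caller must not pass divisor == 0 or INT64_MIN / -1. */
DivResult idiv_model(int64_t rax, int64_t divisor) {
    DivResult r = { rax / divisor, rax % divisor };
    return r;
}
```

So dividing -17 by 5 leaves -3 in %rax and -2 in %rdx, matching C's -17 / 5 and -17 % 5.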

Increment and Decrement.
The INC and DEC instructions increment and decrement a register destructively.

An example
a = ++b is translated to

MOVQ b, %rax
INCQ %rax
MOVQ %rax, b
MOVQ %rax, a

AND, OR, XOR.
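These perform destructive bitwise (bit by bit) boolean operations on two values, leaving the result in the destination operand.

An example (ANDing the binary values 0101 and 0110)

MOVQ $0b0101, %rax
ANDQ $0b0110, %rax

will yield 0b0100 in %rax.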

The NOT instruction inverts all bits in its operand.

An example
c = (a & ~b) is translated to:

MOVQ a, %rax
MOVQ b, %rbx
NOTQ %rbx
ANDQ %rax, %rbx
MOVQ %rbx, c
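In C these operations are the &, |, ^ and ~ operators. The following sketch (and_not is a hypothetical name) spells out c = (a & ~b) step by step, with each line commented with the instruction it corresponds to in the listing above.

```c
#include <stdint.h>

/* c = (a & ~b), mirroring the register steps of the listing. */
uint64_t and_not(uint64_t a, uint64_t b) {
    uint64_t rax = a;     /* MOVQ a, %rax    */
    uint64_t rbx = b;     /* MOVQ b, %rbx    */
    rbx = ~rbx;           /* NOTQ %rbx       */
    rbx = rax & rbx;      /* ANDQ %rax, %rbx */
    return rbx;           /* MOVQ %rbx, c    */
}
```

For example, and_not(0x5, 0x6) clears from 0101 the bits that are set in 0110 and returns 0001.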

Note that it is more convenient to use MOV to load values into and out of registers, and then perform arithmetic using only registers.

Comparison and Jumps.

The JMP instruction transfers control unconditionally to a label. Here it is used to create an infinite loop that counts upward from 0 in the %rax register.

An example

      MOVQ $0, %rax
loop: INCQ %rax
      JMP loop

Comparisons and conditional jumps together provide more useful structures, such as terminating loops and evaluating if-else statements to control program flow.

For comparisons, the CMP instruction is used.
It compares two values and sets a few bits in the internal EFLAGS register recording whether the values are equal, greater or less. In AT&T syntax, CMPQ $5, %rax compares %rax against 5, setting the flags as if computing %rax - 5.
A selection of conditional jumps examines the EFLAGS register and jumps accordingly.

Instruction  Meaning
JE           Jump if Equal
JNE          Jump if Not Equal
JL           Jump if Less
JLE          Jump if Less or Equal
JG           Jump if Greater
JGE          Jump if Greater or Equal

An example (a loop that counts %rax up from 0 and exits once %rax exceeds 5, leaving 6 in %rax)

        MOVQ $0, %rax
loop:   INCQ %rax
        CMPQ $5, %rax
        JLE  loop

An example(conditional assignment)
If global variable x is > 0, then y = 10, else y = 20

        MOVQ x, %rax
        CMPQ $0, %rax
        JLE  .L1
.L0:
        MOVQ $10, %rbx
        JMP  .L2
.L1:
        MOVQ $20, %rbx
.L2:
        MOVQ %rbx, y

Note that jumps require the compiler to generate unique target labels. These labels are private to one assembly file and cannot be seen outside it unless a .globl directive is used.
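C's goto statement can express the same control flow almost line for line, which shows how a compiler turns if-else statements and loops into compares and jumps. choose_y and count_loop are hypothetical names, and the local variables stand in for the registers used in the listings above.

```c
#include <stdint.h>

/* Mirrors the conditional-assignment listing label for label. */
int64_t choose_y(int64_t x) {
    int64_t rbx;
    if (x <= 0) goto L1;     /* CMPQ $0, %rax ; JLE .L1 */
    rbx = 10;                /* .L0: MOVQ $10, %rbx     */
    goto L2;                 /* JMP .L2                 */
L1:
    rbx = 20;                /* .L1: MOVQ $20, %rbx     */
L2:
    return rbx;              /* .L2: MOVQ %rbx, y       */
}

/* Mirrors the counting loop: INCQ, CMPQ $5, JLE loop. */
int64_t count_loop(void) {
    int64_t rax = 0;         /* MOVQ $0, %rax           */
loop:
    rax++;                   /* INCQ %rax               */
    if (rax <= 5) goto loop; /* CMPQ $5, %rax ; JLE loop */
    return rax;              /* the loop exits with 6   */
}
```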

The Stack.

The stack is an auxiliary data structure used to record the function call history of a program, along with local variables that don't fit in registers.
The stack grows downwards, from high addresses to low addresses.
The %rsp register (stack pointer) keeps track of the bottom-most item on the stack, which is the most recently pushed one.

An example (pushing %rax onto the stack)
We subtract 8 (the size of %rax in bytes) from %rsp and then write %rax to the location pointed to by %rsp:

SUBQ $8, %rsp
MOVQ %rax, (%rsp)

or simply using one instruction

PUSHQ %rax

Popping.
This is the opposite of pushing: we read the value at the location pointed to by %rsp and then add 8 to %rsp:

MOVQ (%rsp), %rax
ADDQ $8, %rsp

or simply using one instruction

POPQ %rax

To discard the most recent value from the stack without reading it, move the stack pointer up by the appropriate number of bytes:

ADDQ $8, %rsp

Note that in 64-bit code, PUSH and POP cannot work with 32-bit values and are normally used only with 64-bit values, so smaller items are moved to and from the stack with explicit MOV, SUB and ADD instructions.
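The push and pop sequences above can be replayed on a toy model of the stack. In this C sketch, Stack, pushq, popq and stack_demo are hypothetical names; memory is a small byte array and rsp is a byte offset into it, so the sketch shows the downward growth and the order of the SUB/MOV and MOV/ADD steps, not real process memory.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the stack: a small byte array, with rsp as a byte offset.
   The stack grows downward, so an empty stack has rsp at the top. */
#define STACK_BYTES 64

typedef struct {
    unsigned char mem[STACK_BYTES];
    size_t rsp;   /* offset of the most recently pushed item */
} Stack;

void stack_init(Stack *s) { s->rsp = STACK_BYTES; }

/* PUSHQ value: SUBQ $8, %rsp then MOVQ value, (%rsp). No overflow check. */
void pushq(Stack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* POPQ: MOVQ (%rsp), result then ADDQ $8, %rsp. No underflow check. */
uint64_t popq(Stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += 8;
    return value;
}

/* Push 1, 2 and 3, discard the top item with ADDQ $8, %rsp, then pop.
   Returns the popped value; stores the final rsp if final_rsp is not NULL. */
uint64_t stack_demo(size_t *final_rsp) {
    Stack s;
    stack_init(&s);
    pushq(&s, 1);
    pushq(&s, 2);
    pushq(&s, 3);
    s.rsp += 8;                  /* discard 3 without reading it */
    uint64_t top = popq(&s);     /* reads 2 */
    if (final_rsp != NULL)
        *final_rsp = s.rsp;
    return top;
}
```

After the three pushes rsp is 40; discarding the top moves it to 48, and the final pop returns 2 and leaves rsp at 56.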

Calling a function.

On Linux and other Unix-like systems, 64-bit code uses the System V ABI calling convention, which exploits the large number of registers available in the X86-64 architecture by passing arguments in registers.

The System V ABI

  • The first 6 integer arguments (including pointers and other types that can be stored as integers) are placed in registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9, in this order.
  • The first 8 floating point arguments are placed in registers %xmm0 to %xmm7, in this order.
  • Arguments beyond those that fit in these registers are pushed onto the stack.
  • If a function takes a variable number of arguments, %rax is set to the number of floating point arguments passed in vector registers.
  • The return value of a function is placed in %rax.
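One way to see these rules is to compile a C function with more than six integer arguments using gcc -S and read where each argument comes from. In the sketch below, sum8 is a hypothetical example; the comments state where the System V ABI places each argument, and the generated listing should show the last two being read from the stack (the exact instructions depend on the gcc version and optimization level).

```c
#include <stdint.h>

/* Under the System V AMD64 ABI:
   a -> %rdi, b -> %rsi, c -> %rdx, d -> %rcx, e -> %r8, f -> %r9,
   g and h -> on the stack (placed there by the caller),
   return value -> %rax. */
int64_t sum8(int64_t a, int64_t b, int64_t c, int64_t d,
             int64_t e, int64_t f, int64_t g, int64_t h) {
    return a + b + c + d + e + f + g + h;
}
```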

The remaining registers are handled in one of two ways. Caller-saved registers must be saved by the calling function before it invokes another function, if it needs their values afterwards. Callee-saved registers must be saved by the called function, which restores them before it returns.

Invoking a function involves computing the arguments and placing them in the appropriate registers, then pushing the caller-saved scratch registers %r10 and %r11 onto the stack to preserve their values.
We then use the CALL instruction, which pushes the return address (the address of the next instruction) onto the stack and jumps to the code location of the function. The ABI also requires %rsp to be a multiple of 16 at the point of the CALL.
When the function returns, we pop the two caller-saved registers off the stack and find the return value of the function in %rax.

An example

#include <stdio.h>

int x = 0;
int y = 10;

int main(){
    x = printf("value: %d\n", y);
    return 0;
}

The above code is translated to

.data
x:
        .quad 0
y:
        .quad 10
str:
        .string "value: %d\n"
.text
.global main
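main:
        MOVQ  $str, %rdi  # first argument in %rdi: string
        MOVQ  y,    %rsi  # second argument in %rsi: y
        MOVQ  $0,   %rax  # there are zero float args
        PUSHQ %r10        # save the caller-saved regs
        PUSHQ %r11
        SUBQ  $8,   %rsp  # keep %rsp 16-byte aligned at the CALL
        CALL  printf      # invoke printf
        ADDQ  $8,   %rsp  # remove the alignment padding
        POPQ  %r11        # restore the caller-saved regs
        POPQ  %r10
        MOVQ  %rax, x     # save the result in x
        MOVQ  $0,   %rax  # main returns 0
        RET               # return from main function

Because this listing uses absolute addresses such as $str and x, it may need to be built with gcc -no-pie on systems where gcc produces position-independent executables by default.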

System V ABI variable assignments.

Register  Purpose        Saver
%rax      result         not saved
%rbx      scratch        callee saves
%rcx      argument 4     not saved
%rdx      argument 3     not saved
%rsi      argument 2     not saved
%rdi      argument 1     not saved
%rbp      base pointer   callee saves
%rsp      stack pointer  callee saves
%r8       argument 5     not saved
%r9       argument 6     not saved
%r10      scratch        caller saves
%r11      scratch        caller saves
%r12      scratch        callee saves
%r13      scratch        callee saves
%r14      scratch        callee saves
%r15      scratch        callee saves

A Leaf Function.

A leaf function is a function that computes a value without calling other functions.
Leaf functions are easy to write, since their arguments are passed in registers.

An example (in B-minor, a simple C-like language)

square: function integer ( x: integer ) ={
    return x * x;
}

It is translated to:

.global square
square:
    MOVQ  %rdi, %rax    # copy first argument to %rax
    IMULQ %rax          # multiply it by itself
                        # result is already in %rax
    RET                 # return to caller
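For comparison, the same function can be written in C and compiled with gcc -O1 -S. square below is just the example above transcribed; with optimization enabled, gcc should produce a short sequence with no stack frame, reading x from %rdi and returning the result in %rax, although it may use the two-operand form of IMUL and the exact output varies between versions.

```c
#include <stdint.h>

/* Leaf function: calls nothing, takes x in %rdi, returns x * x in %rax. */
int64_t square(int64_t x) {
    return x * x;
}
```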

In the general case a more complex approach is needed: the function above would not work if it had to invoke other functions, since it does not set up a stack frame or preserve any registers.

A complex function.

A complex function is one that can invoke other functions, compute expressions of arbitrary complexity, and return to the caller with the original state intact.

An example of a function that takes 3 arguments and uses 2 local variables.

.global func
func:
    pushq %rbp          # save the base pointer
    movq  %rsp, %rbp    # set new base pointer
    pushq %rdi          # save first argument on the stack
    pushq %rsi          # save second argument on the stack
    pushq %rdx          # save third argument on the stack
    subq  $16, %rsp     # allocate two more local variables
    pushq %rbx          # save callee-saved registers
    pushq %r12
    pushq %r13
    pushq %r14
    pushq %r15
    ### body of function goes here ###
    popq %r15            # restore callee-saved registers
    popq %r14
    popq %r13
    popq %r12
    popq %rbx
    movq   %rbp, %rsp    # reset stack pointer
    popq   %rbp          # recover previous base pointer
    ret                  # return to the caller

We need to keep track of the arguments passed to the function, the information necessary to return, and space for local computations.
For this we use the base pointer register %rbp, which points to the start of the values used by the function, unlike the stack pointer %rsp, which points to the end of the stack where new data will be pushed.
The stack frame is the space between %rbp and %rsp.

A problem could arise when a function is called in the middle of another, since each function uses a selection of registers to perform computations.

We prevent this by ensuring that each function saves the callee-saved registers it may use by pushing them onto the stack at the beginning, and restores them by popping them off the stack before returning.

Looking at the System V ABI assignments above, each function should preserve the values of %rsp, %rbp, %rbx and %r12-%r15 when it completes.

The stack layout of the function func.

Contents              Address
old %rip register     8(%rbp)
old %rbp register     (%rbp)     (%rbp points here)
argument 0            -8(%rbp)
argument 1            -16(%rbp)
argument 2            -24(%rbp)
local variable 0      -32(%rbp)
local variable 1      -40(%rbp)
saved register %rbx   -48(%rbp)
saved register %r12   -56(%rbp)
saved register %r13   -64(%rbp)
saved register %r14   -72(%rbp)
saved register %r15   -80(%rbp)  (%rsp points here)

The base pointer %rbp locates the start of the stack frame.
Within the body of the function, we use base-relative addressing against the base pointer to refer to arguments and local variables.

The arguments follow the base pointer, so argument 0 is at -8(%rbp), argument 1 at -16(%rbp), and so on.

The local variables then start at -32(%rbp), and the saved registers start at -48(%rbp).
The stack pointer points to the last item on the stack.
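The offsets in the table follow a simple pattern: every slot is 8 bytes and slots are handed out in order (arguments, then locals, then saved registers). The hypothetical helpers below compute them for this particular frame layout only; other compilers and calling conventions lay frames out differently.

```c
/* Offsets from %rbp for the frame layout used by func and compute.
   All indices are 0-based and every slot is 8 bytes.
   The return address is at 8(%rbp) and the old %rbp at 0(%rbp). */
long arg_offset(int i) {
    return -8L * (i + 1);
}

long local_offset(int nargs, int j) {
    return -8L * (nargs + j + 1);
}

long saved_reg_offset(int nargs, int nlocals, int k) {
    return -8L * (nargs + nlocals + k + 1);
}
```

With 3 arguments and 2 locals, the first saved register (%rbx) lands at -48(%rbp) and the fifth (%r15) at -80(%rbp), as in the table.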

An Example
We are given the B-minor function.

compute: function integer( a: integer, b: integer, c: integer ) ={
    x:integer = a + b + c;
    y:integer = x * 5;
    return y;
}

The translation is as follows.

.global compute
compute:
##################### preamble of function sets up stack
pushq %rbp          # save the base pointer
movq  %rsp, %rbp    # set new base pointer to rsp
pushq %rdi          # save first argument (a) on the stack
pushq %rsi          # save second argument (b) on the stack
pushq %rdx          # save third argument (c) on the stack
subq  $16, %rsp     # allocate two more local variables
pushq %rbx          # save callee-saved registers
pushq %r12
pushq %r13
pushq %r14
pushq %r15
######################## body of function starts here
movq  -8(%rbp),  %rbx   # load each arg into a register
movq  -16(%rbp), %rcx
movq  -24(%rbp), %rdx
addq  %rdx, %rcx        # add the args together
addq  %rcx, %rbx
movq  %rbx, -32(%rbp)   # store the result into local 0 (x)
movq  -32(%rbp), %rbx   # load local 0 (x) into a register.
movq  $5, %rcx          # load 5 into a register
movq  %rbx, %rax        # move x into %rax
imulq %rcx              # multiply them together
movq  %rax, -40(%rbp)   # store the result in local 1 (y)
movq  -40(%rbp), %rax   # move local 1 (y) into the result
#################### epilogue of function restores the stack
popq %r15          # restore callee-saved registers
popq %r14
popq %r13
popq %r12
popq %rbx
movq %rbp, %rsp    # reset stack to base pointer.
popq %rbp          # restore the old base pointer
ret                # return to caller

This function doesn't use registers %r12-%r15, so it is not necessary to save and restore them.
Similarly, the arguments could be kept in registers without saving them to the stack.
Rather than saving the result to a local variable, we could compute it directly into %rax.
These optimizations are easy to make when writing code by hand, but much harder for a compiler to perform automatically.
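For comparison, here is compute transcribed into C. Compiling it with gcc -O2 -S should show what a mature optimizing compiler makes of it: a few instructions that work directly in registers, without saving the arguments or any callee-saved registers (the exact sequence depends on the gcc version).

```c
#include <stdint.h>

/* C transcription of the B-minor compute function. */
int64_t compute(int64_t a, int64_t b, int64_t c) {
    int64_t x = a + b + c;   /* local variable 0 in the hand-written frame */
    int64_t y = x * 5;       /* local variable 1 */
    return y;                /* returned in %rax */
}
```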

Summary.

Assembly language enables programmers to write human-readable code that is close to machine language, giving them full control over what the computer does.
It can be memory efficient and fast, since it is hardware oriented and maps directly onto the processor's instructions.
It also comes with drawbacks: writing assembly code takes a lot of time and effort because of its complexity and syntax, it is not portable across computer architectures, and longer programs require considerably more code than in a high-level language.

References.

  1. RISC and CISC computer architectures
  2. X86 Assembly Manual