Compilation process in C (from code.c to a.out)


Reading time: 20 minutes | Coding time: 10 minutes

Compilation is the process of converting source code written in a programming language (C in our case) into machine code. The source goes through several steps, which depend on the compiler and the language. We will explore the compilation steps in C.

The different steps involved in compiling C code are:

  • Preprocessing (code.c -> code.i)
  • Compilation to assembly code (code.i -> code.s)
  • Assembly to object code (code.s -> code.o)
  • Linking (code.o -> a.out)

The following image captures the above idea:

[Image: compilation_c]

In summary, the commands are:

gcc -E code.c # prints the preprocessed code to standard output
gcc -S code.c # generates assembly code (code.s)
gcc -c code.c # generates object code (code.o)
gcc code.c    # generates the linked executable (a.out)

We will go through all four steps in detail, using an example C program to clarify the ideas involved.

In our example code, we will include a custom header file and use conditional compilation (#ifndef) so that we can understand the process completely. Consider this code:

code.c:

#include <stdio.h>
#include "opengenus.h"
int main() 
{
    // this is our comment
	#ifndef opengenus
	printf("opengenus 1\n");
	#else
	printf("opengenus 2\n");
	#endif
	printf("this is a C code\n");
	return 0;
}

opengenus.h:

#define opengenus 1

If we simply compile and execute the code, we will get the expected output. Compile and execute using:

gcc code.c
./a.out

Output:

opengenus 2
this is a C code

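The second message is printed because opengenus.h defines the macro opengenus, so the #ifndef test fails and only the #else branch is kept.

We will now compile the code step by step.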

Preprocessor

In the preprocessing step, the preprocessor does the following:

  • removes all comments
  • replaces each #include directive with the actual content of the header file
  • expands the macros defined with #define
  • keeps only the code selected by the conditional directives (#ifndef in our case)

The command to see the preprocessed code is:

gcc -E code.c

The output is huge due to the contents of the stdio.h file. The following is a small section of the output for us to analyze:

# 1 "code.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4

// much more code in between

extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;

extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 943 "/usr/include/stdio.h" 3 4

# 2 "code.c" 2
# 1 "opengenus.h" 1
# 3 "code.c" 2
int main()
{

 printf("opengenus 2\n");

 printf("this is a C code\n");
 return 0;
}

As you can see, the main function has been reduced to only the statements that will be executed, and the comment has been removed as well.

Assembly

The compiler converts the preprocessed code code.i into assembly code code.s, that is, code written in the assembly language of the target processor. The command to generate the assembly code is:

gcc -S code.c

The contents of code.s will be similar to the following (the exact output depends on the compiler version and the target machine):

	.file	"code.c"
	.section	.rodata
.LC0:
	.string	"opengenus 2"
.LC1:
	.string	"this is a C code"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$.LC0, %edi
	call	puts
	movl	$.LC1, %edi
	call	puts
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-36)"
	.section	.note.GNU-stack,"",@progbits

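This is an interesting and insightful piece of code, and it is worth reading to see how our C code and header files have been converted into such a short listing. The header files contribute only declarations, so they add no instructions. Note also that GCC has replaced each printf call with a call to puts, because the strings contain no format specifiers and end with a newline.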

Object code

The assembler takes our assembly code code.s and converts it into object code code.o. The command to generate object code is:

gcc -c code.c

A section of the contents of the code.o file, shown as a hexadecimal dump, is:

7f45 4c46 0201 0100 0000 0000 0000 0000
0100 3e00 0100 0000 0000 0000 0000 0000
0000 0000 0000 0000 e802 0000 0000 0000
0000 0000 4000 0000 0000 4000 0d00 0c00
5548 89e5 bf00 0000 00e8 0000 0000 bf00
0000 00e8 0000 0000 b800 0000 005d c36f
7065 6e67 656e 7573 2032 0074 6869 7320
6973 2061 2043 2063 6f64 6500 0047 4343
3a20 2847 4e55 2920 342e 382e 3520 3230
3135 3036 3233 2028 5265 6420 4861 7420
342e 382e 352d 3336 2900 0000 0000 0000
1400 0000 0000 0000 017a 5200 0178 1001
1b0c 0708 9001 0000 1c00 0000 1c00 0000
0000 0000 1f00 0000 0041 0e10 8602 430d
065a 0c07 0800 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

Linking

In this step, our object code code.o is linked with the other code it depends on. In this case, we are using the printf function, so our object code will be linked with the library containing printf (the C standard library), and an executable will be created.

To create the final executable, use the command:

gcc code.c

The contents of the executable a.out, shown as a hexadecimal dump, begin as follows:

7f45 4c46 0201 0100 0000 0000 0000 0000
0200 3e00 0100 0000 3004 4000 0000 0000
4000 0000 0000 0000 6019 0000 0000 0000
0000 0000 4000 3800 0900 4000 1f00 1e00
0600 0000 0500 0000 4000 0000 0000 0000
4000 4000 0000 0000 4000 4000 0000 0000
f801 0000 0000 0000 f801 0000 0000 0000
0800 0000 0000 0000 0300 0000 0400 0000
3802 0000 0000 0000 3802 4000 0000 0000
3802 4000 0000 0000 1c00 0000 0000 0000
1c00 0000 0000 0000 0100 0000 0000 0000
0100 0000 0500 0000 0000 0000 0000 0000
0000 4000 0000 0000 0000 4000 0000 0000
1c07 0000 0000 0000 1c07 0000 0000 0000
0000 2000 0000 0000 0100 0000 0600 0000
100e 0000 0000 0000 100e 6000 0000 0000
100e 6000 0000 0000 1c02 0000 0000 0000
...

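You will notice that the a.out executable is much larger than our object code code.o, but that the first line of both dumps is the same. This is because both files use the ELF format, and those bytes (7f45 4c46, that is, 0x7f followed by "ELF") are the start of the ELF header rather than our own code. The second line already differs: the file type field is 0100 (relocatable object) in code.o and 0200 (executable) in a.out. The additional content in a.out comes from linking: program startup code, extra headers and symbol data, and either the library code itself or, with the usual dynamic linking, the information needed to find it in the C library at run time. The header file stdio.h itself contributes only declarations, not object code.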

In summary, the step-by-step commands for compiling a C program are:

gcc -E code.c # prints the preprocessed code to standard output
gcc -S code.c # generates assembly code (code.s)
gcc -c code.c # generates object code (code.o)
gcc code.c    # generates the linked executable (a.out)

This should have clarified the different steps involved in compiling C code.