Bootstraping a Compiler

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Bootstrapping a compiler involves producing a self-compiling compiler, that is, we write a compiler using language A that produces code for target machine B and can still compile itself.

Table of contents.

Introduction.
Notation.
Summary.
References.

Introduction.

As we saw in previous articles, we built a compiler for the Kaleidoscope programming language and supported many programming language functionalities, JIT compilation, and much more. We did all this using a high-level programming language - C++.
Another choice we have is to use a programming language that is already available on the target machine where the compiler will run.

Another common situation is whereby we have a new processor for which no compilers exist yet.
Despite this fact, we want a compiler that not only targets this specific processor but also runs on it. We want to write a compiler for language A that targets language B and is written in language B.

An obvious approach is to write a compiler in language B, if B is machine code, it is a difficult task to write any non-trivial compiler, instead, it is conventional to use a process referred to as bootstrapping.
The idea is as follows, we write a compiler in language A and let it compile itself. The result of this is a compiler from A to B written in B.

The above seems like the chicken and egg paradox, however, we will discuss how the paradox is resolved.

Notation.

We use a notation designed by * H. Bratman*, it is referred to as Bratman diagrams.
We have the following notation which represents a compiler written in the C programming language, compiling a language A and targeting language B;

bc15

For use to be able to use this compiler, it must stand on a solid foundation, that is, something that can execute programs written in the C programming language.
It can be a machine that can execute C code as machine code or a C interpreter that runs on another machine or interpreter.

Several interpreters can be placed on top of each other but at the bottom of it all, we need a real machine.

An interpreter that is written in a language D and is interpreting a language C is represented using the following diagram;

bc16

A machine that directly executes a language D is represented by the following diagram

bc17

Above, the pointy bottom indicates that a machine does not need to stand on anything, it is itself a foundation upon which other things stand on.

To represent an unspecified program written in a language D, we use the following diagram;

bc1

The figures can be combined to represent program executions, e.g, running a program on machine D results in the following;

bc2

Also, not that languages must match, the program should be written in a language that is understandable by the machine.

We insert an interpreter as follows;

bc3

Also, note that languages must match in this case also, the interpreter can only interpret programs that are written in a language the interpreter can interpret.

We can run a compiler and use the following to compile a program;
bc4

The input is shown to the left and the output is to the right.
Languages match at every connection and the source and target program are not standing on anything as they are not executed in the above diagram.

We insert and interpreter as follows;
bc5

Summary.

In bootstrapping we aim to write a compiler using language A that produces code for target machine B while still being able to compile itself.

References.

Compiling compilers

Bootstraping a Compiler

Compiler Design

Table of contents.

Introduction.

Notation.

Summary.

References.

Add debugging support in Programming Language

Compiling LLVM IR to Object Code