Bootstrapping a compiler involves producing a self-compiling compiler, that is, we write a compiler using language A that produces code for target machine B and can still compile itself.
Table of contents.
As we saw in previous articles, we built a compiler for the Kaleidoscope programming language and supported many programming language functionalities, JIT compilation, and much more. We did all this using a high-level programming language - C++.
Another choice we have is to use a programming language that is already available on the target machine where the compiler will run.
Another common situation is whereby we have a new processor for which no compilers exist yet.
Despite this fact, we want a compiler that not only targets this specific processor but also runs on it. We want to write a compiler for language A that targets language B and is written in language B.
An obvious approach is to write a compiler in language B, if B is machine code, it is a difficult task to write any non-trivial compiler, instead, it is conventional to use a process referred to as bootstrapping.
The idea is as follows, we write a compiler in language A and let it compile itself. The result of this is a compiler from A to B written in B.
The above seems like the chicken and egg paradox, however, we will discuss how the paradox is resolved.
We use a notation designed by * H. Bratman*, it is referred to as Bratman diagrams.
We have the following notation which represents a compiler written in the C programming language, compiling a language A and targeting language B;
For use to be able to use this compiler, it must stand on a solid foundation, that is, something that can execute programs written in the C programming language.
It can be a machine that can execute C code as machine code or a C interpreter that runs on another machine or interpreter.
Several interpreters can be placed on top of each other but at the bottom of it all, we need a real machine.
An interpreter that is written in a language D and is interpreting a language C is represented using the following diagram;
A machine that directly executes a language D is represented by the following diagram
Above, the pointy bottom indicates that a machine does not need to stand on anything, it is itself a foundation upon which other things stand on.
To represent an unspecified program written in a language D, we use the following diagram;
The figures can be combined to represent program executions, e.g, running a program on machine D results in the following;
Also, not that languages must match, the program should be written in a language that is understandable by the machine.
We insert an interpreter as follows;
Also, note that languages must match in this case also, the interpreter can only interpret programs that are written in a language the interpreter can interpret.
We can run a compiler and use the following to compile a program;
The input is shown to the left and the output is to the right.
Languages match at every connection and the source and target program are not standing on anything as they are not executed in the above diagram.
We insert and interpreter as follows;
In bootstrapping we aim to write a compiler using language A that produces code for target machine B while still being able to compile itself.