Code optimization transforms LLVM IR into equivalent code that consumes fewer resources, such as memory. In this article, we will learn how to apply different compiler optimizations to the LLVM IR we produce.
Table of contents.
- Introduction.
- Common-subexpression elimination.
- Constant folding.
- LLVM Optimization Passes
- Summary.
Introduction.
In the prerequisite article, we saw how to generate LLVM IR (Intermediate Representation) during the code generation phase of compiler design. This is the middle phase of compilation, after which the code is passed to the compiler back-end, where it is transformed into machine-dependent target code.
At this point, we should have a REPL in which we type code in the Kaleidoscope programming language, and it is translated to LLVM IR (Intermediate Representation).
For example, if we type in 4 + 4, the LLVM IR looks as follows:
Read top-level expression:
define double @0() {
entry:
ret double 8.000000e+00
}
The parser wraps the top-level expression inside an anonymous function, and we create a basic block named 'entry'.
Finally, LLVM adds the two constants at build time to get 8, which is also a constant.
In this article, we will implement various optimizations to produce compact LLVM IR. Optimizations can be either local or global: the former refers to changes made to code within a single basic block, while the latter optimizes an entire function body/method/procedure.
Common-subexpression elimination.
After we translate the source code to intermediate code, the same calculation may occur several times. For example, the assignment a[i] := a[i] + 1 is translated into the following intermediate code:
t1 := 4*i
t2 := a+t1
t3 := M[t2]
t4 := t3+1
t5 := 4*i
t6 := a+t5
M[t6] := t4
Above, we perform the multiplication by 4 twice (t1, t5) and the address computation twice (t2, t6). Our goal in common-subexpression elimination is to remove such redundancies.
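After common-subexpression elimination, the redundant computations disappear and the earlier temporaries are reused; a sketch of the result (temporary names follow the example above):
t1 := 4*i
t2 := a+t1
t3 := M[t2]
t4 := t3+1
M[t2] := t4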
Constant folding, on the other hand, evaluates constant expressions at compile time and replaces them with their values. For example, the constant expression 2 * 3.14 evaluates to 6.28.
Constant folding.
We will use the IRBuilder, which gives us common optimizations such as constant folding as the IR is built. Using it adds no syntactic overhead and can dramatically reduce the amount of LLVM IR generated.
The following shows constant folding in action for the Kaleidoscope programming language. Given the function definition:
def test(x) 1+2+x;
The constants 1 and 2 are combined to produce 3, as shown in the following LLVM IR:
Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
ret double %addtmp
}
In LLVM, all calls to build LLVM IR go through the IR builder. The builder checks whether there is an opportunity for constant folding; if so, it performs the folding and returns the constant instead of creating an instruction.
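A minimal sketch of this behavior in C++ (the variable names here are illustrative, not from the tutorial): when both operands of CreateFAdd are constants, the builder returns a folded ConstantFP instead of emitting an fadd instruction.
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include <cassert>

int main() {
    llvm::LLVMContext Context;
    llvm::IRBuilder<> Builder(Context);

    // Both operands are constants, so the builder folds them at build time.
    llvm::Value *L = llvm::ConstantFP::get(Context, llvm::APFloat(1.0));
    llvm::Value *R = llvm::ConstantFP::get(Context, llvm::APFloat(2.0));
    llvm::Value *Sum = Builder.CreateFAdd(L, R, "addtmp");

    // Sum is the constant 3.0, not an fadd instruction.
    assert(llvm::isa<llvm::ConstantFP>(Sum));
    return 0;
}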
The LLVM IR builder is limited in that it performs its analysis inline, as the code is being built. For example, for the following definition:
def test(x) (1+2+x)*(x+(1+2));
we get the following LLVM IR:
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double 3.000000e+00, %x
%addtmp1 = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp1
ret double %multmp
}
Above, the LHS (left-hand side) and RHS (right-hand side) have the same value. We expect the compiler to generate "tmp = x+3; result = tmp*tmp;" instead of computing x+3 twice.
However, local analysis cannot detect and correct this, since doing so requires two transformations: reassociation of expressions and common-subexpression elimination to remove the redundancy. Fortunately, LLVM provides optimizations in the form of passes, which we discuss in the next section.
LLVM Optimization Passes.
LLVM provides many optimization passes that do different things and have different trade-offs. LLVM lets the compiler implementor decide which optimizations to use, in what order, and in which situations.
For example, LLVM supports both whole-module passes and per-function passes: the former look across an entire body of code, while the latter look at a single function at a time.
We will implement per-function optimizations that run as the user types in a function. For this, we need to set up a FunctionPassManager, which holds and organizes the LLVM optimizations we want to run.
Below, we define a function that creates and initializes the module and pass manager:
void InitializeModuleAndPassManager(void)
{
    // Create a new module to hold the generated code.
    TheModule = std::make_unique<Module>("JIT AND OPTIMIZATION", *TheContext);

    // Attach a per-function pass manager to the module.
    TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());

    TheFPM->add(createInstructionCombiningPass()); // peephole and bit-twiddling optimizations
    TheFPM->add(createReassociatePass());          // reassociate expressions
    TheFPM->add(createGVNPass());                  // eliminate common subexpressions
    TheFPM->add(createCFGSimplificationPass());    // remove unreachable blocks, simplify the control flow graph

    TheFPM->doInitialization();
}
Above, we initialize the global module TheModule and TheFPM, the function pass manager. Once the pass manager is set up, we add the LLVM passes we want to run.
The four passes we add are useful, standard cleanup optimizations.
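In the Kaleidoscope tutorial, InitializeModuleAndPassManager() is called from main() before the REPL loop starts, so the module and pass manager exist before any code is generated. A sketch, following the tutorial's structure:
int main() {
    // ... install standard binary operators, prime the first token ...
    InitializeModuleAndPassManager(); // make the module and optimizer available
    MainLoop();                       // run the main "interpreter loop"
    return 0;
}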
We run the pass manager after our newly created function has been constructed but before it is returned to the client.
To do this, we add the following line to the code generation function:
TheFPM->run(*TheFunction); // optimize
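In the Kaleidoscope tutorial, this line goes near the end of FunctionAST::codegen(), right after the function body is emitted and the function is verified. A sketch of the surrounding code (names follow the tutorial):
if (Value *RetVal = Body->codegen()) {
    Builder->CreateRet(RetVal);    // finish off the function
    verifyFunction(*TheFunction);  // validate the generated code
    TheFPM->run(*TheFunction);     // run the optimization passes
    return TheFunction;
}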
Let's test this. With the optimizations implemented, we expect the expression:
def test(x) (1+2+x)*(x+(1+2));
to produce the following optimized LLVM IR:
ready> Read function definition:
define double @test(double %x) {
entry:
%addtmp = fadd double %x, 3.000000e+00
%multmp = fmul double %addtmp, %addtmp
ret double %multmp
}
Now we have only one add, compared to the two in the previous example: the passes recognized the common subexpression and reused %addtmp in the multiply.
Summary.
JIT (Just-In-Time) compilation offers the best of both compilation and interpretation. It allows a program to be stored in a portable executable format and compiles it shortly before it is executed on the target machine.
We have learned how to add an optimizer for the Kaleidoscope programming language.