Compiling LLVM IR to Object Code


We have seen how to implement the Kaleidoscope programming language: from source code to LLVM IR, through optimization and a JIT compiler, to further language extensions. In this article, we compile the LLVM IR into object code.

Table of contents.

  1. Introduction.
  2. The target machine.
  3. Object code.
  4. Tying it together.
  5. Summary.
  6. References.

Prerequisites.

Variable Mutation in Kaleidoscope

Introduction.

In this article, we compile our code into object code, the machine code that the compiler emits into an object file. Object files are not yet ready to be executed: linking and loading still remain. The linker combines the object files and the libraries they depend on into a single executable, while the loader loads that executable into memory for execution.

The target machine.

Object code is machine-dependent: object code built for one processor architecture will not necessarily run on a different one.
We therefore generate object code for a specific target machine, which LLVM identifies by a target triple. To see the target triple of the current machine, we execute the following command:

$ clang --version | grep Target

LLVM provides sys::getDefaultTargetTriple, which returns the target triple of the current machine, so we don't need to hard-code one:

auto TargetTriple = sys::getDefaultTargetTriple();
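
To make the structure of a target triple more concrete, here is a minimal sketch that splits a triple such as x86_64-unknown-linux-gnu into its architecture, vendor, operating system and environment fields. It uses only the standard library; TripleParts and splitTriple are hypothetical names introduced for illustration and are not part of LLVM, which has its own Triple class for this purpose.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical illustration type (not an LLVM API): the four fields of a
// triple written as <arch>-<vendor>-<os>-<environment>.
struct TripleParts {
  std::string arch, vendor, os, env;
};

// Splits a triple on '-' into at most four fields. Fields that are missing
// (for example the environment in "x86_64-apple-darwin") are left empty.
TripleParts splitTriple(const std::string &triple) {
  std::array<std::string, 4> fields;
  std::size_t start = 0;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    // The last field takes whatever remains, dashes included.
    std::size_t dash = (i + 1 < fields.size()) ? triple.find('-', start)
                                               : std::string::npos;
    if (dash == std::string::npos) {
      fields[i] = triple.substr(start);
      break;
    }
    fields[i] = triple.substr(start, dash - start);
    start = dash + 1;
  }
  return {fields[0], fields[1], fields[2], fields[3]};
}
```

Calling splitTriple on the string printed by the clang command above shows which architecture and operating system the object code will be generated for.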

LLVM also doesn't require us to link in all of its target functionality. For example, if we only use the JIT, we don't need the assembly printers, and if we only target certain architectures, we can link in only the functionality for those architectures.

For this example, we initialize all targets for emitting object code:

InitializeAllTargetInfos();
InitializeAllTargets();
InitializeAllTargetMCs();
InitializeAllAsmParsers();
InitializeAllAsmPrinters();

Next, we use the target triple to get a Target:

std::string Error;
auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);

// Print an error and exit if we couldn't find the requested target.
// This generally occurs if we've forgotten to initialise the
// TargetRegistry or we have a bogus target triple.
if (!Target) {
  errs() << Error;
  return 1;
}

The TargetMachine class provides a complete description of the machine we are targeting. This is where we select a specific CPU or feature set.
To view the features and CPUs that LLVM recognizes for an architecture (x86 here), we execute the following command:

$ llvm-as < /dev/null | llc -march=x86 -mattr=help

In this example, we use the generic CPU without any additional features, options or relocation model:

auto CPU = "generic";
auto Features = "";

TargetOptions opt;
auto RM = Optional<Reloc::Model>();
auto TargetMachine = Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);
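
The example above passes an empty feature string. When features are specified, LLVM expects a comma-separated list in which each name is prefixed with + to enable it or - to disable it, the same format that llc accepts through -mattr. The following is a minimal sketch of building such a string; makeFeatureString is a hypothetical helper, not an LLVM function, and the feature names in the usage note are only examples.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of LLVM): turns a list of
// (feature name, enabled?) pairs into a string such as "+avx2,-sse4a".
std::string makeFeatureString(
    const std::vector<std::pair<std::string, bool>> &features) {
  std::string out;
  for (const auto &feature : features) {
    if (!out.empty())
      out += ',';                      // separate entries with commas
    out += feature.second ? '+' : '-'; // '+' enables, '-' disables
    out += feature.first;
  }
  return out;
}
```

For instance, makeFeatureString({{"avx2", true}, {"sse4a", false}}) produces a string that could be passed as the Features argument in place of the empty string, assuming the chosen target recognizes those feature names.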

To configure the module, we also specify the target triple and data layout. This is not strictly necessary, but optimizations benefit from knowing about the target and data layout:

TheModule->setDataLayout(TargetMachine->createDataLayout());
TheModule->setTargetTriple(TargetTriple);

Object code.

To emit object code for the target machine, we first open the file we want to write to:

auto Filename = "output.o";
std::error_code EC;
raw_fd_ostream dest(Filename, EC, sys::fs::OF_None);

if (EC) {
  errs() << "Could not open file: " << EC.message();
  return 1;
}

Then we define a pass that emits the object code and run it:

legacy::PassManager pass;
auto FileType = CGFT_ObjectFile;

if (TargetMachine->addPassesToEmitFile(pass, dest, nullptr, FileType)) {
  errs() << "TargetMachine can't emit a file of this type";
  return 1;
}

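pass.run(*TheModule);
dest.flush();

// Report success; this produces the "Wrote output.o" message.
outs() << "Wrote " << Filename << "\n";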
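
As a quick sanity check on the emitted file, we can look at its first few bytes, since common object file formats begin with a recognizable signature. The sketch below is an assumption-laden illustration rather than part of the tutorial code: readPrefix and guessObjectFormat are hypothetical helpers, and only the ELF and 64-bit little-endian Mach-O signatures are checked, so other formats (such as COFF on Windows) are reported as unknown.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: reads up to n leading bytes of a file.
// Returns fewer bytes (possibly none) if the file is short or unreadable.
std::vector<unsigned char> readPrefix(const std::string &path, std::size_t n) {
  std::ifstream in(path, std::ios::binary);
  std::vector<unsigned char> buf(n);
  in.read(reinterpret_cast<char *>(buf.data()),
          static_cast<std::streamsize>(n));
  buf.resize(static_cast<std::size_t>(in.gcount()));
  return buf;
}

// Hypothetical helper: guesses the object file format from its first bytes.
std::string guessObjectFormat(const std::vector<unsigned char> &bytes) {
  if (bytes.size() < 4)
    return "unknown";
  // ELF files start with 0x7F followed by the letters 'E', 'L', 'F'.
  if (bytes[0] == 0x7F && bytes[1] == 'E' && bytes[2] == 'L' &&
      bytes[3] == 'F')
    return "ELF";
  // 64-bit little-endian Mach-O files start with the magic number
  // 0xFEEDFACF, stored on disk as the bytes CF FA ED FE.
  if (bytes[0] == 0xCF && bytes[1] == 0xFA && bytes[2] == 0xED &&
      bytes[3] == 0xFE)
    return "Mach-O";
  return "unknown";
}
```

A call such as guessObjectFormat(readPrefix("output.o", 4)) would then indicate whether the file looks like an object file for the platform we targeted.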

Tying it together.

We compile the code using the following command:

$ clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs all` -o toy

Then we run it and define an average function:

$ ./toy
ready> def average(x y) (x + y) * 0.5;
^D
Wrote output.o

After entering the definition, we press CTRL + D to end the input, at which point the object file is written.

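We test the object code by writing the following program, saved as main.cpp, which declares average and calls it. The extern "C" block disables C++ name mangling so that the linker can match the call to the symbol average in output.o: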

#include <iostream>

extern "C" {
    double average(double, double);
}

int main() {
    std::cout << "average of 3.0 and 4.0: " << average(3.0, 4.0) << std::endl;
}

Finally, we compile this program and link it against the output.o object file:

$ clang++ main.cpp output.o -o main
$ ./main
average of 3.0 and 4.0: 3.5

Summary.

Object code consists of machine instructions for a specific target machine. It cannot be executed directly: it must first be linked and then loaded into memory.
In this article, we learned how to compile LLVM IR into its equivalent object code.

References.

Debugging Kaleidoscope
