Add support of Debugging: DWARF, Functions, Source locations, Variables


In this article, we continue our discussion of debugging support and debug information in a programming language. We cover how to include function definitions, source locations, and variable locations in the debug information.

Table of contents.

Introduction.
DWARF emission setup.
Function definitions.
Source locations.
Variable locations.
Summary.
References.

Prerequisites.

Add debugging support in Programming Language

Introduction.

We will learn how to include debug information, which makes it easier to debug code written in Kaleidoscope.
First, we need to understand what is meant by a compile unit and DWARF.

A compile unit is a top-level container for a section of code in DWARF. It contains the type and function data for an individual translation unit. This means the first thing we need to do is construct a compile unit for our fib.ks file.

DWARF is a compact encoding used to represent types, source locations, and variable locations. It will assist us in including debug information in Kaleidoscope.

DWARF emission setup.

LLVM has a DIBuilder class that resembles the IRBuilder discussed previously. It helps construct debug metadata for an LLVM IR file.
It corresponds 1:1 to debug metadata in the same way that IRBuilder corresponds to LLVM IR.
In this article, we use this class to build all IR-level descriptions. Since its constructor takes a module, we must create it after the module has been built. For simplicity, we keep it as a global static variable.

We also create a small container that caches frequently used data. The first item is the compile unit. We also write code for our single type, since in this example we do not need to worry about multiple typed expressions.
The following code implements this:

static DIBuilder *DBuilder;

struct DebugInfo {
  DICompileUnit *TheCU;
  DIType *DblTy;

  DIType *getDoubleTy();
} KSDbgInfo;

DIType *DebugInfo::getDoubleTy() {
  if (DblTy)
    return DblTy;

  DblTy = DBuilder->createBasicType("double", 64, dwarf::DW_ATE_float);
  return DblTy;
}
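The caching in getDoubleTy() is plain lazy initialization: build the descriptor on the first request, then return the same pointer afterwards. Below is a minimal standalone sketch of that pattern with no LLVM dependency. FakeType, FakeBuilder, and TypeCache are hypothetical stand-ins for DIType, DIBuilder, and DebugInfo, not LLVM classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for DIType: just a name and a bit width.
struct FakeType {
  std::string Name;
  unsigned SizeInBits;
};

// Hypothetical stand-in for DIBuilder: owns every type it creates and counts
// how many times it was asked to build one.
struct FakeBuilder {
  std::vector<std::unique_ptr<FakeType>> Owned;
  unsigned CreateCalls = 0;

  FakeType *createBasicType(const std::string &Name, unsigned SizeInBits) {
    ++CreateCalls;
    Owned.push_back(std::make_unique<FakeType>(FakeType{Name, SizeInBits}));
    return Owned.back().get();
  }
};

// Mirrors the shape of DebugInfo::getDoubleTy(): create once, then reuse.
struct TypeCache {
  FakeBuilder &Builder;
  FakeType *DblTy = nullptr;

  explicit TypeCache(FakeBuilder &B) : Builder(B) {}

  FakeType *getDoubleTy() {
    if (!DblTy)
      DblTy = Builder.createBasicType("double", 64);
    assert(DblTy && "builder returned a null type");
    return DblTy;
  }
};
```

Calling getDoubleTy() twice on the same TypeCache should return the same pointer and leave CreateCalls at 1, which is the property the real cache relies on.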

Later, in main, once the module has been constructed, we create the DIBuilder and the compile unit:

DBuilder = new DIBuilder(*TheModule);

KSDbgInfo.TheCU = DBuilder->createCompileUnit(
    dwarf::DW_LANG_C, DBuilder->createFile("fib.ks", "."),
    "Kaleidoscope Compiler", 0, "", 0);

When producing the compile unit for Kaleidoscope we used the language constant for C. A debugger would not necessarily understand the calling conventions or default ABI of a language it does not recognize, and since we follow the C ABI during LLVM code generation, C is the closest match. This ensures we can call functions from the debugger and have them execute.
Secondly, fib.ks in the call to createCompileUnit is a hardcoded default, because we use shell redirection to feed source code into the Kaleidoscope compiler. Usually, a front end would have an input file name to use here.

Finally, since we emit debug information through DIBuilder, it needs to be finalized. We do this near the end of main, before we dump out the module:

DBuilder->finalize();

Function definitions.

With the compile unit in place, the next step is to add function definitions to the debug information. In FunctionAST::codegen() we add a few lines of code that describe the context for our subprogram, in this case the file, and then the function definition itself.
We start with the context:

DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                    KSDbgInfo.TheCU->getDirectory());

The above produces a DIFile by asking the compile unit for the directory and filename of the file we are currently in. For now we use source locations of 0, since the AST does not yet carry source location information, and construct the function definition:

DIScope *FContext = Unit;
unsigned LineNo = 0;
unsigned ScopeLine = 0;
DISubprogram *SP = DBuilder->createFunction(
    FContext, P.getName(), StringRef(), Unit, LineNo,
    CreateFunctionType(TheFunction->arg_size(), Unit),
    false /* internal linkage */, true /* definition */, ScopeLine,
    DINode::FlagPrototyped, false);
TheFunction->setSubprogram(SP);

At this point, we have a DISubprogram containing a reference to all function metadata.
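The CreateFunctionType helper used above is not listed here. In the LLVM Kaleidoscope tutorial it builds a subroutine type whose first element is the result type and whose remaining elements are the parameter types, all double because that is Kaleidoscope's only type. The sketch below models only that layout, using strings in place of debug types. makeSignature is a hypothetical name; the real helper returns a DISubroutineType built through DBuilder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a subroutine type: element 0 is the result type,
// elements 1..NumArgs are the parameter types.
std::vector<std::string> makeSignature(unsigned NumArgs) {
  const std::string DblTy = "double"; // Kaleidoscope's only type
  std::vector<std::string> EltTys;
  EltTys.reserve(NumArgs + 1);

  EltTys.push_back(DblTy); // result type comes first
  for (unsigned I = 0; I != NumArgs; ++I)
    EltTys.push_back(DblTy); // one entry per parameter

  assert(EltTys.size() == NumArgs + 1);
  return EltTys;
}
```

For a two-argument function such as a binary operator, makeSignature(2) yields three entries: the result followed by the two parameters.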

Source locations.

Accurate source locations are very important for debug information, because they make it possible to map generated code back to the source. At this stage, Kaleidoscope does not have any source location information in the lexer or the parser.
We add it as follows:

struct SourceLocation {
  int Line;
  int Col;
};
static SourceLocation CurLoc;
static SourceLocation LexLoc = {1, 0};

static int advance() {
  int LastChar = getchar();

  if (LastChar == '\n' || LastChar == '\r') {
    LexLoc.Line++;
    LexLoc.Col = 0;
  } else
    LexLoc.Col++;
  return LastChar;
}

This adds tracking of the line and column in the source file. As we lex each token, we set the current lexical location to the line and column where the token starts.
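To see the bookkeeping of advance() in isolation, here is a minimal standalone sketch that lexes whitespace-delimited tokens from an in-memory string instead of stdin and records where each token starts. LocLexer and its Token type are hypothetical names used only for this illustration. The sketch also treats "\r\n" as a single line break, which is an addition: the advance() shown above would count a Windows line ending as two lines.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

struct SourceLocation {
  int Line;
  int Col;
};

// Hypothetical lexer over an in-memory string (the article reads stdin).
class LocLexer {
  static constexpr int EndOfInput = -1; // stand-in for EOF
  std::string Src;
  std::size_t Pos = 0;
  SourceLocation LexLoc{1, 0};
  int LastChar = ' ';

  // Same bookkeeping as advance(), except that "\r\n" counts as one line
  // break. That CRLF handling is an addition of this sketch.
  int advance() {
    assert(Pos <= Src.size());
    if (Pos == Src.size())
      return EndOfInput;
    int C = static_cast<unsigned char>(Src[Pos++]);
    bool CrLfTail = (C == '\n' && Pos >= 2 && Src[Pos - 2] == '\r');
    if (CrLfTail) {
      // Second half of "\r\n": the line was already counted at '\r'.
    } else if (C == '\n' || C == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else {
      LexLoc.Col++;
    }
    return C;
  }

public:
  struct Token {
    std::string Text;
    SourceLocation Loc; // where the token starts
  };

  explicit LocLexer(std::string S) : Src(std::move(S)) {}

  // Returns false at end of input; otherwise fills Tok with the next
  // whitespace-delimited token and the location of its first character.
  bool next(Token &Tok) {
    while (LastChar != EndOfInput && std::isspace(LastChar))
      LastChar = advance();
    if (LastChar == EndOfInput)
      return false;
    Tok.Loc = LexLoc; // LexLoc now describes the token's first character
    Tok.Text.clear();
    while (LastChar != EndOfInput && !std::isspace(LastChar)) {
      Tok.Text.push_back(static_cast<char>(LastChar));
      LastChar = advance();
    }
    return true;
  }
};
```

The key step is copying LexLoc into the token right after skipping whitespace and before consuming the token's remaining characters. That copy plays the role of CurLoc in the code above.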
To do this, we replace every previous call to getchar() with the new advance(), which keeps track of this information, and we add a source location to every AST class:

class ExprAST {
  SourceLocation Loc;

public:
  ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
  virtual ~ExprAST() {}
  virtual Value *codegen() = 0;
  int getLine() const { return Loc.Line; }
  int getCol() const { return Loc.Col; }
  // Print this node's source position as ":line:col".
  virtual raw_ostream &dump(raw_ostream &out, int ind) {
    return out << ':' << getLine() << ':' << getCol() << '\n';
  }
};

We pass these source locations down whenever we create a new expression, which gives us a location for each expression and variable:

LHS = std::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), std::move(RHS));
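The reason BinLoc is passed explicitly, rather than relying on the CurLoc default, is that by the time the binary node is built the lexer has already moved past the operator. The standalone sketch below illustrates this with simplified node types. Expr, Number, Binary, and parseOnePlusTwo are hypothetical names, and CurLoc is moved by hand rather than by a real lexer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct SourceLocation {
  int Line;
  int Col;
};

// Stand-in for the global the lexer updates at the start of every token.
static SourceLocation CurLoc{1, 0};

struct Expr {
  SourceLocation Loc;
  explicit Expr(SourceLocation L = CurLoc) : Loc(L) {}
  virtual ~Expr() = default;
};

struct Number : Expr {
  double Val;
  explicit Number(double V) : Val(V) {} // picks up CurLoc implicitly
};

struct Binary : Expr {
  char Op;
  std::unique_ptr<Expr> LHS, RHS;
  Binary(SourceLocation L, char Op, std::unique_ptr<Expr> LHS,
         std::unique_ptr<Expr> RHS)
      : Expr(L), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {
    assert(this->LHS && this->RHS && "binary node needs both operands");
  }
};

// Hypothetical helper that pretends to parse "1 + 2" on line 3 by moving
// CurLoc by hand instead of running a real lexer.
std::unique_ptr<Binary> parseOnePlusTwo() {
  CurLoc = {3, 1}; // lexer is at "1"
  auto L = std::make_unique<Number>(1.0);

  CurLoc = {3, 3};                // lexer is at "+"
  SourceLocation BinLoc = CurLoc; // remember the operator before moving on

  CurLoc = {3, 5}; // lexer is at "2"
  auto R = std::make_unique<Number>(2.0);

  // CurLoc now points at "2", so the operator's location must be passed in.
  return std::make_unique<Binary>(BinLoc, '+', std::move(L), std::move(R));
}
```

If the Binary constructor used the CurLoc default instead, the node would be tagged with the position of the right-hand operand rather than the operator.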

To make sure every instruction gets proper source location information, we have to tell Builder whenever we are at a new source location. We use the following helper function:

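void DebugInfo::emitLocation(ExprAST *AST) {
  // A null AST clears the current location (used for the function prologue).
  if (!AST)
    return Builder.SetCurrentDebugLocation(DebugLoc());
  DIScope *Scope;
  if (LexicalBlocks.empty())
    Scope = TheCU;
  else
    Scope = LexicalBlocks.back();
  Builder.SetCurrentDebugLocation(
      DILocation::get(Scope->getContext(), AST->getLine(), AST->getCol(), Scope));
}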

The above function tells the main IRBuilder both where we are and which scope we are in. The scope is either the compile unit or the nearest enclosing lexical block, such as the current function. We represent this with a stack of scopes:

std::vector<DIScope *> LexicalBlocks;

We push the scope (the function) onto the top of the stack when we begin generating code for each function:

KSDbgInfo.LexicalBlocks.push_back(SP);

We also pop the scope off the stack at the end of code generation for the function:

// Pop off the lexical block for the function since we added it
// unconditionally.
KSDbgInfo.LexicalBlocks.pop_back();
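Manual push_back and pop_back calls have to be balanced on every exit path, including error returns. As an alternative to the manual calls above (not what this article's code does), a small RAII guard can do the pop automatically. The sketch below is standalone and uses strings in place of DIScope pointers. ScopeTracker, ScopedBlock, and scopeSeenByBody are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DebugInfo: scopes are plain strings here rather
// than DIScope pointers.
struct ScopeTracker {
  std::string CompileUnit;
  std::vector<std::string> LexicalBlocks;

  explicit ScopeTracker(std::string CU) : CompileUnit(std::move(CU)) {}

  // Same fallback as emitLocation(): innermost block, else compile unit.
  const std::string &currentScope() const {
    return LexicalBlocks.empty() ? CompileUnit : LexicalBlocks.back();
  }
};

// RAII guard: pushes on construction, pops on destruction, so an early
// return cannot leave a stale scope behind.
class ScopedBlock {
  ScopeTracker &Tracker;

public:
  ScopedBlock(ScopeTracker &T, std::string Scope) : Tracker(T) {
    Tracker.LexicalBlocks.push_back(std::move(Scope));
  }
  ~ScopedBlock() {
    assert(!Tracker.LexicalBlocks.empty() && "scope stack underflow");
    Tracker.LexicalBlocks.pop_back();
  }
  ScopedBlock(const ScopedBlock &) = delete;
  ScopedBlock &operator=(const ScopedBlock &) = delete;
};

// Hypothetical codegen stand-in: reports which scope the body would see,
// and bails out early for an empty name to mimic a failed codegen.
std::string scopeSeenByBody(ScopeTracker &Tracker, const std::string &Fn) {
  ScopedBlock Guard(Tracker, Fn);
  if (Fn.empty())
    return "<error>"; // early exit still pops the scope
  return Tracker.currentScope();
}
```

Whether the guard is worth it depends on how many exit paths the code generator has. With a single success path and a single error path, the explicit calls are just as readable.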

Finally, we emit the location every time we start code generation for a new AST object:

KSDbgInfo.emitLocation(this);

Variable locations.

We also need to be able to print the variables in the current scope. First, we set up the function arguments so that we get decent backtraces and can see how our functions are being called:

// Record the function arguments in the NamedValues map.
NamedValues.clear();
unsigned ArgIdx = 0;
for (auto &Arg : TheFunction->args()) {
  // Create an alloca for this variable.
  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());

  // Create a debug descriptor for the variable.
  DILocalVariable *D = DBuilder->createParameterVariable(
      SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
      true);

  DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
                          DILocation::get(SP->getContext(), LineNo, 0, SP),
                          Builder.GetInsertBlock());

  // Store the initial value into the alloca.
  Builder.CreateStore(&Arg, Alloca);

  // Add arguments to the variable symbol table.
  NamedValues[Arg.getName()] = Alloca;
}

Above, we first create the variable, giving it the scope (SP), the name, source location, type, and, since it is an argument, the argument index.
We then create an llvm.dbg.declare call to indicate at the IR level that we have a variable in an alloca, and set a source location for the start of the scope on the declare.
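One detail worth noting in the loop above is that the argument index is passed as ++ArgIdx, so the first parameter is numbered 1 rather than 0. The standalone sketch below models only that bookkeeping. ParamDescriptor and describeParams are hypothetical names, and the fields simply record the values that each createParameterVariable call would be given.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of what one createParameterVariable call is given.
struct ParamDescriptor {
  std::string Scope;
  std::string Name;
  unsigned ArgNo; // 1-based position in the argument list
  unsigned Line;
  std::string Type;
};

std::vector<ParamDescriptor>
describeParams(const std::string &Fn, const std::vector<std::string> &Args,
               unsigned LineNo) {
  std::vector<ParamDescriptor> Out;
  unsigned ArgIdx = 0;
  for (const std::string &Arg : Args) {
    // Pre-increment, as in the loop above: the first argument is number 1.
    Out.push_back(ParamDescriptor{Fn, Arg, ++ArgIdx, LineNo, "double"});
    assert(Out.back().ArgNo >= 1);
  }
  return Out;
}
```

For a function fib(x y) defined on line 1, describeParams("fib", {"x", "y"}, 1) produces descriptors numbered 1 and 2, both of type double and both scoped to fib.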

Note that debuggers make strong assumptions based on how code and debug information were generated for them in the past. Here we need a small hack: we avoid generating line information for the function prologue so that the debugger knows to skip over those instructions when setting a breakpoint. We do this by adding the following line in FunctionAST::codegen():

// Unset the location for the prologue emission (leading instructions with no
// location in a function is considered part of the prologue and the debugger
// will run past them when breaking on a function)
KSDbgInfo.emitLocation(nullptr);

Finally, we emit a new location when we start generating code for the function body:

KSDbgInfo.emitLocation(Body.get());
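The rule stated in the comment above, that leading instructions without a location count as prologue, can be pictured with a small standalone model. This is a deliberately simplified sketch of that rule only, not a description of how any particular debugger is implemented. Instr and firstBodyInstr are hypothetical names, and a line number of 0 is used here to mean "no location attached".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction record: Line == 0 means no debug location.
struct Instr {
  const char *Text;
  int Line;
};

// Returns the index of the first instruction that carries a location, which
// is where this model would place a function breakpoint. Returns Fn.size()
// if no instruction has a location.
std::size_t firstBodyInstr(const std::vector<Instr> &Fn) {
  std::size_t I = 0;
  while (I < Fn.size() && Fn[I].Line == 0)
    ++I;
  assert(I <= Fn.size());
  return I;
}
```

In this model, the argument allocas and stores emitted after emitLocation(nullptr) have no line, so a breakpoint on the function lands on the first instruction generated after emitLocation(Body.get()).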

At this point, we have enough debug information to set breakpoints in functions, print argument variables, and call functions.

Summary.

A compile unit is a top-level container for a section of code in DWARF. It contains the type and function data for an individual translation unit. DWARF itself is a compact encoding used to represent types, source locations, and variable locations.
In this article, we have learned how to add debug information to Kaleidoscope so that the compiled code can be debugged.

References.

Bootstrapping a compiler.