If-then-else in LLVM Control Flow

Do not miss this exclusive book on Binary Tree Problems. Get it now for free.

Control flow statements are statements used to change the flow of execution of the program. In this article, we extend Kaleidoscope to include control flow operations such as if-then-else statements.

Table of contents.

  1. Introduction.
  2. If-Then-Else.
  3. Lexer extensions.
  4. AST extensions.
  5. Parser extensions.
  6. LLVM IR extensions.
  7. Code generation extensions.
  8. Summary.
  9. References.

Prerequisites.

  1. Implementing JIT(Just In Time) Compilation.

Introduction.

In programming languages, control flow statements are used to change the flow of execution of a program. They include if-then-else statements and loops such as for, while and do..while.

In the prerequisite article, we saw how to generate an LLVM IR, further optimize the code, and construct a JIT compiler for the programming language. In this article, we extend Kaleidoscope to include if-then-else conditional statements.

If-Then-Else.

As we have seen in previous articles, to implement a new feature into our programming language, we need to add support for it in the lexer, parser, AST, and LLVM IR, in this case, we will be adding support for if-then-else conditional statements. This shows how easy it is to develop a language and grow it by adding additional features incrementally as new ideas are discovered.

Our end result will allow us to write such if-then-else statements in Kaleidoscope;

def fib(x)
  if x < 3 then
    1
  else
    fib(x-1)+fib(x-2);

Since everything in Kaleidoscope is an expression, the if-then-else expressions also need to return a value. If it evaluates an expression to false, 0 is returned, otherwise, the expression evaluates to true. Also, if the condition is true, the first sub-expression is evaluated and then returned otherwise, if the condition is false, the second sub-expression is evaluated and then returned.

Lexer extensions.

In this section, we will add support for if-then-else statements in the lexer, the very first stage of compilation. Here we need to specify more keywords so they can be converted into their corresponding tokens.

First, we add new enum values for relevant tokens as follows;

// control
tok_if = -6,
tok_then = -7,
tok_else = -8,

Then use the following code to identify them;

...
if (IdentifierStr == "def")
  return tok_def;
if (IdentifierStr == "extern")
  return tok_extern;
if (IdentifierStr == "if")
  return tok_if;
if (IdentifierStr == "then")
  return tok_then;
if (IdentifierStr == "else")
  return tok_else;
return tok_identifier;

AST extensions.

We also need to represent them as AST nodes, for this we create a new node as follows;

/// IfExprAST - Expression class for if/then/else.
class IfExprAST : public ExprAST {
  std::unique_ptr<ExprAST> Cond, Then, Else;

public:
  IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,
            std::unique_ptr<ExprAST> Else)
    : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}

  Value *codegen() override;
};

As we can see above, an AST node consists of pointers pointing to various sub-expressions.

Parser extensions.

The parser's input is a string of tokens from the lexer, it verifies that the tokens can be generated by a grammar of Kaleidoscope using the AST. The parser will also be responsible for reporting any syntax errors found in the source code in regards to if-then-else statements in this case. In well-formed programs, the parser constructs a parse tree and returns it as its output which is passed on to the following stages of compilation.

In our case, the parsing logic is as follows, first we define a new parsing function as follows;

/// ifexpr ::= 'if' expression 'then' expression 'else' expression
static std::unique_ptr<ExprAST> ParseIfExpr() {
  getNextToken();  // eat the if.

  // condition.
  auto Cond = ParseExpression();
  if (!Cond)
    return nullptr;

  if (CurTok != tok_then)
    return LogError("expected then");
  getNextToken();  // eat the then

  auto Then = ParseExpression();
  if (!Then)
    return nullptr;

  if (CurTok != tok_else)
    return LogError("expected else");

  getNextToken();

  auto Else = ParseExpression();
  if (!Else)
    return nullptr;

  return std::make_unique<IfExprAST>(std::move(Cond), std::move(Then),
                                      std::move(Else));
}

ParseIfExpr parses input given to it, it also reports errors when found in the code.
The next step is to hook up the above function as a primary expression;

static std::unique_ptr<ExprAST> ParsePrimary() {
  switch (CurTok) {
  default:
    return LogError("unknown token when expecting an expression");
  case tok_identifier:
    return ParseIdentifierExpr();
  case tok_number:
    return ParseNumberExpr();
  case '(':
    return ParseParenExpr();
  case tok_if:
    return ParseIfExpr();
  }
}

LLVM IR extensions.

At this stage, we are now ready to generate Intermediate Code related to if-then-else statements. First, we consider the following;

extern foo();
extern bar();
def baz(x) if x then foo() else bar();

We use the above to motivate the IR we want to produce. Without any kind of optimizations, we have the following;

declare double @foo()

declare double @bar()

define double @baz(double %x) {
entry:
  %ifcond = fcmp one double %x, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:       ; preds = %entry
  %calltmp = call double @foo()
  br label %ifcont

else:       ; preds = %entry
  %calltmp1 = call double @bar()
  br label %ifcont

ifcont:     ; preds = %else, %then
  %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
  ret double %iftmp
}

The visualization of the above is shown below;

We can obtain this using LLVM's opt tool. First, we compile the code to an LLVM IR with the .ll file extension. Then execute the command; llvm-as < t.ll | opt -passes=view-cfg. The result is the above image. This is one of many LLVM features for graph visualization.

The entry block evaluates the conditional expression x, then compares the result to 0.0 using the fcmp one instruction. The results determine if the code jumps to either then or else blocks, which contain expressions for true/false cases.
When the if/then blocks are completed, they branch back to the ifcont block to execute code that comes after if/then/else statements.

The remaining part is to return to the caller. How the code knows which expression to return is dependent on an important SSA operation - the phi operation.

Execution of the phi operation requires remembering the block origin. It takes on the value that corresponds to the input control block. If control comes from the then block, it gets the calltmp value otherwise if it comes from the else blocks it is assigned the calltmp1 value.

Code generation extensions.

Finally, the last step that remains to incorporate if-then-else conditional statements into our language lies in code generation. We implement a codegen method for IfExprAST that emits an expression for the condition. We then compare the value emitted yo zero to obtain a truth value as a 1-bit boolen value. We code it as follows;

Value *IfExprAST::codegen(){
    Value *CondV = Cond->codegen();
    if (!CondV)
    return nullptr;

    // Convert condition to a bool by comparing non-equal to 0.0.
    CondV = Builder.CreateFCmpONE(CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");

Next, we create basic blocks related to if/then/else statements;

    Function *TheFunction = Builder.GetInsertBlock()->getParent();

    // Create blocks for the then and else cases.  Insert the 'then' block at the
    // end of the function.
    BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction);
    BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");
    BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont");

    Builder.CreateCondBr(CondV, ThenBB, ElseBB);

First, we get the current Function object being built, then create three blocks emitting the conditional branch that chooses between them.

Creating blocks does not impact the IRBuilder therefore it is still inserted into the block that the condition went into.

After the conditional branch is inserted, we move the builder to start inserting into the then block. This moves the insertion point to the end of a block.

Builder.SetInsertPoint(ThenBB); // emit then value

Value *ThenV = Then->codegen();

if (!ThenV)
    return nullptr;

Builder.CreateBr(MergeBB);

// codegen for 'Then' can change the current block, and update ThenBB for the PHI.
ThenBB = Builder.GetInsertBlock();

When the insertion point has been set, we codegen the then expression from the AST. We use recursion for this. To complete the then block, we create an unconditional branch to the merge block. In LLVM, all control flow, including fall through must be made explicit in the LLVM IR.

Code generation for the else block is similar to the then block;

TheFunction->getBasicBlockList().push_back(ElseBB); // emit else block
Builder.SetInsertPoint(ElseBB);

Value *ElseV = Else->codegen();
if (!ElseV)
    return nullptr;

Builder.CreateBr(MergeBB);

// codegen of 'Else' can change the current block, update ElseBB for the PHI.
ElseBB = Builder.GetInsertBlock();

The difference is in the first line which adds the else block to the function.

Finally, we complete the merge code;

    TheFunction->getBasicBlockList().push_back(MergeBB); // emit merge block
    
    Builder.SetInsertPoint(MergeBB);
    
    PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");

    PN->addIncoming(ThenV, ThenBB);
    PN->addIncoming(ElseV, ElseBB);
    
    return PN;
}

The first line adds the merge block to the Function object.
The second change the insertion point so the newly created code will go into the merge block.
In the third line, we create the PHI node and set up the block/value pairs for the PHI

In the end, the CodeGen function returns the phi node as the computed value by the if/then/else expression, this value is fed into the top-level function code which creates the return instruction.

Now we can execute conditional code in this programming language.

Summary.

Control flow statements are statements used to change the flow of execution of the program. We implemented control-flow support for ** by supporting it in the lexer, parser, AST, and finally the LLVM code generator.

References.

  1. Control flow in code generation
  2. LLVM Control Flow: for loops.

Sign up for FREE 3 months of Amazon Music. YOU MUST NOT MISS.