Variable Mutation in Kaleidoscope

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Mutation in programming languages involves the ability to change an object. In this article, we further extend our language, we add to it the ability to define new variables and mutate them.

Table of contents.

Introduction.
Adjusting existing variables for mutation.
New assignment operator.
User-defined local variables.
Summary.
References.

Prerequisites.

LLVM Memory.

Introduction.

In this article, we will discuss mutable variables in the Kaleidoscope programming language. We will add two features, the first is the ability to mutate variables using the '=' operator, and the second is the ability to define new variables.

An example;

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fibonacci
def fib(x)
  if (x < 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fibonacci.
def fib(x)
  var a = 1, b = 1, c in
  (for i = 3, i < x in
     c = a + b :
     a = b :
     b = c) :
  b;

# Call it.
fib(10);

For us to mutate the variables, we need to change our existing variables and use the alloca trick, then add our new operator the extend Kaleidoscope to support new variable definitions.

Adjusting existing variables for mutation.

In previous articles, we created a NamedValues map object that represents the symbol table for Kaleidoscope, it is managed during code generation. It holds a double representing the named variable.
To add support for mutation, we need to change it so NamedValues holds the memory location of the variable.
The above changes the code structure but not compiler behavior, it is a code refactoring.
At this stage Kaleidoscope only supports variables for two things, the first is incoming arguments to functions and the second is the induction variable of for loops. For consistency we allow the mutation of these and user-defined variables, meaning they both need memory locations.

To transform Kaleidoscope, we changed NamedValues so it maps to AllocaInst instead of Value*. The compiler will tell us parts of the code we need to update;

static std::map<std::string, AllocaInst*> NamedValues;

We need to create allocas, for this use a helper function that ensures allocas are created in the entry block of the function;

/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
/// the function.  This is used for mutable variables etc.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const std::string &VarName) {
  IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                 TheFunction->getEntryBlock().begin());
  return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), 0,
                           VarName.c_str());
}

New assignment operator.

Adding a new assignment operator is simple. We parse it just like any other binary operator but handle the parsing internally.
First we set a precedence as follows;

int main() {
  // Install standard binary operators.
  // 1 is lowest precedence.
  BinopPrecedence['='] = 2;
  BinopPrecedence['<'] = 10;
  BinopPrecedence['+'] = 20;
  BinopPrecedence['-'] = 20;

The parser knows the binary operator precedence, it will handle parsing and the generation of an AST. We just have to implement codegen for the assignment operator;

Value *BinaryExprAST::codegen() {
  // Special case '=' because we don't want to emit the LHS as an expression.
  if (Op == '=') {
    // Assignment requires the LHS to be an identifier.
    VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());
    if (!LHSE)
      return LogErrorV("destination of '=' must be a variable");

Unlike other binary operators, our assignment operator does not follow "emit LHS, emit RHS, compute...", therefore it is handled as a special case before other binary operators are handled.
Another strange thing is that it needs LHS to be a variable. Also, it i s invalid to have 'x + 1 = expr', we only have things like 'x = expr';

  // Codegen the RHS.
  Value *Val = RHS->codegen();
  if (!Val)
    return nullptr;

  // Look up the name.
  Value *Variable = NamedValues[LHSE->getName()];
  if (!Variable)
    return LogErrorV("Unknown variable name");

  Builder.CreateStore(Val, Variable);
  return Val;
}
...

We now have the variable. Codegen'ing the assignment involves emitting the RHS of the assignment, creating a store, and returning the computed value.
Returning a value allows for chained assignments such as X = (Y = Z).

We can mutate loop variables and arguments since we have an assignment operator. That is we can execute code looking like the following;

# Function to print a double.
extern printd(x);

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

def test(x)
  printd(x) :
  x = 4 :
  printd(x);

test(123);

The code prints 123, then 4, this shows that we actually mutated the value. We also want the ability to define our own local variables. This is discussed in the next section.

User-defined local variables.

To add user-defined local variables to Kaleidoscope, we follow the same procedures we did, first, we extend the lexer, the parser, the AST, and finally the code generator to incorporate user-defined local variables.

Let's extend the lexer;

enum Token {
  ...
  // var definition
  tok_var = -13
...
}
...
static int gettok() {
...
    if (IdentifierStr == "in")
      return tok_in;
    if (IdentifierStr == "binary")
      return tok_binary;
    if (IdentifierStr == "unary")
      return tok_unary;
    if (IdentifierStr == "var")
      return tok_var;
    return tok_identifier;
...

Then define the AST node that we will construct, for var/in it is as follows;

/// VarExprAST - Expression class for var/in
class VarExprAST : public ExprAST {
  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
  std::unique_ptr<ExprAST> Body;

public:
  VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
             std::unique_ptr<ExprAST> Body)
    : VarNames(std::move(VarNames)), Body(std::move(Body)) {}

  Value *codegen() override;
};

Above, var/in allows a list of names to be defined at once, each optionally having an initializer value, therefore we store this information in a vector - VarNames.
Also, var/in has a body that is allowed to access variables defined by var/in.

Now, let's extend the parser;

/// primary
///   ::= identifierexpr
///   ::= numberexpr
///   ::= parenexpr
///   ::= ifexpr
///   ::= forexpr
///   ::= varexpr
static std::unique_ptr<ExprAST> ParsePrimary() {
  switch (CurTok) {
  default:
    return LogError("unknown token when expecting an expression");
  case tok_identifier:
    return ParseIdentifierExpr();
  case tok_number:
    return ParseNumberExpr();
  case '(':
    return ParseParenExpr();
  case tok_if:
    return ParseIfExpr();
  case tok_for:
    return ParseForExpr();
  case tok_var:
    return ParseVarExpr();
  }
}

First we add it as a primary expression. Then define ParseVarExpr;

/// varexpr ::= 'var' identifier ('=' expression)?
//                    (',' identifier ('=' expression)?)* 'in' expression
static std::unique_ptr<ExprAST> ParseVarExpr() {
  getNextToken();  // eat the var.

  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;

  // At least one variable name is required.
  if (CurTok != tok_identifier)
    return LogError("expected identifier after var");

Above, we parse a list of identifier/expr pairs into local vector - VarNames;

We then parse the body and create the AST node once all the variables have been parsed;

  // At this point, we have to have 'in'.
  if (CurTok != tok_in)
    return LogError("expected 'in' keyword after 'var'");
  getNextToken();  // eat 'in'.

  auto Body = ParseExpression();
  if (!Body)
    return nullptr;

  return std::make_unique<VarExprAST>(std::move(VarNames),
                                       std::move(Body));
}

To support emission of an LLVM IR we use the following code;

Value *VarExprAST::codegen() {
  std::vector<AllocaInst *> OldBindings;

  Function *TheFunction = Builder.GetInsertBlock()->getParent();

  // Register all variables and emit their initializer.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
    const std::string &VarName = VarNames[i].first;
    ExprAST *Init = VarNames[i].second.get();

The above code loops over all variables, installing them one at a time. For each, we place it in a symbol table, remember the previous value, and replace it in OldBindings.

We then emit the initializer, create the alloca and update the symbol table to point to it.

  // Emit the initializer before adding the variable to scope, this prevents
  // the initializer from referencing the variable itself, and permits stuff
  // like this:
  //  var a = 1 in
  //    var a = a in ...   # refers to outer 'a'.
  Value *InitVal;
  if (Init) {
    InitVal = Init->codegen();
    if (!InitVal)
      return nullptr;
  } else { // If not specified, use 0.0.
    InitVal = ConstantFP::get(TheContext, APFloat(0.0));
  }

  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
  Builder.CreateStore(InitVal, Alloca);

  // Remember the old variable binding so that we can restore the binding when
  // we unrecurse.
  OldBindings.push_back(NamedValues[VarName]);

  // Remember this binding.
  NamedValues[VarName] = Alloca;
}

Once all variables are added to the symbol table, we evaluate the body of var/in;

// Codegen the body, now that all vars are in scope.
Value *BodyVal = Body->codegen();
if (!BodyVal)
  return nullptr;

Finally we restore the previous variable bindings before returning;

  // Pop all our variables from scope.
  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
    NamedValues[VarNames[i].first] = OldBindings[i];

  // Return the body computation.
  return BodyVal;
}

The result is properly scoped variable definitions that are mutable.

Summary.

mem2reg pass optimizes our stack variables into SSA registers, inserting PHI nodes where needed and the compiler front-end remains simple
In this article, we have added to Kaleidoscope, the ability to define new variables and mutate them. We use the '=' operator to mutate variables.

References.

Compilation to Object Code