Stack allocation in Compilers

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Stack allocation is a runtime storage management mechanism for the compiler whereby activation records are pushed and popped onto thee stack as activations begin and end by use of predefined routines in the compiler.

Table of contents.

Introduction.
Activation trees and records.
Allocation of variable-length data.
Calling sequences.
Summary.
References.

Prerequisites

Introduction.

Compiling languages using procedures, functions or methods as units of user-defined actions require the use of a stack for managing their run-time memory.
When a procedure, function or method is called, space for its local variables is pushed onto the stack.

Activation trees and records.

A program can be defined as a sequence of instructions which are combined into a number of procedures.
The procedure will have a start and and end delimiter and everything inside the procedure is referred to as the body of the procedure.
Instructions in a procedure execute sequentially.
The execution of a procedure is referred to as activation and when activation occurs an activation record is created.

Trees.

An activation record will contain all information necessary to call the procedure.
Information in an Activation Record
Local data - stores temporary and intermediate values of an expression, machine status - stores information such as registers, program counter statuses before the procedure is called.
Control link - stores the address of an activation record of the calling procedure.
Access link - stores information about data which is outside the local scope
Parameters - stores parameters such as input parameters for a calling procedure, these are stored in registers whenever possible for efficiency,
Return value - stores return values for procedures which return, note that, register storage is preferable for its efficiency.

A control stack stores the activation record for an executed procedure.
When a procedure executing calls another procedure, its execution is halted until the called procedure finishes its execution. The called procedure activation record is also stored in the stack.

Program control flow is sequential and when a procedure is called, its control is transfered to the called procedure which when executed, returns control back to the caller.

An example (quickSort)

int arr[11];

//read 9 integers into arr[1],...,arr[9]

void read(){
    int i;
    ...
}
/* partitioning procedure - pick a pivot element v and partition arr[m,...,n] such that arr[m,...,p-1] are less that v, arr[p] = v and arr[p+1,...,n] are equal or greater than v. */

int partition(int m, int n){
    return p;
}

void quickSort(int m, int n){
    int i;
    if(n > m){
        i = partition(m, n);
        quickSort(m, i-1);
        quickSort(i+1, n);
    }
}

main(){
    read();
    arr[0] = -9999;
    arr[10] = 9999;
    quickSort(1, 9);
}

The above program reads nine integers into an array and sorts them using a recursive quickSort algorithm.
The main function;

Calls the read().
Sets sentinels.
Calls the quick sort algorithm on the array.

Activations for the program.

enter main()
    enter read()
    leave read()
    enter quickSort(1, 9)
        enter partition(1, 9)
        leave partition(1, 9)
        enter quickSort(1, 3)
        ...
        leave quickSprt(1, 3)
        enter quickSort(5, 9)
        ...
        leave quickSort(5, 9)
    leave quickSort(1, 9)
leave main()

The call to the partitioning procedure partition(1,9) returns 4, therefore arr[1] - arr[3] will hold all elements less than 4 and arr[5] to arr[9] hold elements larger than pivot 4.

Procedure activations are nested in time, in that, if an activation of procedure p calls procedure q then the activation of q must terminate before the activation of procedure p can terminate.
Common cases

When the activation of q terminates normally, control resumes after the point of p at which the call was made to q.
An activation of q or procedure q called either directly or indirectly will abort because it becomes impossible to execute since p ends simultaneously with q.
Activation of q terminates if q cannot handle an exception. i.e, if procedure p handles the exception, the activation of q terminates while activation of p continues not necessarily from the point at which a call to q was made.
If p cannot handle the exception, its activation terminates at the same time as q's activation, and we presume that the exception will be handled by some other open activation procedure.

This control flow enables us to be able to represent a series of activations in the form of a tree - activation tree.

stackAlloc1

Each node represents one activation.
The root is the activation of the main procedure which initiates execution of the program.
Children of nodes represent the activations called by their parent nodes.
Activations are in order from left to right.
A child must finish before the right activation starts.

Relationship between activation tree and program behavior

Sequence of procedure calls correspond to a preorder traversal of the activation tree.
Sequence of returns correspond to the postorder activation tree traversal.
If control lies in an activation of a procedure for node N, the activation is live for all corresponding nodes including its ancestors. These activations were called in the order they appear along the path to N starting at the root, they return in reverse order.

Records.

The control stack manages procedure calls and returns.
Each live activation will have an activation record/frame on the control stack.
The activation tree root is at the bottom of the stack, then all sequences of activation records on the stack corresponding to the path in the tree where the control is currently at, have their records at the top of the stack.

Considering the activation tree image in the previous section we give the following example,
If control is in activation q(2,3) then the activation record for the same is at the top of the control stack, below that, the activation record for q(1,3), below that, the activation record for q(1,9) and finally at the bottom we have the activation record for m - the root of the tree representing the main function.

Note: For control stack images, activation records elements appearing lowest on the page are closest to the top of the stack.

An example of a general activation record

stackAlloc2

Temporaries - are values arising from the evaluation of an expression which cannot be held in registers.
Local data - data belonging to a procedure of which the activation record belongs to.
The rest have been described in the previous section.

A run-time stack as control flows through the activation tree(quickSort)

stackAlloc3

Dashed lines represent terminated activations.
(a). Array a is global therefore it is allocated space before execution begins with an activation procedure main.
(b). When control arrives at the first call in the main body, procedure r is activated and its activation record pushed to the stack.
Activation record for r will contain space for local variable i.
When control returns from the r's activation, its record is popped off the stack and main's record is left.
(c). Control is now at q with actual parameters 1, 9, this activation record is placed onto the stack. This record will contain space for parameters m, n and local variable i. Space used by the r call is reused on the stack.
Data local to r will not be available to q(1,9).
When q(1,9) returns, the stack will have the activation record for main again.
(d). Activations p(1,3) and q(1,0) begins and end during the lifetime of q(1,3) which leaves the record q(1,3) on top.

Allocation of variable-length data.

An example of variable-length data is variable-length arrays whose size depends on the value of one or more parameters of the called procedure.
In this section we discuss how to allocate objects and other structures whose sizes are not known on the stack.
The reason for this is to avoid the expense of garbage collecting this space.

Strategy for allocating variable-length arrays or any variable-length structures local to a procedure.

stackAlloc5

Procedure p has three local arrays whose sizes cannot be determined at compile time.
Storage for these arrays is not part of p's AR even though it appears on the stack.
Only a pointer to the beginning of each array appears in the AR and therefore during p's execution these pointers are known offsets from the top-of-stack pointer and the target code is able to access array elements through these pointers(top and top_sp).
top marks the top of the stack(beginning of the next AR).
top_sp finds local fixed-length fields of the top activation record.

The code that repositions these two pointers is generated at compile time.
When q returns, top_sp can be restored from the saved control link in the AR for q.

The new value of top is top_sp minus length of the machine status, control and access links, return value and parameter fields.
This length is known to the callee at compile time although it may depend on the caller is the number of parameters can vary across calls to q.

Calling sequences.

Calling sequences consists of code that allocates an activation record on the stack and enters information in its fields.
They are used to implement procedure calls.
A return sequence restores the state of the machine so that a calling sequence can continue its execution after the call.
Code in a calling sequence is divided between the caller(calling procedure) and the callee(procedure being called).
The source language, target machine and OS may impose requirements that may favor execution of run-time task by either the caller or the callee and as such there is no clear division of tasks between the two.
That is, if a procedure is called from n points then the portion of the calling sequence that is assigned to the caller is generated n times while the portion for the callee sequence is only generated once.
It is desirable to put much of the calling sequence into the callee.

The division of tasks between caller and callee

stackAlloc4

The register top_sp points to the end of the machine-status field in the current top AR and since this position is known to the caller, the caller is responsible for setting top_sp before control is passed to the callee.

The calling sequence and division(caller - callee)

Caller evaluates parameters.
Caller stores return address and old value of top_sp into callee's AR and increments top_sp(moving it past caller's local data and temporaries and callee's parameters and status fields).
Callee saves register values and other status information.
Callee initializes its local data and execution begins.

The corresponding return sequence

Callee places return value next to parameters.
Callee restores top_sp and other registers then branches to the return address caller placed in the status field, all this using information in machine status field.
The top_sp has been decremented but caller will know the location of the return value which is relative to the current value of top_sp.

Principles to consider when designing calling sequences

Values communicated between caller and callee are placed at the beginning of the callee's activation record so that they are as close to the caller AR as possible.
This is so that the caller can compute the values of the actual parameters of the call and place them on top of its own AR so as not to create an new AR for the callee.
This allows for use of procedures with a varying number of arguments.
Fixed-length items(control links, access link, machine status) are to be placed in the middle.
The reasoning is that, if the same components for the machine status are saved for each call, the same code can do the saving and restoring for each.
Also this standard allows other programs e.g debuggers to easily decipher stack contents if an error occurs.
Items whose size may not be known before execution e.g a dynamic array, are placed at the end of the AR, moreover, the amount of space needed for temporaries depends on the success of the code-generation phase in keeping temporaries in the register.
Therefore while the space needed for temporaries is eventually known to the compiler, it might not be known when the intermediate code is first generated.
We judiciously locate the top-of-stack pointer by having it point to the end of the fixed-length fields in the AR.
Fixed-length data can be accessed by fixed offsets known to the intermediate code generator relative to the top-of-stack pointer.
A result of this approach is that variable-length fields in the activation record are actually above the top-of-stack. Their offsets need to be calculated at run time but they too can be accessed from the top-of-stack pointer by use of a positive offset.

Summary.

Storage is organized as a stack also called a control stack.
Activation records are pushed onto the stack as activation begins and popped when they end.

Stack memory allocation is safer compared to heap memory allocation, it is also faster than the latter.
A stack overflow is an error generated when the stack is full.

References.

Compilers, principles, techniques and tools, 2nd edition Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman.
Basics of Compiler Design Torben Ægidius Mogensen