Compiling logic programs

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this article, we discuss the general compilation technique for a full recursive backtracking search over clauses combined with a full unification of terms.

Table of contents.

Introduction.
List procedures.
Assert and retract predicates.
The 'cut' operator.
Optimizing clause selection.
Compiled clause search and unification.
Summary.
References.

Prerequisite

The logic programming model

Introduction

Even though the compilation of a full backtracking search over clauses and of unification of terms require substantial implementation effort, modern logic program compilers achieve speeds corresponding to imperative language compilers by doing extensive optimizations consisting of specializations of more general code which can be understood on the basis of that code.

In this article we discuss a generalized approach but a more specialized approach can be derived from the same.

Problems.

In most imperative languages, local data is not preserved for routines after they finish therefore it is impossible for them to backtrack upon failure to the last position where the last choice was made.
Clauses will correspond to routine definitions while goals will correspond to routine calls in imperative languages. Particularly, all clauses with the same relation name will correspond to a single routine which is called when the goal of the name is to be unified and substituted. By including the number of terms of this relation in the routine name, C's identifier identification mechanism will restrict unification attempts with same name same number of terms.

Lets discuss how we solve these problems.

List procedures.

List procedures are a part of a nameless programming paradigm whereby composite data items are represented as a one-parameter routine that applies its parameter(an action routine) to each component of the data item.

An Example

Functions return values so that their callers can use the returned values for other actions, therefore if we return a value v so that a function can perform an action A(v) we can say that the function A(v) rather than returning v could just call v directly.

This will allow it to return more than one value.

Instead of writing(impossible in C)

int returnTwoValues(void){
    return a then return b;
}

//calling
A(returnTwoValues());

we can write(possible in C and other imperative languages)

void twoValues(*Act_on)(int)){
    (*Act_on)(a);
    (*Act_on)(b);
}

//calling
twoValues(A);

The function twoValues() is a list procedure because it represents a list [a, b], and when it is called it performs actions A(a) and A(b).

A list procedure.

void numberPairList(void (*Action)(int v1, int v2){
    (*Action)(1, 3);
    (*Action)(4, 6);
    (*Action)(7, 3);
}

numberPairList() represents the list of pairs [(4, 9), (10, 2), (7, 2)].
We also use the above routine to define another routine to find and print all pairs in numberPairList whose sum is the parameter sum.

Applying the list procedure.

void findSum(int sum){
    void testSum(int v1, int v2){
        if(v1 + v2 == x){
            printf("Found %d, %d \n", v1, v2);
        }
    }
    numberPairList(testSum);
}

We keep the scope of names as small as possible by keeping testSum() local and therefore each findSum()-like routine will have its own action routine testSum() while avoiding name clashes. This allows full recursion.

findSum(13) lets numberPairList() call testSum() for each integer pair which is represented by numberPairList().
testSum() tests the required property from the pairs and the output is the following.

Output

Found 4, 9

This code uses a nested routine which fortunately GNU C compiler has supporting extensions for this.

Compiled clause search and unification.

First we translate each clause for a given relation to a routine.
Then create a single list procedure for the relation.
Finally call each clause routine in succession using the list procedure.
In these calls, both the list procedure and clause translation will get a special action routine parameter.

The action routine is called for each successful unification of the goal and one of the clauses for the relation.
Use of list procedures solves the problem of routines returning multiple values and the problem for duplicate goal lists.

An example
Given the relation parent/2
parent(john doe)
parent(one two)
parent(jane doe)
parent(three, four)

We have the translation;

void parent_2(Term *goal_arg1, Term *goal_arg2, Action goal_list_tail){
    parent_2_clause_1(); //parent(john doe)
    parent_2_clause_2(); //parent(one two)
    parent_2_clause_3(); //parent(jane doe)
    parent_2_clause_4(); //parent(three, four)
}

The 4 clause routines are local to and nested inside parent_2 routine which calls them to establish their logical OR.

The clause routines can therefore access parent_2 parameters as non-local variables and when they succeed, they call the action routine goal_list_tail.

The generated clause routine for parent(john, doe)

//translation of 'parent(john, doe).'
void parent_2_clause_2(void){
    //translation of 'john, doe).'
    void parent_2_clause_2_arg_1(void){
       //translation of 'doe).'
        void parent_2_clause_2_arg_2(void){
           //translation of '.'
           void parent_2_clause_2_body(void){
               goal_list_tail();
           }
           //translation of head term 'doe)'
           unify_terms(goal_arg2, put_constant("doe"), parent_2_clause_2_body);
        }
        //translation of head term 'john,'
        unify_terms(goal_arg1, put_constant("john"), parent_2_clause_2_args_2);
    }
    //translation of 'parent('
    parent_2_clause_2_arg_1();
}

Given the query ?-parent(john, X) which results in the goal list
parent(john, X), << ("X", X)

We expand the first goal parent(john, X) in the above goal list by issuing the call
parent_2("john", X, goal_list_tail) whereby the action routine goal_list_tail() corresponds to <<("X", X).
parent_2() calls each clause for parent/2 and clause calls goal_list_tail when it succeeds.
The action routine is convenient as it avoids copying the goal list tail.

reading from the bottom

Parent_2_clause_2() calls nested routine parent_2_clause_2_arg_1() for the first argument which attempts to unify the first argument goal_arg1 with its first head argument john by calling unify_terms().

parent_clause_2_arg_1 also passes the routine parent_2_clause_2_arg_2() as the action routine to unify_terms() understanding that unify_terms will call its own action routine when unification succeeds.

Similarly, parent_2_clause_2_arg_2() nested in parent_2_clause_2_arg_1() tries to unify the second argument goal_arg2 with "doe" head argument.

Relationship between prolog and C

A comma as a separator after head term will translate into an action routine name ending in _arg_N.
A closing parenthesis translates into an action routine name ending in _body.
A dot translates into a call of goal_list_tail().

full backtracking variable unification in C

void unify_unbound_variable(Term *goal, Term *head_arg, Action goal_list_tail){
    if(goal_arg == head_arg){
        //unification of identical variables succeeds
        goal_list_tail();
    }else{
        //bind unbound variable to other term
        if(head_arg->type == Is_Variable){
            head_arg->term.variable.term = goal_arg;
            goal_list_tail();
            head_arg->term.variable.term = 0;
        }else{
            goal_arg->term.variable.term = head_arg;
            goal_list_tail();
            goal_arg->term.variable.term = 0;
        }
    }
}

Since searching routines don't return until all results have been obtained, they can undo variable bindings on the way back therefore there is no need to record bindings by calling trail_bindings()

From the code we can see how the first variable is bound by setting variable.term then the search continues with this binding then undone by setting variable.term to 0.

For the query ?−parent(john, X) we have the code.

void query(void){
    Term *X = put_variable("X");
    void display_X(void){
        print_term(X);
        printf("\n");
    }
    parent_2(put_constant("john"), X, display_X);
}

Because queries are entered dynamically as we shall come to see in the coming sections, how the above code is compiled on the fly.

We shall also touch on how the input processing section of the compiled code performs the top level of the query using an interpreter for optimization purposes.

Output

X = doe;

Optimizing clause selection.

From the translation code for the relation parent/2 calls routines for all four clauses regardless of the values of the arguments goal_arg1 and goal_arg2 however a certain call is doomed to fail or that the entire parent_2 call will fail.

It will fail when one of the parameters is not a constant or an unbound variable.
We can optimize this use of the switch_on_term WAM instruction to jump to the label of the code which that handles terms of the specified type for the given routine.

Another case is when one of the parameters is a constant and we compare it with the corresponding constant in each clause, even if they don't match we call the routine even though it does nothing.

We optimize this by the use of switch instructions which will direct the flow of control to specific pieces of code appropriate for the specified argument.

Other WAM instructions for optimizing clause selection are try_me_else, retry_me_else and trust_me_else used for the first, middle and last calls of the clause routines.

You can learn more on this in the references provided in the references section.

The 'cut' operator.

In logic programming search space can be reduced by pruning.

Pruning uses operators such as cut, commit, once, conditionals and variations on these to reduce the search space.
Prolog allows reduction of search space by use of cuts which tell the search not to backtrack past the introduction of the cut as a goal.

The cut operator has no arguments and always succeeds, it is expressed as an exclamation mark ! and has the property that is if it succeeds in a clause C for relation R it commits the relation R to the search space described by the part after the cut in clause C and ignores the rest of the search space of C and the search spaces of other clauses for R.

Once the search has passed a cut in a clause for a relation R, backtracking over the cut results in further processing of goal R being skipped.
This way, we can prevent unwanted backtracking by giving prolog orders to freeze the decisions made so far in a predicate and thus save time while improving speed, performance and memory usage.

Assert and retract predicates.

In prolog, adding and removing clauses during program execution is possible by use of extra-logical predicates which don't express relations.

Assert(Clause) dynamically adds the term Clause as a clause to the program.
Since there are no restrictions on the relation expressed by this clause, the relation may be program-defined, already occurring statically in the program, known from assert calls or a new one.

Retract(Clause) dynamically removes the clause Clause from the program.
There are no restrictions on types of clauses that may be retracted therefore it can retract program-defined clauses.

To include these additions and subtractions to compiled code and activate new clauses without compromising the efficiency of program defined clauses
we use a hybrid technique where by we switch to an interpreter for the asserted clause but switch back to compiled routines as soon as possible.

For this we keep the asserted clauses in symbols augmented with a mechanism that allows fast access to the asserted clauses for a given relation which will make assert and retract operations simple.

To activate newly added clauses we consider the case that the asserted clause refers to a program-defined relation and assume that the asserted clauses are to be activated after the program-defined clauses.
In prolog we use assertz() .

For the case where we add clauses to be activated before the program-defined clauses we use asserta()

Summary

Prolog unification cannot succeed in more than one way if a step in the unification fails and since no backtracking inside the unification is necessary, routines and their calling sequences can be made more efficient.

The first logic language processors were interpreters.

Some logic language compilers generate C code for simplicity.
An expression of full backtracking has nested procedures which can be handled by a supporting GNU C compiler.

References.

Introduction to Prolog
Logic programming courseware
Modern Compiler Design 2nd edition Dick Grune Kees van Reeuwijk Henri E. Bal Ceriel J.H. Jacobs Koen Langendoen