FOIL - An Overview

1.0 Introduction

This reference provides an overview of the learning algorithm FOIL
(version 5.1). The program learns Horn clause definitions of 
target relations from both positive and negative examples. Each
learned Horn clause is defined in terms of the target and other relations.


2.0 Description of the approach

The operation of FOIL can be summarized as:

FOIL:
* Binary symmetric relation check.
* Constant ordering for types (see section 6).

   FIND A RELATION:
   * Establish the training set consisting of positive and negative constant 
    tuples.

   * While (there are positive tuples in the training set) AND 
  	    (stopping criterion [1] not satisfied)
     - Find a clause that characterizes part of the target relation.

	GROW A CLAUSE:
	* Initialize the local training set T1 to the training set
	  and let i=1.
	* While (Ti contains negative tuples) AND 
	  	(stopping criterion [2] not satisfied)
	    - Find a literal Li to add to the right-hand side of the clause
	    - Produce a new training set Ti+1 based on those tuples
	      in Ti that satisfy Li. If Li introduces new variables, each
	      such tuple from Ti may give rise to several (expanded) tuples
	      in Ti+1. The label of each tuple in Ti+1 is the same as that
	      of the parent tuple in Ti.
	    - Increment i and continue.

	* Prune redundant literals from the clause 

     - Remove all positive tuples that satisfy the right-hand side of this 
       clause from the training set.

   * Prune clauses in the Horn clause definition.
   * Reorder clauses such that non-recursive clauses precede recursive ones.
   * Test the final Horn clause definition with the (optional) test examples.
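
The tuple-expansion step of GROW A CLAUSE (producing Ti+1 from Ti) can be
illustrated with a small Python sketch. The function name extend_tuples, the
representation of a relation as a set of constant tuples, and the use of None
to mark a new variable are all assumptions made for this illustration; they
are not taken from the FOIL source. Only unnegated literals are handled.

```python
def extend_tuples(training, relation, args):
    """Return T(i+1) for an unnegated literal Q(args).

    training : list of (tuple_of_constants, label) pairs, label '+' or '-'
    relation : set of constant tuples that make up relation Q
    args     : for each argument of Q, the index of an existing variable,
               or None for a new variable (each None is a distinct variable)
    """
    extended = []
    for values, label in training:
        for fact in sorted(relation):
            # The fact must agree with the tuple on every existing variable.
            if all(a is None or values[a] == f for a, f in zip(args, fact)):
                new_values = tuple(f for a, f in zip(args, fact) if a is None)
                # Each expanded tuple inherits the label of its parent.
                extended.append((values + new_values, label))
    return extended


parent = {("ann", "bob"), ("ann", "cat"), ("bob", "dan")}
t1 = [(("ann",), "+"), (("bob",), "+"), (("dan",), "-")]
# Literal parent(X1, X2): X1 is an existing variable (index 0), X2 is new.
t2 = extend_tuples(t1, parent, [0, None])
```

Here the tuple for "ann" gives rise to two expanded tuples, the tuple for
"bob" to one, and the tuple for "dan" is dropped because it does not satisfy
the literal.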


FOIL allows several target relations to be specified in the input file.
Each call to the FIND A RELATION procedure learns one target relation in the
form of a Horn clause definition. Every such call starts
by establishing the training set, which consists of positive
and negative constant tuples. The input to the program may contain only
positive constant tuples. In this case, FOIL generates all other
constant tuples that obey the type constraints of the relation
and regards them as negative constant tuples. Refer to "foil.manual" for
a description of the input format and the various options available.
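
As a minimal sketch of how negative tuples can be generated when only
positive tuples are supplied, the following Python fragment enumerates every
type-correct tuple and keeps those not listed as positive. The function name
and the data layout (one list of constants per argument position) are
assumptions for illustration only.

```python
from itertools import product


def closed_world_negatives(positives, type_constants):
    """Return every type-correct constant tuple that is not a positive tuple.

    type_constants holds one list of constants per argument of the relation.
    """
    positive_set = set(positives)
    return [t for t in product(*type_constants) if t not in positive_set]


nodes = ["a", "b", "c"]
positives = [("a", "b"), ("b", "c")]
negatives = closed_world_negatives(positives, [nodes, nodes])
```

With three constants and a binary relation there are nine candidate tuples,
so two positive tuples leave seven negative ones.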


2.1 GROW A CLAUSE

Consider first the inner procedure, which produces a single clause at each
call. The GROW A CLAUSE procedure "grows" a Horn clause of the form:

	P(X1,X2,....,Xk) <-  L1,L2, ....,Ln

where P(X1,X2,....,Xk) is the k-ary predicate of the target relation, and 
the literals L1,L2, ...,Ln are predicates of input relations in the training 
set, equality predicates of the form X1=X2, or negations of either. The
procedure begins by initializing the local training set to the input training
set. The clause starts with just the predicate of the target relation on the
left-hand side and an empty right-hand side. Each iteration of the
while-loop finds the best literal Li to attach to the right-hand side.
FOIL selects the best literal using an information-gain estimate,
defined as:

        c * ( -log2(p/(p+n)) +log2(p'/(p'+n')) )

where p and n are the numbers of positive and negative tuples covered by
        the partial clause C,
      p' and n' are the numbers of positive and negative tuples covered by
        the new partial clause C' obtained by adding a new literal, and
      c is the number of positive tuples covered by C that are also
        covered by C'.

Note that c <= p': if the new literal introduces new variables, the p'
tuples are the expansions of the c tuples. c = p' when the new literal
has no new variables or when it is a determinate literal. Determinate
literals are described in section 5.
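
The gain estimate can be computed directly from the five counts. The Python
function below is a sketch only; its name and the convention of returning 0.0
when the new clause covers no positive tuples are assumptions, not part of
the FOIL documentation.

```python
from math import log2


def foil_gain(p, n, p_new, n_new, c):
    """Information-gain estimate c * (-log2(p/(p+n)) + log2(p'/(p'+n')))."""
    if p_new == 0:
        # Assumed convention: a literal that covers no positive tuples
        # has c = 0 and is given zero gain.
        return 0.0
    info_before = -log2(p / (p + n))
    info_after = -log2(p_new / (p_new + n_new))
    return c * (info_before - info_after)


# 4 positive and 4 negative tuples before; the new literal keeps
# 2 positive tuples and excludes every negative tuple.
print(foil_gain(4, 4, 2, 0, 2))  # 2.0
```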

 
Each literal Li on the right-hand side of a clause takes one of six
forms:

	Xj = K,
	Xj <> K,
	Xj = Xk,
	Xj <> Xk,
	Q(V1,V2,...Vr),
	~Q(V1,V2,...Vr)

where Xi's are existing variables and K is a constant of the same type as
      variable Xj.
      Q's are the input relations and ~Q's are their negations.
      Vi's are existing or new variables.

While searching for a literal to add to the right-hand side of the clause,
FOIL investigates the entire space of literals, with three
significant qualifications:

    * The literal must contain at least one existing variable.
    * If the literal uses the same relation as the left-hand side
        of the clause, its possible arguments are restricted to prevent
        some problematic recursion (see section 7).
    * The search space is reduced using the information-gain heuristic.
	Literals Q(V1,V2,...Vr) that contain many new variables are
	investigated first. Replacing a new variable in Q with an existing
	variable cannot increase the number of positive tuples covered, so
	the gain of any such literal is at most the gain obtained when
	the new partial clause covers no negative tuples (i.e. n'=0):

		maximum gain = c * ( -log2(p/(p+n)) )

	If this maximum gain for a Q literal with a new variable is less
	than the best gain achieved by a literal considered so far, then
	all Q literals obtained by replacing that new variable with an 
	existing variable can be eliminated from the search space.
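
The maximum-gain bound lends itself to a simple pruning test. The sketch
below uses hypothetical function names and assumes that c, p and n have
already been counted for the literal that contains the new variable.

```python
from math import log2


def max_gain(c, p, n):
    """Upper bound on gain: the value reached when n' = 0."""
    return c * -log2(p / (p + n))


def can_skip_specialisations(c, p, n, best_gain_so_far):
    """True if no literal obtained by replacing the new variable with an
    existing one can beat the best gain found so far."""
    return max_gain(c, p, n) < best_gain_so_far


# With p = n = 4 each positive tuple is worth one bit, so c = 2 bounds
# the gain of every specialisation at 2 bits.
skip = can_skip_specialisations(2, 4, 4, best_gain_so_far=2.5)
keep = can_skip_specialisations(2, 4, 4, best_gain_so_far=1.0)
```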

The process of adding literals to the right-hand side of a clause
terminates when the local training set covered by the clause
contains no negative tuples or when stopping criterion [2] is satisfied.
The stopping criteria are discussed in section 3.

The last step in the GROW A CLAUSE procedure is pruning. Literals on the
right-hand side of the clause are grown from left to right; pruning then
considers them one at a time in the reverse direction. The pruned clause must 
cover all the positive tuples covered by the original clause but must 
not cover additional negative tuples. The clause obtained as a result 
of this pruning process is one of the final clauses of the Horn clause 
definition of the target relation.
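
The pruning rule can be sketched in Python under a strong simplification:
each literal is modelled as a Boolean test on a training tuple, so literals
that introduce new variables are not represented. The names covers and
prune_clause are hypothetical.

```python
def covers(literals, example):
    """A clause body covers an example if every literal is satisfied."""
    return all(lit(example) for lit in literals)


def prune_clause(literals, positives, negatives):
    """Try to drop literals from last to first. A literal is dropped only if
    the clause still covers all originally covered positive tuples and no
    additional negative tuples."""
    def covered(examples, body):
        return {e for e in examples if covers(body, e)}

    target_pos = covered(positives, literals)
    target_neg = covered(negatives, literals)
    kept = list(literals)
    for i in range(len(kept) - 1, -1, -1):
        trial = kept[:i] + kept[i + 1:]
        if (covered(positives, trial) >= target_pos
                and covered(negatives, trial) <= target_neg):
            kept = trial
    return kept


x_positive = lambda v: v[0] > 0
y_positive = lambda v: v[1] > 0
x_non_negative = lambda v: v[0] >= 0   # redundant given x_positive

positives = [(1, 1), (2, 3)]
negatives = [(-1, 1), (1, -1), (0, -2)]
pruned = prune_clause([x_positive, y_positive, x_non_negative],
                      positives, negatives)
```

In this example only the last literal is removed; dropping either of the
first two would let the clause cover a negative tuple.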


2.2 FIND A RELATION

Let us return to the outer-level procedure, FIND A RELATION. The
discussion of stopping criteria is again deferred to section 3.
Each iteration of the FIND A RELATION procedure finds a clause (using the 
GROW A CLAUSE procedure) that covers a subset of the positive tuples of the 
target relation. The procedure removes this subset from 
the training set before the next iteration. The process terminates 
when there are no more positive tuples in the training set or when stopping
criterion [1] is satisfied. The target relation is represented by the
Horn clause definition consisting of all the clauses produced.
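
The outer covering loop can be sketched as follows. This is an illustration
under stated assumptions, not FOIL's implementation: a clause is modelled as
a Boolean test on a tuple, grow_clause is a caller-supplied stand-in for
GROW A CLAUSE that returns None when no clause can be produced, and
make_grower is a toy clause finder invented for the example.

```python
def find_relation(positives, negatives, grow_clause):
    """Repeatedly grow a clause and remove the positive tuples it covers."""
    remaining = set(positives)
    definition = []
    while remaining:
        clause = grow_clause(remaining, negatives)
        if clause is None:            # stopping criterion [1]
            break
        definition.append(clause)
        remaining -= {t for t in remaining if clause(t)}
    return definition, remaining


def make_grower(candidates):
    """Toy stand-in: pick the candidate test that covers no negative tuple
    and the most remaining positive tuples."""
    def grow(remaining, negatives):
        safe = [c for c in candidates if not any(c(t) for t in negatives)]
        best = max(safe, key=lambda c: sum(c(t) for t in remaining),
                   default=None)
        if best is None or not any(best(t) for t in remaining):
            return None
        return best
    return grow


small = lambda t: t < 3
large = lambda t: t > 7
definition, uncovered = find_relation(
    positives=[1, 2, 5, 8, 9], negatives=[4, 6],
    grow_clause=make_grower([small, large]))
```

Two clauses are found, and the positive tuple 5 stays uncovered, which
corresponds to an incomplete definition.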

Clauses in the Horn clause definition are pruned, one at a time, 
in the same order in which they were produced.

The final clauses are re-ordered such that non-recursive clauses
precede recursive ones.

If the optional test examples are available, the final clauses
are tested using these examples and the test accuracy is printed.


2.3 Binary Symmetric Relation Check

All binary input relations are checked at the beginning of the 
algorithm to see whether they are symmetric. If a relation is symmetric,
the system avoids evaluating the redundant argument ordering. For example,
if reverse is symmetric, the information gain and coding cost
(described in the next section) of reverse(A,B) and reverse(B,A)
are exactly the same, so only one of the two needs to be evaluated.
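
A symmetry check of this kind is straightforward when a binary relation is
stored as a set of constant pairs. The function name below is an assumption
for illustration.

```python
def is_symmetric(pairs):
    """True if (b, a) is in the relation whenever (a, b) is."""
    pair_set = set(pairs)
    return all((b, a) in pair_set for a, b in pair_set)


adjacent = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
parent = {("ann", "bob"), ("bob", "dan")}
```

Here is_symmetric(adjacent) holds, so only one argument ordering of adjacent
would need to be evaluated, while is_symmetric(parent) does not.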

The equality relation allows further redundancy to be eliminated when
subsequent literals in the clause are considered: if two variables are
known to be equal, it is unnecessary to consider literals involving one
of the variables when those involving the other are considered. For
example, if A = M has been established during the development of a clause,
it is redundant to subsequently consider both append(D,E,A) and 
append(D,E,M).



3.0 Stopping Criteria 

FOIL is intended for use in situations for which there may not be a
perfect definition of the target relations.  It uses encoding-length
heuristics to restrict definitions to what can be justified by the data
available.  It is also possible for clauses to be inexact, i.e. to cover
negative tuples.  FOIL prints a warning message if the clauses it finds
are not a perfect definition of the relation. 

The heuristic is applied on the principle that, for a sensible clause, the
number of bits required to encode the clause should never exceed the number
of bits needed to explicitly indicate the positive tuples covered by the 
clause.

The number of bits required to indicate explicitly p positive tuples out 
of |T| training tuples is given by

	EC = log2(|T|) + log2(|T| choose p)

B(Li), the number of bits required to encode a literal, is

			1		    (to indicate whether negated)
	     +log2(number of relations)     (to indicate which relation)
	+log2(number of possible arguments) (to indicate which variables)

The bits required to encode a clause of n literals are given by:

	BC = sum(B(Li)) - log2(n!)

The last term reflects the fact that all orderings of the literals are
equivalent.
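
The three quantities above, together with the principle that a clause should
not cost more bits than the positive tuples it covers, can be written out in
Python. The function names and the argument num_arg_choices (the number of
possible argument combinations for the literal) are assumptions for this
sketch.

```python
from math import comb, factorial, log2


def explicit_bits(total, p):
    """EC: bits to indicate p positive tuples among `total` training tuples."""
    return log2(total) + log2(comb(total, p))


def literal_bits(num_relations, num_arg_choices):
    """B(Li): negation flag + choice of relation + choice of arguments."""
    return 1 + log2(num_relations) + log2(num_arg_choices)


def clause_bits(literal_costs):
    """BC: sum of the literal costs, less log2(n!) for the free ordering."""
    n = len(literal_costs)
    return sum(literal_costs) - log2(factorial(n))


def clause_is_justified(literal_costs, total, p):
    """The clause may not need more bits than the tuples it explains."""
    return clause_bits(literal_costs) <= explicit_bits(total, p)


cost = literal_bits(num_relations=4, num_arg_choices=16)   # 1 + 2 + 4 = 7 bits
one_literal_ok = clause_is_justified([cost], total=8, p=2)
two_literals_ok = clause_is_justified([cost, cost], total=8, p=2)
```

With 8 training tuples of which 2 are positive and covered, EC is about 7.8
bits: a one-literal clause costing 7 bits is acceptable, while a two-literal
clause costing 13 bits is not.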

Let us now return to the stopping criteria mentioned in section 2.

Stopping criterion [1] corresponds to: "(no clause can be added to the
Horn clause definition)". If this stopping condition is satisfied,
not all positive tuples are covered by the Horn clause definition; 
the definition is therefore an incomplete set of clauses.

Stopping criterion [2] corresponds to: "(no literal can be added to the 
clause)". When this stopping criterion is satisfied, the clause is tested
to see whether it achieves accuracy >= 80%. If it does, the GROW A CLAUSE
procedure returns this clause as its result. Note that the resulting
clause may be inexact: when its accuracy is less than 100%, it covers
some negative tuples. A backup facility is 
incorporated in FOIL to recover the search when it fails to produce
a clause. This is described in the following section.

Every time a literal is about to be added to a clause, EC and BC are
calculated. If BC is found to be more than EC, the literal
is not added to the clause, in accordance with the encoding-length
heuristic. No literal can be added to the clause if no candidate literal
satisfies the encoding-length heuristic.

If a call to the GROW A CLAUSE procedure fails to produce a clause by
either route (satisfying stopping criterion [2] or reaching a Ti that
contains no negative tuples), then stopping criterion [1] is said to be
satisfied.


4.0 Backup facility

FOIL.2 incorporates a primitive backup facility.  When there are several
possible next literals that have approximately equal gain, the system
sets a checkpoint.  If the current attempt to find a useful clause
fails and a checkpoint remains, the system recovers to that
point and continues with the alternative literal.  The checkpoints are ranked
so recovery is not necessarily to the most recent checkpoint.  There are
limits on the maximum number of alternatives to any one literal and on
the maximum number of saved checkpoints.


5.0 Determinate literal

An addition to FOIL.2 was the automatic inclusion of "determinate" 
literals.  A literal L is determinate if it introduces at least one new 
variable and, for each positive tuple in the current training set, there 
is exactly one binding of these new variables that satisfies the literal.  
All determinate literals are automatically added to a clause unless 
there is a literal with very high gain (default: 80% of the 
maximum possible gain).  Since there is exactly one binding in 
each case, the inclusion of determinate literals does not 
increase the size of the training set, and unnecessary literals are
removed during the literal pruning stage.  The idea for determinate literals
came from "determinate terms" in Stephen Muggleton's GOLEM system, and
their effect is to give FOIL a kind of look-ahead; using them, FOIL has
been able to find some very complex definitions (including quicksort).
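
The determinacy test as stated above can be sketched in Python. The sketch
checks only the condition given in this section (exactly one binding for
each positive tuple); the function names, the set-of-tuples representation
of a relation, and the use of None to mark a new variable are assumptions.

```python
def count_bindings(values, relation, args):
    """Number of facts of the relation that agree with the tuple on every
    existing variable; each such fact is one binding of the new variables."""
    return sum(
        all(a is None or values[a] == f for a, f in zip(args, fact))
        for fact in relation
    )


def is_determinate(positive_tuples, relation, args):
    """True if the literal introduces a new variable and every positive
    tuple has exactly one binding for the new variables."""
    if None not in args:
        return False
    return all(count_bindings(v, relation, args) == 1
               for v in positive_tuples)


succ = {(0, 1), (1, 2), (2, 3)}
parent = {("ann", "bob"), ("ann", "cat"), ("bob", "dan")}

# succ(X1, X2) with X2 new: each X1 has exactly one successor.
succ_determinate = is_determinate([(0,), (1,)], succ, [0, None])
# parent(X1, X2) with X2 new: "ann" has two bindings for X2.
parent_determinate = is_determinate([("ann",), ("bob",)], parent, [0, None])
```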

The inclusion of determinate literals has led to two further
improvements, described below.


5.1 Quick pruning of redundant determinate literals

Because all determinate literals are added automatically to a clause
when there is no literal with high gain, many of the determinate
literals are redundant, with their variables never used in the 
literals introduced on the basis of gain. An improvement has been 
added to prune these "floating" determinate literals as a group, 
rather than one at a time.


5.2 Clause Bootstrapping

It was observed that in some tasks, the system was repeatedly using
the same determinate literals to start each clause of a definition.
The nature of determinate literals is such that if true for one
clause, they will be true (though not necessarily useful) for the
next. Consequently, an option has been added to FOIL to enable the
system to bootstrap a clause with the literals from the preceding
clause (prior to pruning), up to the point where a literal was
introduced on the basis of gain.


6.0 Constant Ordering (addition for FOIL4)

FOIL restricts the introduction of recursive literals to ensure that a
definition will terminate (see section 7). This uses an order upon the
constants. The user can predefine the order for a type, or permit the
system to find one. The method used to find the order of constants in a type
is to first determine which pairs of arguments of relations imply a
possible ordering on the type if the pair of arguments are themselves ordered.
Then the largest set of such pairs giving a consistent ordering is found,
and the order of the constants chosen.


7.0 Recursive Literal Restrictions (modified for FOIL4)

During clause growing, FOIL only considers adding a recursive literal
if there is a permutation of the LHS arguments such that this literal and
all previous recursive literals are less than the LHS. A literal is less
than the LHS if (for the permuted arguments) the first argument of the
literal is less than that of the LHS, or the first is equal and the second
is less, or the first two are equal and the third is less, and so on. An
argument is less than that of the LHS if, in the current tuples, the constant
corresponding to the argument is less than that for the corresponding
argument of the LHS. (Where "less than" has been used in the preceding
description, "greater than" can apply equally well when the system has found
the constant ordering itself, since the polarity of such an ordering is
arbitrary.)
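
The ordering test can be sketched for a single recursive literal. The code
below makes several assumptions that are not taken from FOIL itself:
constants are given as integer ranks in the constant ordering, the literal
has the same arity as the LHS, each row lists the constants bound in one
current tuple, and earlier recursive literals in the clause are ignored.

```python
from itertools import permutations


def arg_relation(pairs):
    """'<' if the literal's constant is less than the LHS constant in every
    tuple, '=' if they are equal in every tuple, otherwise None."""
    if all(a < b for a, b in pairs):
        return "<"
    if all(a == b for a, b in pairs):
        return "="
    return None


def literal_less_than_lhs(lit_rows, lhs_rows, order):
    """Compare arguments in the given order: equal, ..., equal, then less."""
    for pos in order:
        rel = arg_relation([(lit[pos], lhs[pos])
                            for lit, lhs in zip(lit_rows, lhs_rows)])
        if rel == "<":
            return True
        if rel != "=":
            return False
    return False  # all arguments equal: the literal is not less than the LHS


def find_safe_ordering(lit_rows, lhs_rows):
    """Return a permutation of argument positions under which the literal is
    less than the LHS, or None if there is no such permutation."""
    arity = len(lhs_rows[0])
    for order in permutations(range(arity)):
        if literal_less_than_lhs(lit_rows, lhs_rows, order):
            return order
    return None


lhs = [(3, 5), (2, 4)]
ordering_a = find_safe_ordering([(3, 4), (2, 1)], lhs)  # first equal, second less
ordering_b = find_safe_ordering([(4, 4), (2, 1)], lhs)  # only second is less
ordering_c = find_safe_ordering(lhs, lhs)               # identical arguments
```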


8.0 Clause Regrowing (addition for FOIL4)

When a clause has been completed, it may be the case that the last literal(s)
contain only LHS variables and could have started the clause, possibly
leading to a shorter definition. In cases where such a literal is
non-recursive, and some of the other gainful literals in the clause use
non-LHS arguments, the clause is regrown starting from the last literal(s)
of the previous clause.


9.0 Shorter Clause Finding (addition for FOIL4)

During clause growing, a shorter clause covering at least as many positive
tuples as the final clause (and no negative ones) may be found. In such a 
case the shorter clause is substituted prior to pruning (and after regrowing,
if applicable).


10.0 Continuous Variables - Thresholding and Inequality (addition for FOIL5)

FOIL can now handle continuous variables and has an intrinsic threshold 
relation (Variable>Constant) and an intrinsic comparison relation 
(Variable>Variable). The latter is only applied to variables of the same 
type. The highest-gain threshold value for a variable is determined by a
procedure similar to that of C4.5: the tuples are sorted on the variable and
the training set is then scanned once. Continuous values are not permitted
to be matched, so while the continuous value 100.0 can be found by matching
the discrete value "water" in a boiling point relation, "water" cannot be
found by matching on 100.0. (For coding purposes the threshold relation is
treated as though it were a binary relation whose first argument is the
variable and whose second is the threshold chosen from among the possible
thresholds for that variable; use of the uniform coding option is
recommended.)
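
A sort-then-scan threshold search in the spirit described above might look
like the following Python sketch. It is not FOIL's or C4.5's actual
procedure: the choice of midpoints between adjacent distinct values as
candidate thresholds, the restriction to the unnegated test X > t, and the
function name are all assumptions. Since a threshold test introduces no new
variables, c = p' in the gain formula of section 2.1.

```python
from math import log2


def best_threshold(examples):
    """examples: list of (value, is_positive) pairs.
    Returns (threshold, gain) for the best test `value > threshold`,
    or None if no candidate threshold keeps a positive tuple."""
    ordered = sorted(examples)
    p = sum(1 for _, positive in ordered if positive)
    n = len(ordered) - p
    if p == 0:
        return None
    info_before = -log2(p / (p + n))
    best = None
    p_above, n_above = p, n          # tuples that would satisfy X > t
    for i in range(len(ordered) - 1):
        value, positive = ordered[i]
        if positive:
            p_above -= 1
        else:
            n_above -= 1
        next_value = ordered[i + 1][0]
        if next_value == value or p_above == 0:
            continue                  # no cut point here, or nothing gained
        gain = p_above * (info_before + log2(p_above / (p_above + n_above)))
        threshold = (value + next_value) / 2
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best


examples = [(1.0, False), (2.0, False), (3.0, True), (4.0, True), (5.0, False)]
threshold, gain = best_threshold(examples)
```

For these five tuples the best cut lies between 2.0 and 3.0, where both
positive tuples and one negative tuple remain above the threshold.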

11.0 Missing Values (addition for FOIL5)

FOIL can now handle problems with missing values, using a very simple
approach: a test involving a missing value always fails.

12.0 Gain Adjustment for Sampled Tuples (modification for FOIL5)

If FOIL generates the negative tuples and the -p option is used so that only
a fraction of them is produced, the information gain is adjusted to reflect
the fact that the sampled negative tuples are only a fraction of the total.

13.0 Uniform Coding Option (addition for FOIL5)

With the default coding the cost of a literal at a given point in the clause
depends upon which relation it is. This option charges uniformly for all
literals.


References

        Quinlan, J.R. (1990), "Learning Logical Definitions from Relations",
        Machine Learning 5, 239-266.

        Quinlan, J.R. (1991), "Determinate Literals in Inductive Logic
        Programming", Proceedings 12th International Joint Conference on
        Artificial Intelligence, 746-750, Morgan Kaufmann.

        Quinlan, J.R. and Cameron-Jones, R.M. (1993), "FOIL: a midterm report",
        Proceedings European Conference on Machine Learning, Springer Verlag,
        (forthcoming).

        Cameron-Jones, R.M. and Quinlan, J.R. (1993), "Avoiding Pitfalls When
        Learning Recursive Theories", IJCAI 93, (forthcoming).

        Unpublished:
        Cameron-Jones, R.M. and Quinlan, J.R., "First Order Learning,
        Zeroth Order Data"
