Our algorithm for parsing link grammars is based on dynamic programming. Perhaps its closest relative in the standard literature is the dynamic programming algorithm for finding an optimal triangulation of a convex polygon [][p. 320]CLR. It tries to build up a linkage (which we'll call a solution in this section) in a top down fashion: It will never add a link (to a partial solution) that is above a link already in the partial solution.
The algorithm is most easily explained by specifying a data structure
for representing disjuncts. A disjunct
has pointers to two linked
lists of connectors. These pointers are denoted
and
. If
is a connector, then
will
denote the next connector after
in its list. The next field of
the last pointer of a list has the value
.
For example, suppose the disjunct
=13((D,O) ()) (using the
notation of section 2). Then
would
point to the connector O,
would point
to the connector D, and
would be
. Similarly,
.
To give some intuition of how the algorithm works, consider the
situation after a link has been proposed between a connector
on
word
and a connector
on word
. (The words of the sequence
to be parsed are numbered from
to
.) For convenience, we
define
and
to be
and
respectively. The situation is shown in the following diagram:
Here the square boxes above the words
and
represent a data
structure node corresponding to the word. The rectangular box above
each of these represents one of the (possibly many) disjuncts for the
word. The small squares pointed to by the disjuncts represent
connectors.
How do we go about extending the partial solution into the region
strictly between
and
? (This region will be denoted
.) First of all, if there are no words in this region
(i.e.
) then the partial solution we've built is
certainly invalid if either
or
.
If
then this region is ok, and we may proceed to
construct the rest of the solution.
Now suppose that the region between
and
contains at least one
word. In order to attach the words of this region to the rest of the
sentence there must be at least one link either from
to some word
in this region, or from
to some word in this region (since no word
in this region can link to a word outside of the
range,
and something must connect these words to the rest of the sentence).
Since the connector
has already been used in the solution being
constructed, this solution must use the rest of the connectors of the
disjunct in which
resides. The same holds for
. The only
connectors of these disjuncts that can be involved in the
region are those in the lists beginning with
and
. (The
use of any other connector on these disjuncts in this region would
violate the ordering requirement.) In fact, all of the connectors of
these lists must be used in this region in order to have a
satisfactory solution.
Suppose, for the moment, that
is not
. We know that
this connector must link to some disjunct on some word in the region
. (It can't link to
because of the exclusion
rule.) The algorithm tries all possible such words and disjuncts.
Suppose it finds a word
and a disjunct
on
such that the
connector
matches
. We can now add this link to our
partial solution.
The situation is shown in the following diagram.
./alg-2small.ps
How do we determine if this partial solution can be extended to a full
solution? We do this by solving two problems similar to the problem
we started with. In particular, we ask if the solution can be
extended to the word range
using the connector lists
beginning with
and
. We also
ask if the solution can be extended to the word range
using the connector lists beginning with
and
.
Notice that in the latter case, the problem we are solving seems
superficially different: the boundary words have not already been
connected together by a link. This difference is actually of no
consequence because the pair of links (
to
and
to
) play
the role that a direct link from
to
would play: (1) they
separate the region
from all the other words, and
(2) they serve to connect the words
and
together.
We need to consider one other possibility. That is that there might
be a solution with a link between words
and
and a link
between words
and
. (This results in a solution where the
word/link graph is cyclic.) The algorithm handles this possibility by
also attempting to form a link between
and
. If
these two match, it does a third recursive call, solving a third
problem analogous to our original problem. In this problem the word
range is
and the connector lists to be satisfied begin
with
and
.
A very similar analysis suffices to handle the case when
is
.
The algorithm described has an exponential worst-case running time as
a function of
, the number of words in the sequence to be parsed.
This can easily be transformed into an efficient dynamic programming
algorithm by using memoization ([p. 312]CLR).
The running time is now bounded by the number of different possible
recursive calls multiplied by the time used by each call. A recursive
call is completely determined by specifying the pointers
and
.
(These uniquely determine
and
.) The cost of a given call is
bounded by the total number of disjuncts in the sequence of words.
If we let
be the number of disjuncts and
be the number of
connectors, then the running time is
. For a fixed link
grammar,
and
, so the running time is
.
Our technical reports describe this algorithm in more detail. They contain pseudo-code for the algorithm [14], an argument for it's correctness [14] and an elegant recurrence for the number of linkages of a sentence [11].
After the algorithm above was implemented, we were interested in seeing how well it would work on sentences taken from newspapers and other natural sources. It quickly became clear that something else was needed to make the algorithm run faster on long sentences.