Introduction




Most sentences of most natural languages have the property that if arcs are drawn connecting each pair of words that relate to each other, then the arcs will not cross [Melchuk, p. 36]. This well-known phenomenon, which we call planarity, is the basis of link grammars, our new formal language system.

A link grammar consists of a set of words (the terminal symbols of the grammar), each of which has a linking requirement. A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw links among the words so as to satisfy the following conditions:

  Planarity:    The links do not cross (when drawn above the words).
  Connectivity: The links suffice to connect all the words of the sequence together.
  Satisfaction: The links satisfy the linking requirements of each word of the sequence.
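To make the first two conditions concrete, here is a small illustrative check, written for this exposition rather than taken from our parser: given a candidate set of links over a sentence, it tests planarity and connectivity. (Testing the third condition, Satisfaction, additionally requires the dictionary of linking requirements introduced below.)

#include <stdio.h>

#define MAX_WORDS 64   /* enough for the examples in this paper */

/* A link joins the words at positions i < j (0-based indices into the sentence). */
typedef struct { int i, j; } Link;

/* Planarity: no two links cross when drawn above the sentence. */
static int planar(const Link *links, int nlinks)
{
    for (int a = 0; a < nlinks; a++)
        for (int b = a + 1; b < nlinks; b++) {
            int cross =
                (links[a].i < links[b].i && links[b].i < links[a].j && links[a].j < links[b].j) ||
                (links[b].i < links[a].i && links[a].i < links[b].j && links[b].j < links[a].j);
            if (cross) return 0;
        }
    return 1;
}

/* Connectivity: the links connect all nwords words into one piece.
 * Labels are propagated along links until every word in a connected
 * component carries that component's smallest word index.           */
static int connected(const Link *links, int nlinks, int nwords)
{
    int comp[MAX_WORDS];
    for (int w = 0; w < nwords; w++) comp[w] = w;
    for (int pass = 0; pass < nwords; pass++)
        for (int k = 0; k < nlinks; k++) {
            int lo = comp[links[k].i] < comp[links[k].j] ? comp[links[k].i] : comp[links[k].j];
            comp[links[k].i] = comp[links[k].j] = lo;
        }
    for (int w = 1; w < nwords; w++)
        if (comp[w] != comp[0]) return 0;
    return 1;
}

int main(void)
{
    /* the(0) cat(1) chased(2) a(3) snake(4), linked as in the example discussed below */
    Link links[] = { {0, 1}, {1, 2}, {2, 4}, {3, 4} };
    printf("planar=%d connected=%d\n", planar(links, 4), connected(links, 4, 5));
    return 0;
}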

The linking requirements of each word are contained in a dictionary. To illustrate the linking requirements, the following diagram shows a simple dictionary for the words a, the, cat, snake, Mary, ran, and chased. The linking requirement of each word is represented by the diagram above the word.

Each of the intricately shaped labeled boxes is a connector. A connector is satisfied by matching it to a compatible connector (one with the appropriate shape, facing in the opposite direction). Exactly one of the connectors attached to a given black dot must be satisfied (the others, if any, must not be used). Thus, cat requires a D connector to its left, and either an O connector to its left or an S connector to its right. Plugging a pair of connectors together corresponds to drawing a link between that pair of words.
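The matching rule can also be pictured with a small sketch (purely illustrative, independent of the program described later): a connector is a label together with the side on which its mate must lie, and two connectors match when their labels agree and they face each other.

#include <string.h>

/* Illustrative only: a connector is a label such as "D", "O", or "S",
 * together with the side of the word on which its mate must appear.  */
typedef enum { TO_THE_LEFT, TO_THE_RIGHT } Side;

typedef struct {
    const char *label;   /* connector name */
    Side        side;    /* where the matching connector must lie */
} Connector;

/* A link may join a right-facing connector on an earlier word to a
 * left-facing connector on a later word when their labels agree.    */
static int connectors_match(const Connector *on_earlier_word,
                            const Connector *on_later_word)
{
    return on_earlier_word->side == TO_THE_RIGHT
        && on_later_word->side  == TO_THE_LEFT
        && strcmp(on_earlier_word->label, on_later_word->label) == 0;
}

Under this rule, for example, the right-facing S connector on cat can be joined to a left-facing S connector on chased, but never to another right-facing connector.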

The following diagram shows how the linking requirements are satisfied in the sentence The cat chased a snake.

[Diagram: the linking requirements satisfied in the sentence The cat chased a snake]

(The unused connectors have been suppressed here.) It is easy to see that Mary chased the cat and the cat ran are also sentences of this grammar. The sequence of words the Mary chased cat is not in this language: any attempt to satisfy the linking requirements leads to a violation of one of the three rules. Here is one attempt:

[Diagram: an attempted linkage of the Mary chased cat that violates the rules]

Similarly, ran Mary and cat ran chased are not part of this language.

A set of links that prove that a sequence of words is in the language of a link grammar is called a linkage. From now on we'll use simpler diagrams to illustrate linkages. Here is the simplified form of the diagram showing that the cat chased a snake is part of this language:

[Simplified linkage diagram for the cat chased a snake, with links the-cat (D), cat-chased (S), chased-snake (O), and a-snake (D)]

We have a succinct, computer-readable notation for expressing the dictionary of linking requirements. The following dictionary encodes the linking requirements of the previous example.
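In each entry, one or more words are followed by a formula giving their linking requirement; the formulas shown here are written out from the connector descriptions above:

    words          formula
    ------------   ------------------
    a  the         D+
    snake  cat     D- & (O- or S+)
    Mary           O- or S+
    ran            S-
    chased         S- & O+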

The linking requirement for each word is expressed as a formula involving the operators & and or, parentheses, and connector names. The + or - suffix on a connector name indicates the direction (relative to the word being defined) in which the matching connector (if any) must lie. The & of two formulas is satisfied by satisfying both of the formulas. The or of two formulas requires that exactly one of its formulas be satisfied. The order of the arguments of an & operator is significant: the farther left a connector is in the expression, the nearer the word to which it connects must be. Thus, when cat is used as an object, its determiner (to which it is connected with its D- connector) must be closer than the verb (to which it is connected with its O- connector).

We can roughly divide our work on link grammars into three parts: the link grammar formalism and its properties, the construction of a wide-coverage link grammar for English, and efficient algorithms and techniques for parsing link grammars. We now touch briefly on all three of these aspects.

Link grammars are a new and elegant context-free grammatical formalism with a unique combination of useful properties.

Our second result is the construction of a link grammar dictionary for English. The goal we set for ourselves was to make a link grammar that can distinguish, as accurately as possible, syntactically correct English sentences from incorrect ones. We chose a formal or newspaper-style English as our model. The result is a link grammar of roughly eight hundred definitions (formulas) and 25,000 words that captures many phenomena of English grammar. It handles noun-verb agreement, questions, imperatives, complex and irregular verbs, many types of nouns, past and present participles in noun phrases, commas, a variety of adjective types, prepositions, adverbs, relative clauses, possessives, coordinating conjunctions, unbounded dependencies, and many other things.

The third result described in this paper is a program for parsing with link grammars. The program reads in a dictionary (in a form very similar to the tables above) and will parse sentences according to the given grammar. The program does an exhaustive search: it finds every way of parsing the given sequence with the given link grammar. It is based on our own O(n^3) algorithm (n is the number of words in the sentence to be parsed). The program also makes use of several very effective data structures and heuristics to speed up parsing. The program is comfortably fast, parsing typical newspaper sentences in a few seconds on a modern workstation.

Both our program (written in ANSI-C) and our dictionary are available via anonymous ftp over the Internet. Having the program available for experimentation may make it easier to understand this paper.




