Grammatical Trigrams

A New Approach to Statistical Language Modeling


The most widely used statistical model of language is the so-called trigram model. In this simple model, a word is predicted based solely upon the two words which immediately precede it. The trigram model has been the workhorse of the statistical approach to many language processing problems, most notably speech recognition, for over twenty years.
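
In standard notation (the notation here is ours, not the proposal's), the model approximates the probability of a word sequence w_1, ..., w_n as

    P(w_1, ..., w_n)  ≈  ∏_{i=1}^{n} P(w_i | w_{i-2}, w_{i-1}),

where the first two words are conditioned on sentence-boundary symbols.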

The simplicity of the trigram model is simultaneously its greatest strength and its greatest weakness. Its strength lies in the ease with which trigram statistics can be estimated: one simply counts over hundreds of millions of words of data. And since the model is implemented by table lookup, it is computationally efficient and can be used in real-time systems. Yet the model captures relations between words only through sheer force of numbers. It ignores the rich syntactic and semantic structure that constrains natural languages, allowing them to be easily processed and understood by humans. Further advances in speech recognition, and ultimately in understanding, will hinge on computational methods for predicting and analyzing natural language data that go far beyond the simple methods used in current systems.
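
Estimation by counting and prediction by table lookup can both be made concrete in a few lines of Python. The toy corpus below is invented for illustration; a real system would also smooth the estimates to handle unseen histories.

    # A minimal sketch of trigram estimation, assuming a whitespace-tokenized
    # training corpus; the tiny corpus here is invented for illustration.
    from collections import Counter

    corpus = "the cat sat on the mat the cat ran".split()

    # Count every trigram, and every two-word history that has a following word.
    trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
    history_counts = Counter((w1, w2) for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]))

    def trigram_prob(w1, w2, w3):
        """Maximum-likelihood estimate of P(w3 | w1, w2), a pure table lookup."""
        history = history_counts[(w1, w2)]
        if history == 0:
            return 0.0  # unseen history; real systems smooth or back off here
        return trigram_counts[(w1, w2, w3)] / history

    print(trigram_prob("the", "cat", "sat"))  # 0.5: "the cat" is followed once by "sat", once by "ran"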

The researchers propose a new approach to modeling language that preserves the strengths and computational advantages of trigrams while incorporating long-range dependencies and richer linguistic information into a statistical model. The approach is based upon the ideas of probabilistic link grammar, recently developed by Sleator and Lafferty. The techniques of probabilistic link grammar promise to improve significantly upon the predictive power of the trigram model, since they naturally incorporate trigrams into a unified framework for modeling long-distance grammatical dependencies. Moreover, the methods are computationally efficient, allowing them to be used in practical natural language systems on today's computers.
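
The following toy sketch, with an invented sentence, simplified link types, and a hand-built linkage (none of it drawn from the proposal itself), illustrates two of the ideas at work: words are joined by typed links that may span long distances, so a word can be predicted from the word it is grammatically linked to rather than from its two neighbors, and a well-formed linkage must satisfy structural constraints such as planarity (links drawn above the sentence may not cross).

    # A toy illustration of link-grammar-style structure; the sentence,
    # link types, and linkage below are invented for illustration only.
    # In "the dog that chased the cat barked", a subject link joins "dog"
    # directly to "barked", so "barked" can be predicted from its subject
    # rather than from the adjacent words "the cat", as a trigram model must.
    sentence = ["the", "dog", "that", "chased", "the", "cat", "barked"]

    # Each link is (left word index, right word index, link type).
    linkage = [
        (0, 1, "D"),  # determiner: the -> dog
        (1, 2, "R"),  # relative: dog -> that
        (2, 3, "B"),  # that -> chased
        (4, 5, "D"),  # determiner: the -> cat
        (3, 5, "O"),  # object: chased -> cat
        (1, 6, "S"),  # subject: dog -> barked (a long-distance dependency)
    ]

    def is_planar(links):
        """Check link grammar's no-crossing constraint on a linkage."""
        return not any(
            l1 < l2 < r1 < r2  # strictly interleaved endpoints would cross
            for (l1, r1, _) in links
            for (l2, r2, _) in links
        )

    print(is_planar(linkage))  # True: no two links cross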

The results of the approach will be significant in three ways. First, the work promises to allow the construction of language models with greater predictive power, as measured by the statistical quantity known as entropy, than those constructed by current methods. This is important because lower entropy can translate directly into improved performance in applications such as speech recognition. Second, the research is expected to establish new results and deepen understanding of the technical foundations of this area of computer science; in particular, it will develop new algorithms for integrating finite-state automata with more powerful machines, as well as new statistical estimation algorithms. Finally, the methods will be incorporated into speech recognition, translation, and understanding systems at both Carnegie Mellon and IBM, allowing the success of the approach to be measured quantitatively, not only in terms of entropy, but in terms of the direct improvement it brings to those applications.
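
To make the entropy criterion concrete: cross-entropy is the average number of bits per word a model needs to encode a test text, and a lower value means the model assigns higher probability to the data. The Python sketch below shows the computation; the per-word probabilities are invented, and in practice they would come from the model under evaluation.

    import math

    # P(w_i | context) assigned by some language model to each word of a
    # held-out test text; these values are invented for illustration.
    word_probs = [0.2, 0.05, 0.1, 0.3, 0.01]

    # Cross-entropy: average negative log2 probability per word (bits/word).
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    perplexity = 2 ** cross_entropy  # the model's effective branching factor

    print(f"{cross_entropy:.2f} bits/word, perplexity {perplexity:.1f}")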