Future Improvements

In this paper we have presented a framework for assigning phrase breaks from POS information, and have attempted a thorough investigation of the optimal parameter settings within that framework. One area we feel still needs further investigation is tagset composition. The experiments in section 5.2 used a greedy algorithm to collapse categories in the original 37-tag tagset into a series of smaller tagsets. While this is a sensible way to proceed, we feel that a more sophisticated technique could do better. It may be possible to choose a tagset based on its actual ability to discriminate between juncture types. We have also considered a system in which two parallel sets of tags are used, one for before the juncture and the other for after.
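To make the idea of greedy tagset collapsing concrete, the following is a minimal sketch only, not the procedure of section 5.2. It assumes a scoring function is available that rates a candidate tagset (for example, by break-prediction accuracy on held-out data); the names `greedy_collapse`, `score`, and `toy_score`, and the four example tags, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def greedy_collapse(tags, score, target_size):
    """Greedily merge tag classes until only target_size classes remain.

    tags:  iterable of original tag names.
    score: assumed callable taking a dict {tag: class_label} and
           returning a number, where higher means a better tagset.
    Returns a list of mappings, one per tagset size, largest first.
    """
    target_size = max(target_size, 1)  # a tagset needs at least one class
    mapping = {t: t for t in tags}     # start with every tag in its own class
    history = [dict(mapping)]
    while len(set(mapping.values())) > target_size:
        classes = sorted(set(mapping.values()))
        best = None
        # Try every pairwise merge and keep the one that scores highest.
        for a, b in combinations(classes, 2):
            trial = {t: (a if c == b else c) for t, c in mapping.items()}
            s = score(trial)
            if best is None or s > best[0]:
                best = (s, trial)
        mapping = best[1]
        history.append(dict(mapping))
    return history


# Toy stand-in for a real evaluation: penalise every pair of tags that
# shares a class but differs in an (assumed) coarse category.
COARSE = {"NN": "N", "NNS": "N", "VB": "V", "VBD": "V"}


def toy_score(mapping):
    return -sum(
        1
        for x, y in combinations(sorted(mapping), 2)
        if mapping[x] == mapping[y] and COARSE[x] != COARSE[y]
    )


history = greedy_collapse(COARSE.keys(), toy_score, target_size=2)
```

Each step of this sketch evaluates every pairwise merge, so one step costs on the order of the square of the current number of classes. A criterion based on the ability to discriminate juncture types, as suggested above, would replace `toy_score` with a measure computed from training data.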

While we believe further investigation into tagset composition would help reduce errors, we also believe that superficial text analysis techniques like this can only go so far. From examining the errors in the test set, we believe that a more sophisticated analysis will be needed to correct some of them. Although we argued in the introduction against using syntactic parsers for phrase break assignment, our objection stems from the basic inaccuracy of these parsers, not from any belief that syntactic parses themselves are unhelpful. Recently, several stochastic parsers have been presented which are trained on hand-parsed trees and employ statistical techniques during parsing, e.g. [Magerman, 1994]. These have been shown to significantly outperform rule-driven parsers. It is possible that a statistical parser could provide reliable parses and hence facilitate phrase break assignment.

Alan W Black
1999-03-20