Minor and Major

Table 4: Minor and major phrase breaks

Phrase break model	Minor correct (%)	Major correct (%)	Junctures-correct (%)	Junctures-insertions (%)
1-gram	60.125	56.415	90.128	2.035
6-gram	68.703	59.502	90.211	3.923

Most of our work has centred on a model comprising two types of juncture, namely "break" and "non-break", but the algorithm can easily be applied to an arbitrary number of juncture types. The MARSEC corpus is in fact labelled with two types of break, "major" and "minor", so it was straightforward to build a system that predicted these types. The method is as before, but now three POS sequence models are required, one per juncture type (non-break, minor break and major break), and the phrase break model has a vocabulary of 3, implying higher perplexities and more severe sparse data problems. Note that when more than two types of juncture are used, substitution errors must also be counted: these occur when a major break is placed where a minor break occurs, or vice versa. Table 4 reports the results.
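The scoring with substitution errors can be sketched as follows. This is a minimal illustration, not the evaluation code used for Table 4. It assumes that reference and predicted junctures are aligned one-to-one, that "Junctures-correct" is the percentage of all junctures given the right label, and that "Junctures-insertions" is the percentage of all junctures where a break was predicted at a reference non-break. The function name `score_junctures` and the label strings are hypothetical.

```python
from collections import Counter

NON_BREAK, MINOR, MAJOR = "non-break", "minor", "major"


def score_junctures(reference, predicted):
    """Score three-way juncture predictions against a reference labelling.

    Assumed definitions (not taken from the text above):
      insertion    = break predicted where the reference has a non-break
      deletion     = non-break predicted where the reference has a break
      substitution = minor predicted for major, or major for minor
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("cannot score an empty sequence")

    correct = Counter()      # correct predictions, keyed by reference label
    ref_totals = Counter()   # reference junctures, keyed by label
    insertions = deletions = substitutions = 0

    for ref, hyp in zip(reference, predicted):
        ref_totals[ref] += 1
        if ref == hyp:
            correct[ref] += 1
        elif ref == NON_BREAK:
            insertions += 1
        elif hyp == NON_BREAK:
            deletions += 1
        else:
            substitutions += 1

    total = len(reference)

    def percent(numerator, denominator):
        # None marks a category that never occurs in the reference.
        return 100.0 * numerator / denominator if denominator else None

    return {
        "minor_correct": percent(correct[MINOR], ref_totals[MINOR]),
        "major_correct": percent(correct[MAJOR], ref_totals[MAJOR]),
        "junctures_correct": percent(sum(correct.values()), total),
        "junctures_insertions": percent(insertions, total),
        "insertions": insertions,
        "deletions": deletions,
        "substitutions": substitutions,
    }


if __name__ == "__main__":
    ref = [NON_BREAK, MINOR, MAJOR, NON_BREAK, MAJOR, MINOR, NON_BREAK, NON_BREAK]
    hyp = [NON_BREAK, MINOR, MINOR, MINOR, MAJOR, NON_BREAK, NON_BREAK, NON_BREAK]
    print(score_junctures(ref, hyp))
```

In the toy sequence, the third juncture (a major break predicted as minor) is a substitution: it counts against the major-correct score even though a break was predicted at the right place.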

We have not yet performed as comprehensive an analysis of multiple break types as of the single type, but have noted that the best tagsets for the two cases are different. While the V₂₃ tagset was the best for the single break case, larger tagsets gave better results for the multiple break case, despite the more severe sparse data problems that occur. This suggests that finer POS distinctions are needed to discriminate between major and minor breaks.

Alan W Black
1999-03-20