
Varying the Order of the N-gram

Figure 4: Plot of perplexity against order of n-gram
[figure: border2.eps]

Figure 5: Plot of junctures-correct (upper line) and breaks-correct (lower line) against order of n-gram
[figure: border6.eps]

Perplexity is a measure of average branching factor and indicates how well an n-gram predicts the next juncture type in the test set. If N is the order of the n-gram and Q is the number of junctures in the test set, the perplexity B can be calculated from the entropy H by:


\begin{displaymath}
B = 2^{H}
\end{displaymath} (8)

where

\begin{displaymath}
H=-\frac{1}{Q} \sum^{Q}_{i=1} \log_{2} P(j_{i} \vert j_{i-1}, j_{i-2}, \ldots, j_{i-N+1})
\end{displaymath} (9)
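As a rough illustration of how equations (8) and (9) can be applied, the following sketch computes B for a test sequence of juncture labels. The prob callback (returning P(j | history)) and the truncation of the history at the start of the sequence are assumptions made for this example rather than details of the system described here.

import math

def perplexity(junctures, prob, order):
    """Perplexity B = 2^H of a juncture sequence under an n-gram model.

    junctures : list of juncture labels in the test set (Q items).
    prob      : assumed callback returning P(j | history), where history
                is the tuple of up to order-1 preceding junctures.
    """
    q = len(junctures)
    h = 0.0
    for i, j in enumerate(junctures):
        history = tuple(junctures[max(0, i - order + 1):i])
        h -= math.log2(prob(history, j))     # accumulate -log2 P(j_i | ...)
    return 2 ** (h / q)                      # B = 2^H, equation (8)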

N-grams can be estimated from simple frequency counts of the data. Figure 4 shows how the perplexity of a phrase-break model with juncture types break and non-break varies as a function of n-gram order. The differences between the various phrase-break models are not large, but the 6-gram has a perplexity of 1.54 compared with about 1.62 for the unigram. It is common in language modelling to use smoothing to account for rare and unseen cases. We recalculated our phrase-break model parameters using three types of smoothing: a fixed floor for unseen cases; Good-Turing smoothing, which adjusts the probabilities of rarely seen cases as well as unseen cases; and back-off smoothing, whereby the values for rare n-grams are computed from the equivalent (n-1)-grams. None of the three types of smoothing had a significant effect on the perplexity or on the overall results. In practice we use the simplest type of smoothing, in which unseen n-grams are given a frequency count of 1 during training.
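The count-of-1 floor is straightforward to express in the same style. The sketch below estimates conditional probabilities from raw frequency counts and floors unseen (history, juncture) pairs at a count of 1; the label set ("B" for break, "NB" for non-break) is an assumption made for the example, not the notation used elsewhere in this paper.

from collections import Counter

def train_ngram(junctures, order, labels=("B", "NB")):
    """Estimate P(j | history) from frequency counts, flooring unseen
    n-grams at a count of 1 (the simplest smoothing described above)."""
    counts = Counter()
    for i, j in enumerate(junctures):
        history = tuple(junctures[max(0, i - order + 1):i])
        counts[(history, j)] += 1

    def prob(history, j):
        numer = max(counts[(history, j)], 1)   # unseen cases get count 1
        denom = sum(max(counts[(history, lab)], 1) for lab in labels)
        return numer / denom

    return prob

Combined with the perplexity sketch above, perplexity(test_junctures, train_ngram(train_junctures, 6), 6) would give a figure comparable to those plotted in Figure 4.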


Table 5: Comparison of phrase-break models of different order (all figures are percentages)
N-gram order   Breaks-correct   Junctures-correct   Juncture-insertions
1              69.94            91.56               3.60
2              78.78            91.32               5.86
3              78.56            90.67               6.62
4              77.49            91.24               5.67
5              77.07            91.39               5.40
6              77.07            91.49               5.27
7              76.99            91.65               5.07
8              76.78            91.54               5.14

Figure 5 and Table 5 show how the order of the n-gram affects overall performance. The most noticeable effect is the large increase in performance between the unigram phrase-break model and the rest, which are fairly similar to one another. This result is due to the context-sensitive phrase-break models assigning a very low probability (0.03) to a break given a preceding break, whereas the unigram assigns a probability of 0.2 to the same sequence. The higher-order n-grams perform slightly better than the bigram in terms of junctures-correct and juncture-insertions. In Table 5 the 7-gram performs best. In most of our experiments, n-grams of order 6 and 7 gave consistently better results than the other n-grams, but the difference was often slight. Figure 4 also shows that the perplexity of n-grams of about this order is slightly lower than for the others.
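The size of this effect can be seen directly from the two figures quoted above; the following is only a worked comparison of those numbers, not part of the model itself.

# Probability of a break immediately following another break:
p_conditional = 0.03   # conditional phrase-break model, P(break | break)
p_unigram = 0.2        # context-free unigram, P(break)

# The unigram is roughly seven times more willing to insert a second
# break straight after the first, which is why it over-predicts
# implausible back-to-back breaks.
print(p_unigram / p_conditional)   # about 6.7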

