Equation 2 shows the general POS sequence formula, which is
expressed in terms of a window of *L* tags with *M* of these tags
before the juncture and *L*-*M* tags after. We can expect longer
sequences to be potentially more discriminative, but more prone to
sparse data problems. Table 3 shows results from experiments
which varied *L* and *M*. These were performed on the 23-tag POS tagset,
using smoothing, with both a 1-gram and a 6-gram phrase break model. For both phrase
break models, the *L* = 3, *M* = 2 condition outperforms the others.
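
To make the window definition concrete, the following is a minimal sketch of how the *L*-tag context around each juncture could be extracted. The function name `pos_window`, the padding symbol, and the example tags are illustrative assumptions, not part of the original system; in particular, padding at sentence edges is only one possible convention, and the example tags are not drawn from the 23-tag set.

```python
def pos_window(tags, j, L, M, pad="<pad>"):
    """Return the L-tag window around the juncture that follows tags[j].

    M tags are taken from before the juncture (ending at tags[j]) and
    L - M tags from after it. Positions outside the sentence are filled
    with `pad` (an assumed convention for sentence edges).
    """
    if not 0 <= M <= L:
        raise ValueError("need 0 <= M <= L")
    start = j - M + 1        # index of the first tag before the juncture
    end = j + 1 + (L - M)    # one past the last tag after the juncture
    return tuple(
        tags[i] if 0 <= i < len(tags) else pad
        for i in range(start, end)
    )


# Hypothetical tag sequence, for illustration only.
tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]

# One window per juncture, using the L = 3, M = 2 condition.
windows = [pos_window(tags, j, L=3, M=2) for j in range(len(tags))]
```

With *L* = 3 and *M* = 2, each window holds the two tags before the juncture and the one tag after it. Larger *L* gives more distinct windows, which is where the sparse data problem arises.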