The POS sequence model is trained by searching the training data for
each juncture type and counting how often each distinct sequence of
POS tags occurs before and after the juncture. Generally, the POS sequence
is a window of *L* tags around a juncture *j*_{i}, *M* tags preceding
*j*_{i} and *L*-*M* tags following *j*_{i}. In our standard system there
are 2 tags before and 1 after the juncture (*L*=3, *M*=2). These counts
are converted into probabilities by dividing each count by the total
number of occurrences of that juncture type in the data. This gives an
estimate of the probability of a POS sequence given a juncture type.
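
The counting procedure described above can be sketched as follows. This is a minimal illustration and not the original implementation: the function name, the parameter names, the juncture labels `"B"` and `"NB"`, and the toy data are all hypothetical. It also assumes that `junctures[i]` is the juncture following word `i`, and that junctures too close to the ends of the data to have a full window are skipped.

```python
from collections import Counter

def train_pos_sequence_model(tags, junctures, before=2, after=1):
    """Relative-frequency estimate of P(POS window | juncture type).

    tags[i] is the POS tag of word i; junctures[i] is the type of the
    juncture between word i and word i + 1 (an assumed convention).
    """
    seq_counts = Counter()   # (juncture type, POS window) -> occurrences
    junc_counts = Counter()  # juncture type -> occurrences
    for i, j in enumerate(junctures):
        lo, hi = i - before + 1, i + 1 + after
        if lo < 0 or hi > len(tags):
            continue  # assumption: ignore junctures without a full window
        window = tuple(tags[lo:hi])
        seq_counts[(j, window)] += 1
        junc_counts[j] += 1
    # Divide each window count by the count of its juncture type.
    return {(j, w): c / junc_counts[j] for (j, w), c in seq_counts.items()}

# Toy data (hypothetical): 8 tagged words and the 7 junctures between them.
tags = ["DT", "NN", "VB", "DT", "NN", "IN", "DT", "NN"]
junctures = ["NB", "B", "NB", "NB", "B", "NB", "NB"]
model = train_pos_sequence_model(tags, junctures)
```

With the default window (2 tags before, 1 after), each entry of `model` is keyed by a juncture type and a 3-tag tuple, and the probabilities for any one juncture type sum to 1.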

Let us denote a POS sequence
*c*_{i-M}, ..., *c*_{i}, ..., *c*_{i+L-M} as
*C* and the number of times this occurs around a juncture of type *j*
in the training set as *count*(*C*). The number of times a juncture type
occurs is given by *count*(*j*). Thus an estimate of the probability is given by:

$$P(C \mid j) \approx \frac{\mathit{count}(C)}{\mathit{count}(j)}$$

which in expanded form is: