The POS sequence model is trained by locating every occurrence of each juncture type in the training data and counting how often each distinct sequence of POS tags occurs before and after the juncture. In general, the POS sequence is a window of $L$ tags around a juncture $j_i$: $M$ tags preceding $j_i$ and $L-M$ tags following it. In our standard system there are 2 tags before and 1 after the juncture ($L=3$, $M=2$). These counts are converted into probabilities by dividing each count by the total number of occurrences of that juncture type in the data. This gives an estimate of the probability of a POS sequence given a juncture type.
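The counting procedure can be sketched in a few lines of Python. This is a minimal illustration and not the original implementation. The function name `train_pos_sequence_model`, the `PAD` symbol used at utterance edges, the convention that juncture i follows word i, and the toy tags and juncture labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_pos_sequence_model(tags, junctures, L=3, M=2, pad="PAD"):
    """Estimate P(POS window | juncture type) by relative frequency.

    Assumed conventions (not taken from the source):
      * tags[k] is the POS tag of word k,
      * junctures[k] is the type of the juncture following word k,
      * windows that run past either end are filled with `pad`.
    """
    if len(tags) != len(junctures):
        raise ValueError("expected one juncture per word")
    if not 0 <= M <= L:
        raise ValueError("need 0 <= M <= L")

    # Pad so that every juncture has M tags before it and L - M after it.
    padded = [pad] * M + list(tags) + [pad] * (L - M)

    seq_counts = defaultdict(Counter)  # juncture type -> window -> count
    juncture_counts = Counter()        # juncture type -> count

    for i, j in enumerate(junctures):
        # Word k sits at padded[k + M], so words i-M+1 .. i+L-M
        # occupy padded[i + 1 : i + 1 + L].
        window = tuple(padded[i + 1 : i + 1 + L])
        seq_counts[j][window] += 1
        juncture_counts[j] += 1

    # Divide each window count by the total count of its juncture type.
    return {
        j: {w: c / juncture_counts[j] for w, c in counts.items()}
        for j, counts in seq_counts.items()
    }


# Toy data, invented for illustration only.
tags = ["DT", "NN", "VB", "DT", "NN"]
junctures = ["non-break", "non-break", "non-break", "non-break", "break"]

model = train_pos_sequence_model(tags, junctures)
print(model["non-break"][("DT", "NN", "VB")])
print(model["break"][("DT", "NN", "PAD")])
```

Keeping one `Counter` per juncture type means that the probabilities for each type sum to one, which matches the description of dividing by the number of occurrences of that juncture type.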
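Let us denote a POS sequence $c_{i-M}, \ldots, c_i, \ldots, c_{i+L-M}$ as $C$, and the number of times it occurs around a juncture of type $j$ in the training set as $\mathrm{count}(C)$. The number of times a juncture type occurs is given by $\mathrm{count}(j)$. Thus an estimate of the probability is given by:

$$P(C \mid j) = \frac{\mathrm{count}(C)}{\mathrm{count}(j)}$$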
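which in expanded form is:

$$P(c_{i-M}, \ldots, c_i, \ldots, c_{i+L-M} \mid j) = \frac{\mathrm{count}(c_{i-M}, \ldots, c_i, \ldots, c_{i+L-M})}{\mathrm{count}(j)}$$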