The phrase break model is trained by examining the database again, this time ignoring the POS information and examining only the junctures. An n-gram of order N is constructed, which represents the probability of different sequences of junctures. Using J^{N}_{i-1} to represent the sequence of the N-1 junctures preceding j_{i}, we have:
$$P(j_{i} \mid J^{N}_{i-1}) = P(j_{i} \mid j_{i-1}, j_{i-2}, j_{i-3}, \ldots, j_{i-N+1}) \tag{3}$$
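As a rough illustration, the conditional probabilities in equation (3) could be estimated from a corpus of juncture sequences by counting. The sketch below is not the authors' implementation and makes several assumptions that the text does not state: junctures are labelled with the hypothetical symbols "B" (break) and "N" (non-break), each sentence is left-padded with a hypothetical "<s>" symbol so that early junctures have a full context, and probabilities are plain maximum-likelihood relative frequencies with no smoothing or back-off.

```python
from collections import Counter, defaultdict


def train_juncture_ngram(sentences, n):
    """Estimate P(j_i | j_{i-1}, ..., j_{i-n+1}) by maximum-likelihood counts.

    Assumptions (hypothetical, not from the source text):
      - each sentence is a list of juncture labels such as "B" / "N";
      - sentences are left-padded with n-1 "<s>" symbols;
      - no smoothing is applied, so unseen n-grams get no entry.
    Returns a dict mapping each context tuple (length n-1) to a dict of
    {next_juncture: probability}.
    """
    context_counts = Counter()
    ngram_counts = Counter()
    for junctures in sentences:
        padded = ["<s>"] * (n - 1) + list(junctures)
        for i in range(n - 1, len(padded)):
            # The n-1 junctures preceding position i form the context.
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1

    probs = defaultdict(dict)
    for gram, count in ngram_counts.items():
        context, juncture = gram[:-1], gram[-1]
        probs[context][juncture] = count / context_counts[context]
    return dict(probs)


if __name__ == "__main__":
    corpus = [["N", "N", "B", "N", "N", "B"], ["N", "B", "N", "N", "N", "B"]]
    model = train_juncture_ngram(corpus, 2)
    print(model[("N",)])  # relative frequencies of the juncture after "N"
```

With n = 2 this reduces to a bigram over junctures; for this toy corpus, a break follows a non-break 4 times out of 8, so P(B | N) = 0.5. A practical model would likely need smoothing for higher orders, since many juncture contexts will be rare or unseen.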