
Training data

To study the process, we observe the behavior of the random process for some time, collecting a large number of samples $(x_1,y_1), (x_2,y_2), \ldots, (x_N,y_N)$. In the example we have been considering, each sample would consist of a phrase $x$ containing the words surrounding "in", together with the translation $y$ of "in" that the process produced. For now we can imagine that these training samples were generated by a human expert who was shown a number of random phrases containing "in" and asked to choose a good translation for each.

We can summarize the training sample in terms of its empirical probability distribution $\tilde{p}$, defined by

$$\tilde{p}(x,y) \equiv \frac{1}{N} \times \text{number of times that } (x,y) \text{ occurs in the sample}$$

Typically, a particular pair $(x,y)$ will either not occur at all in the sample, or will occur at most a few times.
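As a minimal sketch, assuming each training sample is stored as a hashable `(x, y)` pair, the empirical distribution can be computed by counting each pair and dividing by the sample size $N$. The function name `empirical_distribution` and the toy phrases and French translations below are hypothetical illustrations, not data from the source. Exact `Fraction` values are used so that the probabilities sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(samples):
    """Return p~(x, y) = count(x, y) / N for every pair observed in samples.

    Hypothetical helper: pairs that never occur are simply absent from the
    returned dict, which corresponds to p~(x, y) = 0.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)  # number of times each (x, y) occurs
    return {pair: Fraction(c, n) for pair, c in counts.items()}


# Hypothetical toy sample: x is a phrase containing "in", y its chosen translation.
samples = [
    ("in the morning", "dans"),
    ("in the morning", "pendant"),
    ("in April", "en"),
    ("in the morning", "dans"),
]

p_tilde = empirical_distribution(samples)
print(p_tilde[("in the morning", "dans")])  # 1/2
print(sum(p_tilde.values()))  # 1
```

Pairs missing from the returned dictionary have $\tilde{p}(x,y) = 0$. With a realistic sample, almost all possible pairs would be missing and the observed ones would mostly have counts of one or two, which is the sparsity described in the text.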

Adam Berger
Fri Jul 5 11:43:50 EDT 1996