"Maximum Entropy Markov Models
for Information Extraction and Segmentation"
Hidden Markov models (HMMs) are a powerful probabilistic tool for
modeling sequential data, and have been applied with success to many
text-related tasks, such as part-of-speech tagging, text segmentation
and information extraction. In these cases, the observations are
usually modeled as multinomial distributions over a discrete
vocabulary, and the HMM parameters are set to maximize the likelihood
of the observations. This talk will present a new Markovian sequence
model, closely related to HMMs, that allows observations to be
represented as arbitrary overlapping features (such as word,
capitalization, formatting, and part-of-speech), and defines the
conditional probability of state sequences given observation
sequences. It does this by using the maximum entropy framework to fit
a set of exponential models that represent the probability of a state
given an observation and the previous state. I'll present positive
experimental results on the segmentation of FAQs.
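As a rough illustration of the model form described above, the sketch below
computes P(state | previous state, observation) from one exponential model per
previous state. It is not the implementation from the talk: the state names,
the feature functions, and the weights are all hypothetical, and the weights
are hand-picked rather than fit by maximum entropy training.

```python
import math

# Hypothetical label set for FAQ segmentation (an assumption, not from the talk).
STATES = ["question", "answer"]


def features(obs, state):
    """Hypothetical overlapping binary features on (observation, next state).

    Several features may fire on the same line of text, which a multinomial
    observation model over a vocabulary cannot express directly.
    """
    return {
        ("capitalized", state): 1.0 if obs[:1].isupper() else 0.0,
        ("ends_with_qmark", state): 1.0 if obs.endswith("?") else 0.0,
        ("bias", state): 1.0,
    }


# One weight vector per previous state. Values are hand-picked for the demo;
# in the approach described, they would be fit with the maximum entropy framework.
WEIGHTS = {
    "answer": {("ends_with_qmark", "question"): 2.0, ("bias", "answer"): 0.5},
    "question": {("bias", "answer"): 1.5},
}


def next_state_distribution(weights, prev_state, obs, states):
    """Return P(s | prev_state, obs) for every s in states.

    Each probability is exp(sum of weight * feature value), normalized over
    the candidate next states for this previous state and observation.
    """
    w = weights[prev_state]
    scores = {
        s: sum(w.get(name, 0.0) * value for name, value in features(obs, s).items())
        for s in states
    }
    z = sum(math.exp(v) for v in scores.values())  # normalizer Z(obs, prev_state)
    return {s: math.exp(v) / z for s, v in scores.items()}


if __name__ == "__main__":
    print(next_state_distribution(WEIGHTS, "answer", "What is an HMM?", STATES))
```

Because the distribution is conditioned on the observation, the features can
overlap freely without any independence assumption among them; decoding a full
sequence would chain these per-step distributions, for example with a
Viterbi-style search.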
Joint work with Dayne Freitag and Fernando Pereira. Thanks also to
Kamal Nigam and John Lafferty.