Fast Inference and Learning in Large-State-Space HMMs

Sajid Siddiqi

Abstract

For Hidden Markov Models (HMMs) with fully connected transition models, the three fundamental problems of evaluating the likelihood of an observation sequence, estimating an optimal state sequence for the observations, and learning the model parameters all have time complexity quadratic in the number of states. We introduce a novel class of non-sparse Markov transition matrices, called Dense-Mostly-Constant (DMC) transition matrices, that allow us to derive new algorithms for solving the basic HMM problems in sub-quadratic time. We describe the DMC HMM model and algorithms and convey some intuition for their use. Empirical results for these algorithms show dramatic speedups on all three problems. In terms of accuracy, the DMC model yields strong results, outperforming the baseline algorithms even in domains known to violate the DMC assumption.

Most statistical approaches to modeling text implicitly assume that informative words are rare. This assumption is usually appropriate for topical retrieval and classification tasks; however, in non-topical classification and soft-clustering problems, where classes and latent variables relate to sentiment or authorship, informative words can be frequent. In this paper we present a comprehensive set of statistical learning tools that treat frequently occurring words in a sensible manner. We introduce probabilistic models of contagion for classification and soft-clustering, based on the Poisson and Negative-Binomial distributions, which share with the Multinomial the desirable properties of simplicity and analytic tractability. We then introduce the Delta-Square statistic to select features and avoid over-fitting.
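The abstract does not spell out how the DMC structure is exploited, so the following is a minimal sketch, assuming the convention that each row of the transition matrix has K freely parameterized entries while the remaining N - K entries of that row share a single per-row constant. Under that assumption, each step of the forward recursion costs O(NK) rather than O(N^2): the constant part contributes the same scalar to every successor state and is computed once, after which only the N*K non-constant entries need corrections. All names below (dmc_forward, nc_cols, and so on) are illustrative, not the paper's actual interface.

import numpy as np

def dmc_forward(pi, B, row_const, nc_cols, nc_vals, obs):
    """Forward-pass likelihood for an HMM whose transition matrix is
    Dense-Mostly-Constant: row i has K free entries (at columns
    nc_cols[i] with values nc_vals[i]) and its other N - K entries
    all equal row_const[i].

    pi        : (N,)   initial state distribution
    B         : (N, M) emission probabilities, B[i, o] = P(o | state i)
    row_const : (N,)   shared value of row i's "constant" entries
    nc_cols   : (N, K) column indices of the non-constant entries
    nc_vals   : (N, K) transition probabilities at those entries
    obs       : observation sequence as a list of symbol indices
    """
    N, K = nc_cols.shape
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        # If every entry of row i equaled row_const[i], then
        # sum_i alpha[i] * A[i, j] would be the same scalar for all j.
        base = float(np.dot(alpha, row_const))            # O(N)
        new = np.full(N, base)
        # Correct only the N*K entries that differ from the row constant.
        corrections = alpha[:, None] * (nc_vals - row_const[:, None])
        np.add.at(new, nc_cols.ravel(), corrections.ravel())  # O(NK)
        alpha = new * B[:, o]
        # NOTE: per-step rescaling (needed for long sequences) is omitted.
    return alpha.sum()   # P(obs | model)

Since the abstract claims sub-quadratic algorithms for all three problems, the backward and Viterbi recursions presumably admit analogous constant-plus-corrections decompositions; only the forward pass is sketched here.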
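For the second abstract, here is a hedged sketch of the simpler of the two contagion-style models it mentions: a class-conditional Poisson model used naive-Bayes style, where the per-document count of each word in class c is modeled as Poisson with a class-specific rate. The rate estimator (class-conditional mean count plus a small smoothing constant) and all function names are assumptions for illustration; the Negative-Binomial variant and the Delta-Square feature selector are not reproduced, since the abstract gives no parameterization for them.

import numpy as np
from scipy.special import gammaln

def train_poisson_nb(counts, labels, n_classes, smoothing=1e-3):
    """Fit per-class Poisson rates: lam[c, w] is the mean count of
    word w over class-c documents, plus a small smoothing constant
    (an illustrative estimator, not necessarily the paper's).

    counts : (D, V) word-count matrix, one row per document
    labels : (D,)   class index of each document
    """
    D, V = counts.shape
    lam = np.empty((n_classes, V))
    log_prior = np.empty(n_classes)
    for c in range(n_classes):
        docs = counts[labels == c]
        lam[c] = docs.mean(axis=0) + smoothing
        log_prior[c] = np.log(len(docs) / D)
    return lam, log_prior

def classify_poisson_nb(counts, lam, log_prior):
    """argmax_c  log P(c) + sum_w [ f_w log lam - lam - log(f_w!) ]"""
    ll = counts @ np.log(lam).T                 # (D, C): sum_w f_w log lam
    ll = ll - lam.sum(axis=1)                   # broadcast: -sum_w lam
    # -log(f_w!) is class-independent; kept so ll is a true log-likelihood.
    ll = ll - gammaln(counts + 1).sum(axis=1, keepdims=True)
    return np.argmax(ll + log_prior, axis=1)

Unlike the Multinomial, which conditions on document length, the Poisson likelihood penalizes a class whenever a frequent word's observed count strays far from that class's rate, which is one way to let high-frequency words carry evidence.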


