Motivation

Latent variable graphical models such as hidden Markov models and topic models have become a popular framework for modeling complex dependencies among variables. Although latent variables can provide a compact parameterization of an otherwise complex model, learning in these models is often challenging. Maximum likelihood estimation of the parameters of latent variable models yields a nonconvex objective, which is typically optimized with local search heuristics such as Expectation Maximization (EM); EM often suffers from local optima and slow convergence.

In many of these cases, approaching the problem by exploiting the underlying spectral properties provides a fundamentally different perspective. For example, consider learning hidden Markov models. While the problem is provably hard under cryptographic assumptions, recent results have given algorithms with polynomial computational and sample complexity under modest assumptions on the rank and singular values of the transition/observation matrices. Interestingly, the sample complexity is a function not only of the topology of the model, but also of the underlying spectral properties of the parameters. This is fundamentally different from learning in fully observed models, where the difficulty depends primarily on the topology.

Inspired by these results, the past two years have seen the rise of many fast, provably consistent spectral methods for learning latent variable models, including algorithms for predictive state representations, finite state transducers, latent trees, latent junction trees, probabilistic context-free grammars, and mixture/admixture models. While some of these methods aim to recover the original parameters, most propose an alternative parameterization, called the observable representation, which still allows efficient inference of observed marginals (though not latent ones) and is useful for many predictive tasks.
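For concreteness, the following is a minimal sketch of an observable representation for HMMs in the style of the Hsu, Kakade, and Zhang algorithm (see the references below). It is written in Python/NumPy with hypothetical function names, and assumes empirical estimates of the unigram, bigram, and trigram observation probabilities are already available; it is meant only to illustrate the idea, not to serve as a reference implementation.

```python
import numpy as np

def observable_hmm(P1, P21, P31, m):
    """Observable representation of an HMM (illustrative sketch).

    P1  : (n,)      empirical unigram probabilities  P[x1 = i]
    P21 : (n, n)    empirical bigram probabilities   P[x2 = i, x1 = j]
    P31 : (n, n, n) empirical trigrams; P31[x][i, j] = P[x3 = i, x2 = x, x1 = j]
    m   : assumed number of hidden states (rank of P21)
    """
    # Top-m left singular vectors of the bigram matrix.
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :m]

    b1 = U.T @ P1                              # initial belief-like vector
    binf = np.linalg.pinv(P21.T @ U) @ P1      # normalization vector
    # One observable operator per observation symbol x.
    B = [U.T @ P31[x] @ np.linalg.pinv(U.T @ P21) for x in range(len(P1))]
    return b1, binf, B

def sequence_prob(seq, b1, binf, B):
    """Joint probability of an observed sequence:
       P[x1, ..., xt] = binf^T B[xt] ... B[x1] b1."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)
```

Note that these operators let one compute marginals over observed sequences without ever recovering the transition or observation matrices themselves, which is exactly the sense in which the observable representation sidesteps the nonconvexity of maximum likelihood.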

Due to their speed and ability to handle complex features, spectral algorithms have proven useful for dynamical systems and natural language processing tasks such as parsing.

Moreover, when combined with kernel methods such as Hilbert space embeddings, many spectral algorithms can be elegantly generalized to any domain where a kernel can be defined, including continuous, non-Gaussian data where running EM is difficult.
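As a small illustration of the basic building block, the sketch below (Python/NumPy, function names hypothetical) computes an empirical kernel mean embedding with an RBF kernel; kernelized spectral algorithms such as Hilbert space embeddings of HMMs manipulate Gram-matrix quantities of this kind in place of discrete probability tables.

```python
import numpy as np

def rbf_gram(X, Z, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mean_embedding(X, Z, gamma=1.0):
    """Empirical mean embedding mu_X = (1/n) sum_i k(x_i, .),
    evaluated at the points Z (a row average of the Gram matrix)."""
    return rbf_gram(Z, X, gamma).mean(axis=1)
```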

Despite these advances, the area is relatively unexplored and many challenges remain. Some examples include (but are not limited to): (1) extending these algorithms to more general latent variable models, (2) theoretical analysis when the true model deviates from the assumed one, (3) applying similar principles to graphical model problems beyond learning, and (4) new applications such as computer vision.

References

  1. A. Anandkumar, K. Chaudhuri, D. Hsu, S.M. Kakade, L. Song, and T. Zhang. Spectral methods for learning multivariate latent tree structure. In Advances in Neural Information Processing Systems (NIPS), 2011.
  2. A. Anandkumar, D.P. Foster, D. Hsu, S.M. Kakade, and Y.K. Liu. Two SVDs suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation. arXiv preprint arXiv:1204.6703, 2012.
  3. A. Anandkumar, D. Hsu, and S.M. Kakade. A method of moments for mixture models and hidden Markov models. In Proceedings of the Annual Conference on Computational Learning Theory (COLT), 2012.
  4. R. Bailly, F. Denis, and L. Ralaivola. Grammatical inference as a principal component analysis problem. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009).
  5. R. Bailly, A. Habrard, and F. Denis. A spectral approach for probabilistic grammatical inference on trees. In Algorithmic Learning Theory (ALT), 2010.
  6. R. Bailly. QWA: Spectral algorithm. Journal of Machine Learning Research, Proceedings Track, 20:147-163, 2011.
  7. B. Balle, A. Quattoni, and X. Carreras. A spectral learning algorithm for finite state transducers. Machine Learning and Knowledge Discovery in Databases, pages 156–171, 2011.
  8. B. Boots, S. Siddiqi, and G. Gordon. Closing the learning-planning loop with predictive state representations. In Proceedings of Robotics: Science and Systems, Zaragoza, Spain, June 2010.
  9. S.B. Cohen, K. Stratos, M. Collins, D.P. Foster, and L. Ungar. Spectral learning of latent variable PCFGs. In Proceedings of the Association for Computational Linguistics (ACL), volume 50, 2012.
  10. P. Dhillon, J. Rodu, D. Foster, and L. Ungar. Two step CCA: A new spectral method for estimating vector models of words. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
  11. D. Hsu, S. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Proceedings of the Annual Conference on Computational Learning Theory (COLT), 2009.
  12. F.M. Luque, A. Quattoni, B. Balle, and X. Carreras. Spectral learning in non-deterministic dependency parsing. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012).
  13. E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. Annals of Applied Probability,16(2):583–614, 2006.
  14. A.P. Parikh, L. Song, and E.P. Xing. A spectral algorithm for latent tree graphical models. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011).
  15. A.P. Parikh, L. Song, M. Ishteva, G. Teodoru, and E.P. Xing. A spectral algorithm for latent junction trees. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI 2012).
  16. A. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, pages 13–31. Springer, 2007.
  17. L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. Smola. Hilbert space embeddings of hidden Markov models. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010).
  18. L. Song, A.P. Parikh, and E.P. Xing. Kernel embeddings of latent tree graphical models. In Advances in Neural Information Processing Systems (NIPS), 2011.
  19. S. Terwijn. On the learnability of hidden Markov models. In Grammatical Inference: Algorithms and Applications, pages 344–348, 2002.