Wang and McCallum, KDD 2006

From ScribbleWiki: Analysis of Social Media

Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends

The paper presents an LDA-style topic model that models time jointly with word co-occurrence. Modeling time explicitly is useful because co-occurrence patterns in many data sets are not static but change over time. In this model the meaning of a topic stays fixed while its prevalence over time is captured, so topics can focus on co-occurrence patterns that are time-sensitive. For example, the keywords 'United', 'States', and 'war' may relate to the Mexican-American War in 1846, World War I in 1918, or the Iraq War in 2006.

Unlike other work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps. Because it is a generative Bayesian network model, it supports more kinds of inference than a purely discriminative model: along with predicting the topic(s) of a document, we can predict the distribution of topics over time and predict a timestamp from the document's words. Compared to LDA, it shows qualitative improvements in topic saliency and in the ability to predict time given words.
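
As a rough illustration of timestamp prediction from per-topic time distributions, here is a minimal sketch in Python. It assumes that each topic has a Beta distribution over timestamps normalized to (0, 1), which is the parametric form used in the paper, and that the topic assignment of every word token in the document is already known. The topic names, the Beta parameters, and the function names are all hypothetical and chosen only for illustration; this is not the authors' implementation. The predicted timestamp is the grid point that maximizes the product of the per-token timestamp densities.

```python
import math

# Hypothetical per-topic Beta(a, b) parameters over normalized time in (0, 1).
# These values are made up for illustration, not taken from the paper.
TOPIC_TIME_PARAMS = {
    "early_topic": (2.0, 8.0),  # most mass near the start of the time range
    "late_topic": (8.0, 2.0),   # most mass near the end of the time range
}


def beta_log_pdf(t, a, b):
    """Log density of Beta(a, b) at t, for 0 < t < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t)


def predict_timestamp(token_topics, grid_size=99):
    """Return the grid timestamp with the highest total log density.

    token_topics: one topic name per word token in the document
    (assumed to be given, e.g. from a previously fitted model).
    """
    # Candidate timestamps strictly inside (0, 1): 0.01, 0.02, ..., 0.99.
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]

    def log_score(t):
        # Sum of log densities equals the log of the product over tokens.
        return sum(beta_log_pdf(t, *TOPIC_TIME_PARAMS[z]) for z in token_topics)

    return max(grid, key=log_score)


if __name__ == "__main__":
    doc = ["early_topic", "early_topic", "early_topic", "late_topic"]
    print(predict_timestamp(doc))
```

A document whose tokens mostly come from the early topic is placed in the first half of the time range, and one dominated by the late topic in the second half. Because every token contributes a density term, no discretization of time is needed at training time; the grid is only used to search for the maximum.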

Experiments were performed on three datasets: nine months of personal email, 17 years of NIPS research papers, and over 200 years of presidential State of the Union addresses. The results show improved topics, better timestamp prediction, and interpretable topic trends.
