Modeling Science

David M. Blei and John D. Lafferty
Princeton University and Carnegie Mellon University


This is a browsable 100-topic model estimated from the journal Science. Each article was OCRed by JSTOR, who graciously supplied the data. Details of the algorithm used to create this browser can be found in "A Correlated Topic Model of Science" (Annals of Applied Statistics, 2007).

Topic pages

A list of estimated topics can be found here. Each topic is represented by the top five words from its distribution. Clicking on a topic leads to a page that contains the top 100 words from that topic, its significant connections to other topics, and the articles that exhibit that topic in the highest proportion. (Move to the center of the list of articles to find those that are heterogeneous, that is, articles that mix this topic with others.)
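To make the topic-page contents concrete, here is a minimal sketch of how the top words of a topic and the articles with the highest proportion of a topic might be extracted. It is an illustration only, not the browser's actual code: the function names, the dictionary-based data layout, and the toy numbers are all hypothetical.

```python
def top_words(topic, n=5):
    """Return the n most probable words of a topic.

    `topic` is assumed (hypothetically) to be a dict mapping word -> probability.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def top_articles(proportions, topic_index, n=10):
    """Return the n article titles with the highest proportion of one topic.

    `proportions` is assumed to map article title -> list of topic proportions.
    """
    ranked = sorted(proportions.items(), key=lambda kv: -kv[1][topic_index])
    return [title for title, _ in ranked[:n]]


# Toy data, invented for illustration.
toy_topic = {"gene": 0.4, "dna": 0.3, "cell": 0.2, "rna": 0.1}
toy_proportions = {
    "Article A": [0.9, 0.1],
    "Article B": [0.5, 0.5],
    "Article C": [0.2, 0.8],
}

print(top_words(toy_topic, n=2))
print(top_articles(toy_proportions, topic_index=0, n=2))
```

In this layout, articles near the middle of the full ranking for a topic are the ones whose proportions are spread across several topics.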

Document pages

Clicking on a title leads to a page that lists the main topics that were combined to form that article, along with other articles that are similar in terms of the expected Hellinger distance between their topic proportions. For users who have access to JSTOR, the title of the article at the top of the page is a link to the original scanned document. (Occasionally, OCR errors caused by paging lead to assigned topics that do not make sense. Going to the original scanned article generally identifies this situation.)
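As a rough illustration of the similarity measure mentioned above, the following sketch computes the Hellinger distance between two vectors of topic proportions and ranks articles by it. It is a simplification under stated assumptions: it uses point estimates of the proportions rather than an expectation under the model's posterior, it adopts the convention that includes the 1/sqrt(2) factor (so distances lie in [0, 1]), and the function names and toy data are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Uses H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def most_similar(query, corpus, k=3):
    """Return the k (title, distance) pairs closest to `query`.

    `corpus` is assumed to map article title -> list of topic proportions.
    """
    scored = [(title, hellinger(query, theta)) for title, theta in corpus.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy topic proportions, invented for illustration.
toy_corpus = {
    "Article A": [0.7, 0.2, 0.1],
    "Article B": [0.1, 0.1, 0.8],
    "Article C": [0.6, 0.3, 0.1],
}

for title, dist in most_similar([0.65, 0.25, 0.10], toy_corpus, k=2):
    print(title, round(dist, 3))
```

Smaller distances indicate articles whose topic mixtures are closer to the query article's mixture.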