====================================================
The Missing Link - Probabilistic Models of Hypertext
====================================================

- David Cohn

I will discuss several bits of recent work in statistical document modelling, reported at this year's ICML, SIGIR and NIPS conferences. I'll first describe PHITS, a probabilistic bibliometric model for identifying authoritative documents from a corpus, and then show how it can be combined with Hofmann's PLSA to form a joint probabilistic model of hypertext documents. The model has many applications, including intelligent web crawling and dynamic hypertext generation. If any time remains, I'll also describe an analogous probabilistic model for the "Ask Jeeves" problem: finding answers in a matched question-answer database.

Joint work with Thomas Hofmann, Huan Chang, Adam Berger, Rich Caruana, Dayne Freitag, Vibhu Mittal and Andrew McCallum.