Paper Abstracts

(Publications are ordered in reverse chronological order. For a breakdown by publication type, please feel free to send a request for a copy of my CV.)


Here Or There: Preference Judgments for Relevance
Ben Carterette, Paul N. Bennett, David Maxwell Chickering, Susan T. Dumais
To Appear in Proceedings of the European Conference on Information Retrieval (ECIR) 2008, Glasgow, Scotland, March-April, 2008.
The official published version of this article is available at Springer-Verlag's website and is © 2008 Springer-Verlag GmbH Berlin Heidelberg.

Information retrieval systems have traditionally been evaluated over absolute judgments of relevance: each document is judged for relevance on its own, independent of other documents that may be on topic. We hypothesize that preference judgments of the form "document A is more relevant than document B" are easier for assessors to make than absolute judgments, and provide evidence for our hypothesis through a study with assessors. We then investigate methods to evaluate search engines using preference judgments. Furthermore, we show that by using inferences and clever selection of pairs to judge, we need not compare all pairs of documents in order to apply evaluation methods.



Neighborhood-Based Local Sensitivity (pdf)
Paul N. Bennett
To Appear in Proceedings of the European Conference on Machine Learning, Warsaw, Poland, September, 2007.
The official published version of this article is available at Springer-Verlag's website and is © 2007 Springer-Verlag Berlin Heidelberg.

We introduce a nonparametric model for sensitivity estimation which relies on generating points similar to the prediction point using its k nearest neighbors. Unlike most previous work, the sampled points differ simultaneously in multiple dimensions from the prediction point in a manner dependent on the local density. Our approach is based on an intuitive idea of locality which uses the Voronoi cell around the prediction point, i.e. all points whose nearest neighbor is the prediction point. We demonstrate how an implicit density over this neighborhood can be used in order to compute relative estimates of the local sensitivity. The resulting estimates demonstrate improved performance when used in classifier combination and classifier recalibration as well as being potentially useful in active learning and a variety of other problems.



Dual Strategy Active Learning
Pinar Donmez, Jaime G. Carbonell, Paul N. Bennett
To Appear in Proceedings of the European Conference on Machine Learning, Warsaw, Poland, September, 2007.
The official published version of this article is available at Springer-Verlag's website and is © 2007 Springer-Verlag Berlin Heidelberg.

Active Learning methods rely on static strategies for sampling unlabeled point(s). These strategies range from uncertainty sampling and density estimation to multi-factor methods with learn-onceuse- always model parameters. This paper proposes a dynamic approach, called DUAL, where the strategy selection parameters are adaptively updated based on estimated future residual error reduction after each actively sampled point. The objective of dual is to outperform static strategies over a large operating range: from very few to very many labeled points. Empirical results over six datasets demonstrate that DUAL outperforms several state-of-the-art methods on most datasets.



Combining Probability-Based Rankers for Action-Item Detection (pdf)
Paul N. Bennett and Jaime G. Carbonell
In Proceedings of the Human Language Technologies — North American ACL 2007 Conference, Rochester, New York, April, 2007.

This paper studies methods that automatically detect action-items in e-mail, an important category for assisting users in identifying new tasks, tracking ongoing ones, and searching for completed ones. Since action-items consist of a short span of text, classifiers that detect action-items can be built from a document-level or a sentence-level view. Rather than commit to either view, we adapt a context-sensitive metaclassification framework to this problem to combine the rankings produced by different algorithms as well as different views. While this framework is known to work well for standard classification, its suitability for fusing rankers has not been studied. In an empirical evaluation, the resulting approach yields improved rankings that are less sensitive to training set variation, and furthermore, the theoretically-motivated reliability indicators we introduce enable the metaclassifier to now be applicable in any problem where the base classifiers are used.



Building Reliable Metaclassifiers for Text Learning    (pdf)
Paul N. Bennett
CMU-CS-06-121, Ph.D. Thesis, Computer Science Department, School of Computer Science, Carnegie Mellon University, May 2006.
See errata.

Appropriately combining information sources to form a more effective output than any of the individual sources is a broad topic that has been researched in many forms. It can be considered to contain sensor fusion, distributed data-mining, regression combination, classifier combination, and even the basic classification problem. After all, the hypothesis a classifier emits is just a specification of how the information in the basic features should be combined. This dissertation addresses one subfield of this domain: leveraging locality when combining classifiers for text classification ...



Feature Representation for Effective Action-Item Detection    (pdf)
Paul N. Bennett and Jaime Carbonell
Beyond Bag-of-Words Workshop at the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 63-70, Salvador, Brazil, August 15 - August 19, 2005. ACM Press.
This extends work presented as a SIGIR 2005 poster.

E-mail users face an ever-growing challenge in managing their inboxes due to the growing centrality of email in the workplace for task assignment, action requests, and other roles beyond information dissemination. Whereas Information Retrieval and Machine Learning techniques are gaining initial acceptance in spam filtering and automated folder assignment, this paper reports on a new task: automated action-item detection, in order to flag emails that require responses, and to highlight the specific passage(s) indicating the request(s) for action. Unlike standard topic-driven text classification, action-item detection requires inferring the sender's intent, and as such responds less well to pure bag-of-words classification. However, using enriched feature sets, such as n-grams (up to n=4) with chi-squared feature selection, and contextual cues for action-item location improve performance by up to 10% over unigrams, using in both cases state of the art classifiers such as SVMs with automated model selection via embedded cross-validation.



Detecting Action-Items in E-mail    (pdf)
Paul N. Bennett and Jaime Carbonell
Poster Paper in the Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15 - August 19, 2005. ACM Press.
A more recent extension of this work appeared in the SIGIR 2005 Beyond Bag-of-Words Workshop.

E-mail users face an ever-growing challenge in just managing their inboxes, due to the growing centrality of email in the workplace for task assignment, action requests, and other roles beyond information dissemination. Whereas Information Retrieval and Machine Learning techniques are gaining initial acceptance in spam filtering and automated folder assignment, this paper reports on a new task: automated action-item detection, in order to flag emails that require responses, and to highlight the specific passage(s) indicating the request(s) for action. Unlike standard topic-driven text classification, action-item detection requires inferring the sender's intent, and as such responds less well to pure bag-of-words classification. However, using enriched feature sets, such as n-grams (up to n=4) with chi-squared feature selection, and contextual cues for action-item location improve performance by up to 10% over unigrams, using in both cases state of the art classifiers such as SVMs with automated model selection via embedded cross-validation.



The Combination of Text Classifiers Using Reliability Indicators
Paul N. Bennett, Susan T. Dumais, Eric Horvitz
Information Retrieval. Kluwer Academic Publishers.
January, 2005, 8(1):67-100.
Initial Submission: 12/19/2002; Acceptance (Pending Revision): 7/21/2003; Final Version Acceptance: 1/6/2004.
(This paper revises and extends work originally presented in [Bennett, Dumais, & Horvitz; SIGIR 2002].)

The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators—variables that provide signals about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.



Inductive Transfer for Text Classification using Generalized Reliability Indicators (pdf)
Paul N. Bennett, Susan T. Dumais, Eric Horvitz
Proceedings of the ICML-2003 Workshop on The Continuum from Labeled to Unlabeled Data, Washington DC, U.S.A., August 2003.

Machine-learning researchers face the omnipresent challenge of developing predictive models that converge rapidly in accuracy with increases in the quantity of scarce labeled training data. We introduce Layered Abstraction-Based Ensemble Learning (LABEL), a method that shows promise in improving generalization performance by exploiting additional labeled data drawn from related discrimination tasks within a corpus and from other corpora. LABEL first maps the original feature space, targeted at predicting membership in a specific topic, to a new feature space aimed at modeling the reliability of an ensemble of text classifiers. The resulting abstracted representation is invariant across each of the binary discrimination tasks, allowing the data to be pooled. We then construct a context-sensitive combination rule for each task using the pooled data. Thus, we are able to more accurately model domain structure which would not have been possible using only the limited labeled data from each task separately. Using several corpora for an empirical evaluation of topic classification accuracy of text documents, we demonstrate that LABEL can increase the generalization performance across a set of related tasks.



Using Asymmetric Distributions to Improve Text Classifier Probability Estimates    (pdf)
Paul N. Bennett
Proceedings of 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, July 28 - August 1, 2003. ACM Press.
An earlier version is available as CMU-CS-02-126.
See errata.

Text classifiers that give probability estimates are more readily applicable in a variety of scenarios. For example, rather than choosing one set decision threshold, they can be used in a Bayesian risk model to issue a run-time decision which minimizes a user-specified cost function dynamically chosen at prediction time. However, the quality of the probability estimates is crucial. We review a variety of standard approaches to converting scores (and poor probability estimates) from text classifiers to high quality estimates and introduce new models motivated by the intuition that the empirical score distribution for the "extremely irrelevant", "hard to discriminate", and "obviously relevant" items are often significantly different. Finally, we analyze the experimental performance of these models over the outputs of two text classifiers. The analysis demonstrates that one of these models is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.



Reducing Boundary Friction Using Translation-Fragment Overlap (pdf)
Ralf Brown, Rebecca Hutchinson, Paul N. Bennett, Jaime Carbonell, Peter Jansen
Proceedings of the Machine Translation Summit IX, New Orleans, U.S.A., September 2003.
(Shares significant overlap in content with publication listing for CMU-CS-03-138, 2003.)

Many corpus-based Machine Translation (MT) systems generate a number of partial translations which are then pieced together rather than immediately producing one overall translation. While this makes them more robust to ill-formed input, they are subject to disfluencies at phrasal translation boundaries even for well-formed input. We address this "boundary friction" problem by introducing a method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of combining non-overlapping fragments generated by our Example-Based MT (EBMT) system in a peak-to-peak comparison.



Maximal Lattice Overlap in Example-Based Machine Translation (pdf)
Rebecca Hutchinson, Paul N. Bennett, Jaime Carbonell, Peter Jansen, Ralf Brown
CMU-CS-03-138, Computer Science Department, School of Computer Science, Carnegie Mellon University, June 2003.
(Also listed as CMU-LTI-03-174. Shares significant overlap in content with publication listing for MT Summit IX, 2003.)

Example-Based Machine Translation (EBMT) retrieves pre-translated phrases from a sentence-aligned bilingual training corpus to translate new input sentences. EBMT uses long pre-translated phrases effectively but is subject to disfluencies at phrasal translation boundaries. We address this problem by introducing a novel method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of EBMT in a peak-to-peak comparison.



Probabilistic Combination of Text Classifiers Using Reliability Indicators: Models and Results    (pdf)
Paul N. Bennett, Susan T. Dumais, Eric Horvitz
Proceedings of 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, August 2002. ACM Press.
(An extended and revised version of this work appears in Information Retrieval, 2005.)

The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of contributing classifiers. The method harnesses reliability indicators --variables that provide a valuable signal about the performance of classifiers in different situations. We provide background, present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs, and review a set of comparative studies undertaken to evaluate the methodology.



Assessing the Calibration of Naive Bayes' Posterior Estimates (pdf)
Paul N. Bennett
CMU-CS-00-155, Computer Science Department, School of Computer Science, Carnegie Mellon University, September 2000.

In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of information are added, adding new words does not always correspond to new information. Essentially as a result of its independence assumption, the estimates grow too quickly. We investigate one parametric family that attempts to downweight the growth rate. The parameters of this family are estimated using a maximum likelihood scheme, and the results are evaluated.



Book Recommending Using Text Categorization with Extracted Information (pdf)
Raymond J. Mooney, Paul N. Bennett and Loriene Roy
Appeared in the AAAI-98/ICML-98 Workshop on Learning for Text Categorization and the AAAI-98 Workshop on Recommender Systems, Madison, WI, July 1998.

Content-based recommender systems suggest documents, items, and services to users based on learning a profile of the user from rated examples containing information about the given items. Text categorization methods are very useful for this task but generally rely on unstructured text. We have developed a book-recommending system that utilizes semi-structured information about items gathered from the web using simple information extraction techniques. Initial experimental results demonstrate that this approach can produce fairly accurate recommendations.



Text Categorization Through Probabilistic Learning: Applications to Recommender Systems (pdf)
Paul N. Bennett
Undergraduate Honor Thesis, Department of Computer Sciences, University of Texas at Austin, May 1998. Also appears as AI TR 98-270.

With the growth of the World Wide Web, recommender systems have received an increasing amount of attention. Many recommender systems in use today are based on collaborative filtering. This project has focused on LIBRA, a content-based book recommending system. By utilizing text categorization methods and the information available for each book, the system determines a user profile which is used as the basis of recommendations made to the user. Instead of the bag-of-words approach used in many other statistical text categorization approaches, LIBRA parses each text sample into a semi-structured representation. We have used standard Machine Learning techniques to analyze the performance of several algorithms on this learning task. In addition, we analyze the utility of several methods of feature construction and selection (i.e. methods of choosing the representation of an item that the learning algorithm actually uses). After analyzing the system we conclude that good recommendations are produced after a relatively small number of training examples. We also conclude that the feature selection method tested does not improve the performance of these algorithms in any systematic way, though the results indicate other feature selection methods may prove useful. Feature construction, however, while not providing a large increase in performance with the particular construction methods used here, holds promise of providing performance improvements for the algorithms investigated. This text assumes only minor familiarity with concepts of artificial intelligence and should be readable by the upper division computer science undergraduate familiar with basic concepts of probability theory and set theory.




Main Page

Paul N. Bennett
Last modified: Fri Jul 25 03:24:42 EDT 2000