LTI SRS Home



Abstracts 2005




Time: 9:00 am
Speaker: Matthew Bilotti
Title: Semantic Retrieval for Question Answering


Question Answering (QA) technologies are becoming more and more prevalent in everyday applications, such as customer service and technical support web sites.  Despite its increasing popularity, the basic architecture behind QA systems has been largely unchanged for several years.  The classic "pipelined" QA architecture consists of question analysis and query formulation, followed by document retrieval (relational or IR), and finally one or more filtering, extraction and presentation stages.  In such a QA system, the end-to-end system performance is a function of the performance of the worst module.  In the case of document retrieval, poor module performance can impair end-to-end system performance by failing to retrieve enough answer-bearing documents, or to rank them highly enough.

If we are to eventually improve retrieval performance for QA, we must capitalize on the wealth of semantic information extractable not only from documents in the collection, but also from natural language questions posed as input to the system.  Such content includes, but is not limited to, parse trees, semantic role labels, frames for events and scenarios, and links to ontologies.  Indexing and retrieving on linguistic and semantic content has the potential to greatly improve the quality of documents retrieved in response to an input question, providing the best possible results to downstream QA system modules, and contributing maximally to overall system performance.  It also has the feature that more of the burden of finding the correct answer is shifted onto the query formulation and document retrieval modules, minimizing costly post-processing by downstream filtering and extraction modules.  Of course, the trade-off is that pre-processing the text and building the index requires a great deal of resources, and there are issues scaling this process to large corpora.

In this talk, I briefly review some related work on semantic indexing and retrieval, both for its own sake and as applied to QA.  I then discuss JAVELIN's current approach to semantic retrieval for QA [1] using the Lemur toolkit [2], and describe current work as we move to using Indri for retrieval [3].  I discuss initial results from JAVELIN's participation in the TREC 2005 Relationship QA task, and from other preliminary experiments.  At the end of the talk, I present observations from JAVELIN's current foray into semantic retrieval and briefly discuss some open questions facing the effort.

References:

[1] Nyberg et al. "Extending the JAVELIN QA System with Domain Semantics", Proceedings of the 20th National Conference on Artificial Intelligence (AAAI 2005).

[2] The LTI/UMass CIIR Lemur Toolkit for Language Modeling and Information Retrieval.  http://www.lemurproject.org.

[3] Metzler and Croft. "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004.





Time: 9:30 am
Speaker: Vitor Carvalho
Title: Implicit Query Systems for Email


Email is the number one activity people pursue online [1], with Internet Search a very close second. Given people’s interests and the increasingly profitable market for search engines, it makes sense to combine these two technologies as much as possible. Implicit Query systems for email automatically find relevant keywords (or keyphrases) in an email message, and use such phrases as queries in a search engine. This list of keywords is retrieved without the user explicitly searching for them, and the results can be displayed to the user in different ways, such as underlining the keyphrases as clickable links or showing them in search boxes in a sidebar.

One of the potential applications of such technology is providing content-driven automatic search in email systems, which could be used to improve the user experience/interface or even to automatically find related documents in a database (for instance, when viewing an email message, you might be shown other related messages). Another potential application is content-targeted advertisement in email messages, such as that currently performed by Google’s Gmail system. Interestingly, content-targeted advertisement is an area of research in language technologies with a very limited number of references available in the literature.

In this paper, we offer three contributions to implicit query research. First, we show how to use query logs from a search engine: by constraining results to commonly issued queries, we can get dramatic improvements in precision. Second, we describe a method for optimizing parameters for an implicit query system, by using logistic regression training. The method is designed to estimate the probability that any particular suggested keyphrase is a good one. This probability can be used for ranking of the best keyphrases and selecting the number of keyphrases to be shown to the user. Third, we show which features beyond standard TF-IDF features are most helpful in our logistic regression model: query frequency information, capitalization information, subject line information, message and query length information. Using the optimization method and the additional features, we are able to produce a system with up to 6 times better results on top-1 score than a simple TF-IDF system [2].
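The ranking step described above can be pictured with a minimal sketch. It assumes a logistic model over a handful of the feature types named in the abstract; the feature names, the weights, and the 0.5 threshold are hypothetical placeholders, not values from the paper, where the weights would be learned by logistic regression training.

```python
import math

# Hypothetical weights; in practice these would be learned from labeled data.
WEIGHTS = {
    "tfidf": 1.2,
    "log_query_freq": 0.6,
    "capitalized": 0.8,
    "in_subject": 1.0,
}
BIAS = -3.0  # hypothetical intercept


def keyphrase_probability(features):
    """Logistic model: P(good keyphrase) = 1 / (1 + exp(-(w . x + b)))."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_keyphrases(candidates, threshold=0.5):
    """Return (phrase, probability) pairs at or above the threshold, best first."""
    scored = [(phrase, keyphrase_probability(feats))
              for phrase, feats in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(phrase, prob) for phrase, prob in scored if prob >= threshold]


# Toy candidates extracted from one message (made-up feature values).
candidates = {
    "pittsburgh hotels": {"tfidf": 2.0, "log_query_freq": 3.0,
                          "capitalized": 1.0, "in_subject": 1.0},
    "see attached": {"tfidf": 0.5, "log_query_freq": 0.2,
                     "capitalized": 0.0, "in_subject": 0.0},
}
ranked = rank_keyphrases(candidates)
```

Because the model outputs a probability, the same score serves both purposes mentioned in the abstract: sorting the candidates and deciding how many of them to show.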

 
References:

[1] Madden, Mary and Lee Rainie, “America’s Online Pursuits: the changing picture of who’s online and what they do”. Pew Internet and American Life Project, December 2003. http://www.pewinternet.org/pdfs/PIP_Online_Pursuits_Final.PDF

[2] Goodman, Joshua and Vitor R. Carvalho, “Implicit Queries for Email”, Conference on Email and Anti-Spam, Stanford, 2005. http://www.cs.cmu.edu/~vitor/papers/ceas05.pdf





Time: 10:30 am
Speaker: Jaime Arguello
Title: InfoMagnets: Making Sense out of Corpus Data


In this talk, I will introduce a new interactive corpus exploration tool, InfoMagnets. InfoMagnets aims to make exploratory corpus analysis accessible to researchers who are not experts in text mining. To this end, it introduces an intuitive visual metaphor to the task of document clustering. As evidence of its usefulness and usability, it has been used successfully to uncover relationships between language and behavioral patterns in two distinct domains: tutorial dialogue and on-line communities. In tutorial dialogue, the topic analysis supported by InfoMagnets revealed differences between tutor-student interactions that explain variance in learning gains as measured by a pre/post test. In on-line communities, the InfoMagnets-based analysis furthered our understanding of behavioral patterns found within discussion threads, which moves us one step closer to understanding why some on-line communities flourish while others die out.

 Our vision of InfoMagnets and its application poses three main challenges: user-interface design, topic detection and clustering, and context-based segmentation of dialogue. I will touch upon all three.

Many interactive clustering visualization tools already exist [1, 2, 3, 4, 5], to name a few. The novelty of InfoMagnets comes from allowing the user to edit the resulting cluster centroids and document representation and immediately see the reorganization of documents. I will discuss how using a domain-relevant Latent Semantic Analysis (LSA) space enables this functionality and how this kind of instant action-reaction feedback helps the user understand relationships in the data faster.
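To make the edit-and-reorganize loop concrete, here is a minimal sketch rather than the InfoMagnets implementation. It assumes that documents and centroids are vectors in a shared space and that each document is assigned to the centroid with the highest cosine similarity; plain two-dimensional vectors stand in for the LSA space, and all names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(documents, centroids):
    """Map each document id to the name of its most similar centroid."""
    return {
        doc_id: max(centroids, key=lambda name: cosine(vec, centroids[name]))
        for doc_id, vec in documents.items()
    }


# Toy vectors standing in for documents projected into an LSA space.
documents = {"d1": [0.9, 0.1], "d2": [0.6, 0.5], "d3": [0.1, 0.9]}
centroids = {"topic_a": [1.0, 0.0], "topic_b": [0.0, 1.0]}

before = assign(documents, centroids)

# Simulated user edit: pull topic_b toward the first dimension.
centroids["topic_b"] = [0.5, 0.5]
after = assign(documents, centroids)
```

In this toy run the edit moves `d2` from `topic_a` to `topic_b` while the other two documents stay put, which is the kind of immediate reorganization the user would see after editing a centroid.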

 I will also discuss how we intend to use InfoMagnets as part of our suite of tools for authoring conversational interfaces, Tutalk. Our goal is to support the authoring of NLP-based interfaces from corpora of example human-human dialogue transcripts. A necessary step is to extract topic-based clusters from these dialogues. This, in turn, requires partitioning our sample dialogues into segments of granularity slightly coarser than Sinclair and Coulthard’s definition of a dialogue exchange [6]. I will touch upon how some state-of-the-art segmentation algorithms [7, 8] perform when applied to dialogue. This will lead to a description of our on-going work on a context-based segmenting algorithm that incorporates discourse cues, lexical cohesion measures, and automatically-induced dialogue act labels as features.

 If time permits, I will finish with a brief demo of InfoMagnets.

 
References:

 [1] Anton Leuski and James Allen. Lighthouse: Showing the Way to Relevant Information. In Proceedings of the IEEE Symposium on Information Visualization 2000.

 [2] Matt Rasmussen and George Karypis. gCLUTO: An Interactive Clustering, Visualization, and Analysis System. Technical Report # 04-021.

[3] James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, and Anne Schur.  Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. In Proceedings of IEEE Information Visualization, pages 51-58, 1995.

[4] David Dubin. Document Analysis for Visualization. In Proceedings of ACM SIGIR, pages 199-204, 1995.

[5] Matthew Chalmers and Paul Chitson. Bead: Explorations in Information Visualization. In Proceedings of ACM SIGIR, pages 330-337, June 1992.

[6] Sinclair, J. and Coulthard, M. Toward an Analysis of Discourse: the English Used by Teachers and Pupils. Oxford University Press. 1975.

[7] Marti Hearst. TextTiling: A Quantitative Approach to Discourse Segmentation. Technical Report 93/24, 1993.

[8] Regina Barzilay and Lillian Lee. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. Proceedings of the NAACL/HLT, 2004.





Time: 11:00 am
Speaker: Pinar Donmez
Title: Strategically Using Pairwise Classification to Improve Category Prediction


Text classification is a research topic that is widely studied in information retrieval, machine learning, and related language technologies. While text classification tasks that consist mainly of binary distinctions have achieved very high levels of accuracy, many multi-label classification problems, especially those with skewed datasets, are still hard to solve. For example, in our work with collaborative learning process analyses, we have trained classifiers to apply coding schemes with as many as 53 different codes, some of which occur less than 1% of the time [1]. Methods such as classification by pairwise coupling [2], error correcting output codes [3], boosting [4], and Bayesian models have been developed for multi-label text classification. They all create multiple models on the training data, but they differ in how they combine the predictions of the individual models.

One straightforward approach to multi-label categorization is to build a binary classifier for each pair of classes and then predict the class that receives the most votes from the pairwise models. However, the number of pairwise models grows quadratically with the number of classes, so training time becomes prohibitive when there are many classes. I will present a new method that builds one-vs-all classifiers recursively, guided by an analysis of the distribution of errors. The idea behind recursively applying sub-classifiers is to pay closer attention to the classes that are most easily confused with each other. A related method is boosting, in which, at each round, the weights of misclassified examples are increased and those of correctly classified examples are decreased. In our method, we create a sub-classifier at each iteration by choosing the examples from one class along with the examples that were mistakenly classified as that class. This approach has two implications. First, the training sets for the sub-classifiers are smaller than the whole set, which should reduce training time, and because they are focused sets, we may still achieve high performance despite their small size. Second, because the sub-classifiers are trained on exactly the classes that are hardest to distinguish, the models may not separate those classes well, so performance may degrade as the recursion deepens. Our method also gives us the opportunity to analyze the errors in detail and to see how classes affect each other when combined in a smaller, more focused set. The challenge of our approach is to find an optimal balance between these two implications.
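Two pieces of this paragraph lend themselves to a small sketch: the cost and mechanics of all-pairs voting, and the selection of a focused training subset for a sub-classifier. The code below is an illustration under stated assumptions, not the method presented in the talk. The function names, the class labels, and the toy scoring rule that stands in for trained pairwise models are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def num_pairwise_models(num_classes):
    """Number of binary classifiers needed for all-pairs voting."""
    return num_classes * (num_classes - 1) // 2


def pairwise_vote(classes, pairwise_predict, example):
    """Predict the class that wins the most pairwise contests."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_predict(a, b, example)] += 1
    return votes.most_common(1)[0][0]


def focused_subset(gold, predicted, chosen):
    """Indices of examples in the chosen class plus those mistaken for it."""
    return [i for i, (g, p) in enumerate(zip(gold, predicted))
            if g == chosen or p == chosen]


def toy_pairwise_predict(a, b, example):
    """Stand-in for a trained pairwise model: pick the higher-scoring class."""
    return a if example[a] >= example[b] else b


classes = ["claim", "question", "off_topic"]
winner = pairwise_vote(classes, toy_pairwise_predict,
                       {"claim": 0.2, "question": 0.7, "off_topic": 0.1})

gold = ["claim", "question", "claim", "off_topic"]
predicted = ["claim", "claim", "question", "off_topic"]
subset = focused_subset(gold, predicted, "claim")
```

For a coding scheme with 53 codes, `num_pairwise_models(53)` gives 1378 pairwise classifiers, which illustrates why all-pairs voting becomes expensive as the number of classes grows.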

 
 References:     

 [1] Donmez, P., Rose, C. P., Stegmann, K., Weinberger, A., and Fischer, F. Supporting CSCL with Automatic Corpus Analysis Technology, in the Proceedings of Computer Supported Collaborative Learning, 2005. (Nominated for Best Paper Award)

 [2] Hastie, T.J. and Tibshirani, R.J. Classification by Pairwise Coupling, in Advances in Neural Information Processing Systems, vol. 10. MIT Press, 1998. 

 [3] Dietterich, T. and Bakiri, G. Solving Multiclass Learning Problems via Error-Correcting Output Codes, in the Journal of Artificial Intelligence Research, 2:263-86, 1995.

[4] Schapire, R.E. A Brief Introduction to Boosting, in the Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.





Time: 11:30 am
Speaker: Christina Bennett
Title: Large Scale Evaluation of Corpus-based Synthesizers: The Blizzard Challenge 2005


Evaluation is a necessary component of every research area.  In the field of speech synthesis, several testing methods have been commonly used; however, when conducting evaluations, it is rare to compare voices developed using different systems.  Most evaluation has been within a single research group for diagnostic testing or speaker selection.  While a cross-system evaluation is obviously of value, results are not easily comparable if the data used to build the voices comes from different sources.  The Blizzard Challenge sought to eliminate this problem by specifying speech corpora on which all participating systems build voices, allowing for a meaningful comparison between sites and techniques. 
A large scale international evaluation of various corpus-based speech synthesis systems using common datasets, the Blizzard Challenge was hosted by Carnegie Mellon University and conducted from February through April, 2005.  Six sites from around the world, both academic and industrial, participated in this evaluation, the first ever to compare voices built by different systems using the same data. 

Three types of listeners were sought, and nearly 200 participants ultimately completed all portions of the evaluation.  One system, an HMM-based speech synthesizer, clearly outshone its competition.  Overall it was found to be both the most preferred (highest in opinion scores) and best understood (lowest in error) among all listener types, while results from other systems varied.  The Blizzard Challenge has been highly successful and has inspired the speech synthesis community to continue to make strides in the area of evaluation, and as such, we are hopeful that the Challenge will become an annual or biannual event.  A special session devoted to the Blizzard Challenge 2005 was held on September 5th at this year's Interspeech conference in Lisbon, Portugal.

In the second portion of the talk, I will discuss how this challenge has refined our collective knowledge of how speech synthesis evaluations should be conducted and my ongoing research in this area.  The foremost questions in need of resolution are what types of tests should be used, how to construct effective evaluations for multiple sites, and various other considerations related to evaluation. In my presentation, I will describe the organization and design of the Blizzard Challenge with special focus on a discussion of results and lessons learned from conducting this first-of-its-kind evaluation.

 
References:

Bennett, C., “Large Scale Evaluation of Corpus-based Synthesizers: Results and Lessons from the Blizzard Challenge 2005,” to appear in Proceedings of Interspeech 2005 – Eurospeech, Lisbon, Portugal, 2005.

Black, A. and Tokuda, K., “The Blizzard Challenge – 2005: evaluating corpus-based speech synthesis on common databases,” to appear in Proceedings of Interspeech 2005 – Eurospeech, Lisbon, Portugal, 2005. http://www.festvox.org/blizzard




Time: 2:00 pm
Speaker: Wei-Hao Lin
Title: Which Side are You on?  Identifying Document-Level and Sentence-Level Perspectives Using Statistical Models


In this talk we investigate the problem of identifying the perspective from which a document was written.  By perspective we mean a point of view, for example, from the perspective of Democrats or Southerners.  An intelligence analyst may regularly monitor the positions that foreign countries take on various issues.  A media analyst could frequently survey broadcast news and newspapers for different viewpoints.  What these analysts have in common is that they would like to find evidence of strong statements of differing perspectives, while ignoring neutral statements as less interesting.  Can a computer algorithm learn to tell the perspective of a document?  Moreover, can we develop a system to identify the "neutral" sentences in a document, i.e. sentences that do not indicate a particular perspective?

We evaluate different statistical learning algorithms and find that the document-level perspective can be predicted well from word usage over the whole document.  Sentence-level perspective identification, however, is more challenging due to the lack of context and sentence-level annotations.  To overcome this obstacle, we propose a new statistical model, called the latent perspective model (LPM), to learn the perspective of individual sentences without labels at the sentence level.  The results show that LPM can successfully recover the implicit sentence-level perspectives and achieve higher identification accuracy at the sentence level compared to a model that does not account for perspective-neutral sentences.





Time: 2:30 pm
Speaker: Kenji Sagae
Title: Automatic Measurement of Syntactic Development in Child Language


To facilitate the use of syntactic information in the study of child language acquisition, a coding scheme for grammatical relations (such as subject, object, adjunct, etc.) in transcripts of parent-child dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We discuss the use of current NLP techniques to produce grammatical relations (GRs) in this annotation scheme. By using a statistical parser (Charniak, 2000) and memory-based learning tools for classification (Daelemans et al., 2004), we obtain high precision and recall of several GRs. We demonstrate the usefulness of this approach by performing automatic measurements of syntactic development with the Index of Productive Syntax (Scarborough, 1990) at similar levels to what child language researchers compute manually.

The Index of Productive Syntax (IPSyn) is one of the most important measures of child language development. It provides a numerical score for grammatical complexity in a corpus of transcribed child utterances. IPSyn scores can be used for investigating individual or group differences in child language acquisition, in either research or clinical settings. Computation of IPSyn scores has traditionally been a laborious process that requires manual identification of several syntactic structures. In this talk I will present an NLP system that performs automatic computation of IPSyn scores, and an evaluation of this system using real data from child language research labs.
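As a rough illustration of the final scoring step only, the sketch below assumes the usual IPSyn convention of awarding up to two points per target structure, depending on how many distinct exemplars are found in the transcript. The structure names and counts are hypothetical, and identifying the structures themselves (the part that the parser and classifiers automate) is not shown.

```python
def ipsyn_points(exemplar_count):
    """Assumed convention: 0 exemplars -> 0, 1 exemplar -> 1, 2 or more -> 2."""
    return min(exemplar_count, 2)


def ipsyn_score(exemplars_by_structure):
    """Total score: the sum of per-structure points."""
    return sum(ipsyn_points(count) for count in exemplars_by_structure.values())


# Hypothetical counts of distinct exemplars found in one child's transcript.
found = {
    "subject_verb": 5,
    "verb_object": 1,
    "wh_question": 0,
    "relative_clause": 2,
}
score = ipsyn_score(found)
```

Capping each structure at two points means the score rewards the variety of structures a child produces rather than the raw frequency of any single one.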





Time: 3:30 pm
Speaker: Pradipta Ray
Title: Motif Finding across Species


The computational identification of "sequence motifs", that is, repeating, roughly conserved patterns of nucleotides that have biological significance, is an important problem.  Motif finding techniques can locate regulatory regions of the genome, help construct organism-specific regulatory networks, and provide a basic understanding of protein structure and transcription factors.

Traditional motif finding techniques use expectation maximization or sampling to detect motif elements. Even though such techniques are capable of de novo motif detection, they tend to generate a large number of false positives.

A new approach to motif detection by Haussler et al. uses comparative genomics and phylogenetic information to detect motifs. Corresponding sequences in related species are multiply aligned using standard techniques, and then a Hidden Markov Model is used to functionally annotate each site based on the likelihoods of transition between different functional entities and the likelihood of each site having been generated from the phylogenetic tree corresponding to a functional entity.  The phylogenetic tree is modelled as a continuous-time Markov model with 4 states.
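To give a feel for how a 4-state continuous-time Markov model scores an alignment column, the sketch below makes two simplifying assumptions that are not taken from the abstract: all substitutions occur at the same rate (the Jukes-Cantor model), and the tree has only two leaves joined at a root with a uniform nucleotide prior. The branch lengths and function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def jukes_cantor_matrix(branch_length, rate=1.0):
    """Transition probabilities P(t) when every substitution has the same rate."""
    decay = math.exp(-4.0 * rate * branch_length)
    same = 0.25 + 0.75 * decay   # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay   # probability of ending in one specific other state
    return {a: {b: (same if a == b else diff) for b in NUCLEOTIDES}
            for a in NUCLEOTIDES}


def column_likelihood(leaf_x, leaf_y, branch_length):
    """Likelihood of one aligned site in two species, summing over the root state."""
    p = jukes_cantor_matrix(branch_length)
    return sum(0.25 * p[root][leaf_x] * p[root][leaf_y] for root in NUCLEOTIDES)


# A slowly evolving tree (short branches) versus a fast one (long branches).
conserved = column_likelihood("A", "A", branch_length=0.05)
neutral = column_likelihood("A", "A", branch_length=1.0)
```

An identical column such as `("A", "A")` is more likely under the short-branch tree than under the long-branch one. In a phylogenetic HMM, per-site likelihoods of this kind play the role of emission probabilities for the different functional states.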

While successful, this model implicitly assumes that corresponding sites in related species have the same biological function, whereas there is strong biological evidence to the contrary in the form of motif instances not being conserved across species. We modify the above model to use a mixture-of-trees model at each site, depending on a site-specific and species-specific annotation of the multiply aligned sequences. We develop an algorithm for an ML framework for such computations, and are in the process of testing it with real biological data from the Drosophilae family.





Time: 4:00 pm
Speaker: Yifen Huang
Title: Inferring Ongoing Activities of Workstation Users by Clustering and User's Feedback


We are interested in automatically discovering the key ongoing activities of a workstation user, such as committees to which she belongs, research projects in which she is involved, etc., based on the contents of her workstation. The thesis underlying our research is that this collection of user activities can be automatically inferred from the variety of data available on most users' workstations, including their emails, files, online calendar, and history of web page accesses. If such activities could be inferred, this knowledge about the user's activities could be used in a variety of ways to support the user. For example, it could be used to cross-index email, calendar events, files, and web accesses according to activity, or to produce a 'briefing folder' for each meeting on the user's calendar (i.e., a folder containing emails, files, etc., relevant to the activity associated with this meeting).

We have developed a three-step algorithm to induce a user’s ongoing activities based on her email collection and the user’s feedback on activity representations. The first step combines a variety of clustering algorithms with social network analysis. The goal is to exploit the specific properties of email in order to produce unsupervised cluster labels for each message. The second step applies various information extraction techniques to generate an activity summary for each cluster, such as a set of keywords for the activity, primary senders, extracted names and dates, request emails, etc. The last step is to collect the user’s feedback on the automatically generated activity representation and then re-train the activity model accordingly. Most machine learning algorithms can adapt to feedback about labels, but less work has been done on feedback about features. Our algorithm introduces a new hidden variable that indicates whether a feature is generated from a specific activity or from a general topic that is not related to any activity, so that the feedback can be integrated into the model by adjusting probability weightings in the EM process.
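One way to picture the hidden variable is as a two-way mixture for each feature. The sketch below is an assumption-laden illustration, not the algorithm from the talk: it supposes one word distribution for an activity and one for general text, computes the E-step posterior that a word came from the activity, and treats user feedback as a per-word override of the mixing prior. All names and probabilities are hypothetical.

```python
def activity_responsibility(word, activity_probs, general_probs,
                            prior_activity=0.5, floor=1e-9):
    """E-step posterior that `word` was generated by the activity, not the general topic."""
    p_activity = prior_activity * activity_probs.get(word, floor)
    p_general = (1.0 - prior_activity) * general_probs.get(word, floor)
    return p_activity / (p_activity + p_general)


# Hypothetical word distributions for one activity and for general email text.
activity_probs = {"budget": 0.05, "meeting": 0.02}
general_probs = {"budget": 0.001, "meeting": 0.02}

# Without feedback, "meeting" is equally well explained by both sources.
no_feedback = activity_responsibility("meeting", activity_probs, general_probs)

# Simulated feedback: the user marks "meeting" as a keyword of this activity,
# which is modeled here by raising the prior for that word only.
feedback_priors = {"meeting": 0.9}
with_feedback = activity_responsibility(
    "meeting", activity_probs, general_probs,
    prior_activity=feedback_priors.get("meeting", 0.5))
```

In a full EM loop these responsibilities would be used in the M-step to re-estimate the word distributions, so feedback on a single feature would propagate to the activity model on the next iteration.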

Table 1 shows the folder alignment results of various clustering methods; our best method achieves 50% accuracy or above.

Table 1

 Figure 1 shows an example of the activity representation.

Figure 1

 We will also show the interface to collect users’ feedback. Currently we are conducting the experiment to evaluate the algorithm for adapting to users’ feedback.