Abstracts
2005
Time: 9:00 am
Speaker: Matthew Bilotti
Title: Semantic Retrieval for Question Answering

Question Answering (QA)
technologies are becoming more and more prevalent in everyday
applications,
such as customer service and technical support web sites.
Despite their increasing popularity, the basic
architecture behind QA systems has been largely unchanged for several
years. The classic "pipelined"
QA architecture consists of question analysis and query formulation,
followed
by document retrieval (relational or IR), and finally one or more
filtering,
extraction and presentation stages. In
such a QA system, end-to-end performance is limited by the
performance of the weakest module. In the
case of document retrieval, poor module performance can impair
end-to-end
system performance by failing to retrieve enough answer-bearing
documents, or
to rank them highly enough.
If we are to eventually
improve retrieval performance for QA, we must capitalize on the wealth
of
semantic information extractable not only from documents in the
collection, but
also from natural language questions posed as input to the system. Such content includes, but is not limited to,
parse trees, semantic role labels, frames for events and scenarios, and
links
to ontologies. Indexing and retrieving
on linguistic and semantic content has the potential to greatly improve
the
quality of documents retrieved in response to an input question,
providing the
best possible results to downstream QA system modules, and contributing
maximally to overall system performance. It
also shifts more of the burden of finding the correct
answer onto the query formulation and document retrieval
modules,
minimizing costly post-processing by downstream filtering and
extraction
modules. Of course, the trade-off is
that pre-processing the text and building the index requires a great
deal of
resources, and there are issues scaling this process to large corpora.
In this talk, I briefly
review some related work on semantic indexing and retrieval, both for
its own
sake and as applied to QA. I then
discuss JAVELIN's current approach to semantic retrieval for QA [1]
using the
Lemur toolkit [2], and describe current work as we move to using Indri
for
retrieval [3]. I discuss initial results
from JAVELIN's participation in the TREC 2005 Relationship QA task, and
from
other preliminary experiments. At the
end of the talk, I present observations from JAVELIN's current foray
into
semantic retrieval and briefly discuss some open questions facing the
effort.
References:
[1] Nyberg et al. "Extending the JAVELIN QA System with Domain Semantics," Proceedings of the 20th National Conference on Artificial Intelligence (AAAI 2005).
[2] The LTI/UMass CIIR Lemur Toolkit for Language Modeling and Information Retrieval. http://www.lemurproject.org.
[3] Metzler and Croft. "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004.

Time: 9:30 am
Speaker: Vitor Carvalho
Title: Implicit Query Systems for Email

Email is the number one activity people pursue online [1],
with Internet Search a very close second. Given people’s interests and
the
increasingly profitable market for search engines, it makes sense to
combine
these two technologies as much as possible. Implicit
Query systems for email automatically find relevant keywords (or keyphrases) in an email message, and use
such phrases as queries in a search engine. This list of keywords is
retrieved without
the user explicitly searching for them, and the results can be
displayed to the
user in different ways, such as underlining the keyphrases as clickable
links
or showing them in search boxes in a sidebar.
One of the potential
applications of such technology is
providing content-driven automatic search in email systems, which could
be used
to improve the user experience/interface or even to automatically find
related
documents in a database (for instance, when viewing an email message,
you might
be shown other related messages). Another potential application is
content-targeted advertisement in email messages, such as the one
currently
performed by Google’s Gmail system. Interestingly, content-targeted
advertisement is an area of research in language technologies with a very
limited number of references available in the literature.
In this paper, we
offer three contributions to implicit query research. First, we show
how to use
query logs from a search engine: by constraining results to commonly
issued
queries, we can get dramatic improvements in precision. Second, we
describe a
method for optimizing parameters for an implicit query system, by using
logistic regression training. The method is designed to estimate the
probability that any particular suggested
keyphrase is a good one. This
probability can be used for ranking of the best keyphrases and
selecting the
number of keyphrases to be shown to the user. Third, we show which
features
beyond standard TF-IDF features are most helpful in our logistic
regression
model: query frequency information, capitalization information, subject
line
information, message and query length information. Using the
optimization
method and the additional features, we are able to produce a system
with up to
6 times better results on top-1 score than a simple TF-IDF system [2].
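As a minimal sketch of the ranking step described above, the following Python scores candidate keyphrases with a logistic model and keeps those above a probability threshold. The feature names, weights, threshold, and example phrases are all hypothetical and chosen for illustration only; none of them come from the paper, where the weights would be learned by logistic regression training.

```python
import math

# Hypothetical weights for illustration only; in a real system these
# would be estimated by logistic regression on labeled keyphrases.
WEIGHTS = {
    "bias": -3.0,
    "tfidf": 1.5,
    "log_query_freq": 0.8,   # how often the phrase appears in a query log
    "capitalized": 0.5,
    "in_subject": 1.0,       # phrase occurs in the subject line
}


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def keyphrase_probability(features):
    """Estimated probability that a candidate keyphrase is a good one."""
    z = WEIGHTS["bias"]
    for name, value in features.items():
        z += WEIGHTS[name] * value
    return sigmoid(z)


def rank_keyphrases(candidates, threshold=0.5):
    """Sort candidates by probability and keep those at or above threshold."""
    scored = [(phrase, keyphrase_probability(f)) for phrase, f in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(phrase, p) for phrase, p in scored if p >= threshold]


# Made-up candidates extracted from a hypothetical email message.
candidates = {
    "pittsburgh marathon": {
        "tfidf": 1.2, "log_query_freq": 2.0, "capitalized": 1.0, "in_subject": 1.0,
    },
    "see you then": {
        "tfidf": 0.3, "log_query_freq": 0.1, "capitalized": 0.0, "in_subject": 0.0,
    },
}

for phrase, p in rank_keyphrases(candidates):
    print(f"{phrase}: {p:.2f}")
```

Because the model outputs a probability rather than an arbitrary score, the same number can serve both to order the keyphrases and to decide how many of them to show.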
References:
[1] Madden, Mary and Lee Rainie, “America’s Online Pursuits: the changing picture of who’s online and what they do”. Pew Internet and American Life Project, December 2003. (http://www.pewinternet.org/pdfs/PIP_Online_Pursuits_Final.PDF)
[2] Goodman, Joshua and Vitor R. Carvalho, “Implicit Queries for Email”, Conference on Email and Anti-Spam, Stanford, 2005. http://www.cs.cmu.edu/~vitor/papers/ceas05.pdf.

Time: 10:30 am
Speaker: Jaime Arguello
Title: InfoMagnets: Making Sense out of Corpus Data

In this talk, I will introduce a new interactive corpus exploration tool, InfoMagnets. InfoMagnets aims to make exploratory corpus analysis accessible to researchers who are not experts in text mining. To this end, it introduces an intuitive visual metaphor to the task of document clustering. As evidence of its usefulness and usability, it has been used successfully to uncover relationships between language and behavioral patterns in two distinct domains: tutorial dialogue and on-line communities. In tutorial dialogue, the topic analysis supported by InfoMagnets revealed differences between tutor-student interactions that explain variance in learning gains as measured by a pre/post test. In on-line communities, the InfoMagnets-based analysis furthered our understanding of behavioral patterns found within discussion threads, which moves us one step closer to understanding why some on-line communities flourish while others die out.
Our vision of InfoMagnets and its
application poses three
main challenges: user-interface design, topic detection and clustering,
and
context-based segmentation of dialogue. I will touch upon all three.
Many interactive clustering visualization
tools already
exist [1, 2, 3, 4, 5], to name a few. The novelty of InfoMagnets
comes from allowing the user
to edit the resulting cluster centroids and document representation and
immediately see the reorganization of documents. I will discuss how
using a
domain-relevant Latent Semantic Analysis (LSA) space enables this
functionality
and how this kind of instant action-reaction feedback helps the user
understand
relationships in the data faster.
I will also discuss how we intend to use
InfoMagnets as part
of our suite of tools for authoring conversational interfaces, Tutalk.
Our goal
is to support the authoring of NLP-based interfaces from corpora of
example
human-human dialogue transcripts. A necessary step is to extract
topic-based
clusters from these dialogues. This, in turn, requires partitioning our
sample
dialogues into segments of granularity slightly coarser than Sinclair
and
Coulthard’s definition of a dialogue exchange [6]. I will touch upon
how some state-of-the-art
segmentation algorithms [7, 8] perform when applied to dialogue. This
will lead
to a description of our on-going work on a context-based segmenting
algorithm
that incorporates discourse cues, lexical cohesion measures, and
automatically-induced
dialogue act labels as features.
If time permits, I will finish with a brief
demo of
InfoMagnets.
References:
[1] Anton Leuski and James Allan. Lighthouse: Showing the Way to Relevant Information. In Proceedings of the IEEE Symposium on Information Visualization 2000.
[2] Matt Rasmussen and George Karypis. gCLUTO: An Interactive Clustering, Visualization, and Analysis System. Technical Report #04-021.
[3] James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, and Anne Schur. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. In Proceedings of IEEE Information Visualization, pages 51-58, 1995.
[4] David Dubin. Document Analysis for Visualization. In Proceedings of ACM SIGIR, pages 199-204, 1995.
[5] Matthew Chalmers and Paul Chitson. Bead: Explorations in Information Visualization. In Proceedings of ACM SIGIR, pages 330-337, June 1992.
[6] Sinclair, J. and Coulthard, M. Toward an Analysis of Discourse: the English Used by Teachers and Pupils. Oxford University Press. 1975.
[7] Marti Hearst. TextTiling: A Quantitative Approach to Discourse Segmentation. Technical Report 93/24, 1993.
[8] Regina Barzilay and Lillian Lee. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. Proceedings of the NAACL/HLT, 2004.

Time: 11:00 am
Speaker: Pinar Donmez
Title: Strategically Using Pairwise Classification to Improve Category Prediction

Text classification is a research
topic that is widely studied in information retrieval, machine
learning, and
related language technologies. While text classification tasks that
consist
mainly of binary distinctions have achieved very high levels of
accuracy, many
multi-label classification problems, especially with skewed datasets,
are still
hard to solve. For example, in our work with collaborative learning
process
analyses, we have trained classifiers to apply coding schemes with as
many as
53 different codes, some of which occur less than 1% of the time [1].
Methods
such as classification by pairwise coupling [2], boosting [4], error-correcting
output codes [3], and Bayesian models have been developed for multi-label
text
classification. They all create multiple models on the training data, but
they
differ in how they combine the predictions of individual models.
One straightforward approach to multi-label
categorization is to create a binary classifier for each pair of classes
and
then predict the class that receives the most votes from the pairwise
models. However,
this takes a prohibitive amount of time when the number of classes is
large, because the number of pairwise models grows quadratically. I
will present a new method that builds one-vs-all classifiers
recursively
according to an analysis based on the distribution of errors. The idea
of
recursively applying sub-classifiers is to pay closer attention to the
classes
that are most confusable with each other. One similar method is
boosting,
where in each round the weights of misclassified examples are increased
and those of correctly classified examples decreased, so that later
models concentrate on the hard cases. In our method, we create
sub-classifiers
at each iteration by choosing examples from one class along with
examples that
are mistakenly classified as that chosen class. This approach has two
implications. First, since the sub-classifiers are trained on subsets
smaller than the whole training set, training time should be reduced,
and because the subsets are focused, we may be able to achieve high
performance despite their small size. Second, since the sub-classifiers
are built for the classes that are hardest to distinguish, the models
may not be good enough to separate those classes, so one concern is
that performance may decrease as we go deeper in the
recursion. Our method also gives us the opportunity to analyze
the errors in detail, and to see how classes affect each other when
combined in a smaller, more focused set. The challenge of our approach
is to find an optimal balance between these two implications.
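For illustration, the straightforward pairwise voting baseline mentioned above (not the recursive method proposed in this talk) can be sketched in Python as follows. The nearest-centroid rule used as the binary classifier is a stand-in chosen only to keep the sketch self-contained, and the toy data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def train_pairwise(examples):
    """Build one binary model per pair of classes.

    examples: list of (feature_vector, label) pairs.
    Each "model" here is just the two class centroids (a stand-in for
    any binary learner), so k classes yield k * (k - 1) / 2 models.
    """
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    centroids = {
        y: [sum(column) / len(xs) for column in zip(*xs)]
        for y, xs in by_label.items()
    }
    return {
        (a, b): (centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


def predict(models, x):
    """Let every pairwise model vote and return the majority class."""
    votes = Counter()
    for (a, b), (centroid_a, centroid_b) in models.items():
        closer_to_a = squared_distance(x, centroid_a) <= squared_distance(x, centroid_b)
        votes[a if closer_to_a else b] += 1
    return votes.most_common(1)[0][0]


# Toy three-class data set, made up for this example.
examples = [
    ([0.0, 0.0], "a"), ([0.2, 0.1], "a"),
    ([5.0, 5.0], "b"), ([5.1, 4.9], "b"),
    ([0.0, 5.0], "c"), ([0.1, 5.2], "c"),
]
models = train_pairwise(examples)
print(len(models), predict(models, [4.8, 5.1]))
```

With 53 codes, as in the coding scheme mentioned earlier, this scheme would need 53 * 52 / 2 = 1378 pairwise models, which is the cost that motivates looking for alternatives.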
References:
[1] Donmez, P., Rose, C. P., Stegmann, K., Weinberger, A., and Fischer, F. Supporting CSCL with Automatic Corpus Analysis Technology, in the Proceedings of Computer Supported Collaborative Learning, 2005. (Nominated for Best Paper Award)
[2] Hastie, T.J. and Tibshirani, R.J. Classification by Pairwise Coupling, in Advances in Neural Information Processing Systems, vol. 10. MIT Press, 1998.
[3] Dietterich, T. and Bakiri, G. Solving Multiclass Learning Problems via Error-Correcting Output Codes, in the Journal of Artificial Intelligence Research, 2:263-86, 1995.
[4] Schapire, R.E. A Brief Introduction to Boosting, in the Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.

Time: 11:30 am
Speaker: Christina Bennett
Title: Large Scale Evaluation of Corpus-based Synthesizers: The Blizzard Challenge 2005

Evaluation
is a necessary
component of every research area. In the
field of speech synthesis, several testing methods have been commonly
used;
however, when conducting evaluations, it is rare to compare voices
developed
using different systems. Most evaluation
has been within a single research group for diagnostic testing or
speaker
selection. While a cross-system
evaluation is obviously of value, results are not easily comparable if
the data
used to build the voices comes from different sources.
The Blizzard Challenge sought to eliminate
this problem by specifying speech corpora on which all participating
systems
build voices, allowing for a meaningful comparison between sites and
techniques.
A
large scale international evaluation of various corpus-based speech
synthesis
systems using common datasets, the Blizzard Challenge was hosted by Carnegie Mellon University
and conducted from February through April, 2005. Six
sites from around the world, both
academic and industrial, participated in this evaluation, the first
ever to
compare voices built by different systems using the same data.
Three
types of listeners were recruited; in total, nearly 200
participants
completed all portions of the evaluation. One
system, an HMM-based speech synthesizer, clearly
outshone its competition. Overall it
was found to be both the most preferred (highest in opinion scores) and
best
understood (lowest in error) among all listener types, while results
from other
systems varied. The Blizzard Challenge
has been highly successful and has inspired the speech synthesis
community to
continue to make strides in the area of evaluation, and as such, we are
hopeful
that the Challenge will become an annual or biannual event. A special session devoted to the Blizzard
Challenge 2005 was held on September 5th at this year's Interspeech
conference
in Lisbon, Portugal.
In
the second portion of the talk, I will discuss how this challenge has
refined
our collective knowledge of how speech synthesis evaluations should be
conducted and my ongoing research in this area. The
foremost questions in need of resolution
are what types of tests
should be used, how to construct effective evaluations for multiple
sites, and
various other considerations related to evaluation. In my presentation,
I will
describe the organization and design of the Blizzard Challenge with
special
focus on a discussion of results and lessons learned from conducting
this
first-of-its-kind evaluation.
References:
Bennett, C., “Large Scale Evaluation of Corpus-based Synthesizers: Results and Lessons from the Blizzard Challenge 2005,” to appear in Proceedings of Interspeech 2005 – Eurospeech, Lisbon, Portugal, 2005.
Black, A. and Tokuda, K., “The Blizzard Challenge – 2005: evaluating corpus-based speech synthesis on common databases,” to appear in Proceedings of Interspeech 2005 – Eurospeech, Lisbon, Portugal, 2005. http://www.festvox.org/blizzard

Time: 2:00 pm
Speaker: Wei-Hao Lin
Title: Which Side are You on? Identifying Document-Level and Sentence-Level Perspectives Using Statistical Models

In this talk we investigate
the problem of identifying the perspective from which a document was
written. By perspective we mean a point
of view, for example, from the perspective of Democrats or Southerners. An intelligence analyst may regularly monitor
the positions that foreign countries take on various issues. A media analyst could frequently survey
broadcast news and newspapers for different viewpoints.
What these analysts have in common is that
they would like to find evidence of strong statements of differing
perspectives, while ignoring neutral statements as less interesting.
Can a computer algorithm
learn to tell the perspective of a document? Moreover,
can we develop a system to identify
the "neutral" sentences
in a document, i.e., sentences that do not indicate a particular
perspective?
We evaluate different
statistical learning algorithms and find that the document-level
perspective
can be predicted well from word usage over the whole document.
Sentence-level perspective identification,
however, is more challenging due to the lack of context and
sentence-level
annotations. To overcome this obstacle,
we propose a new statistical model, called the latent perspective model
(LPM),
to learn the perspective of individual sentences without labels at the
sentence
level. The results show that LPM can
successfully recover the implicit sentence-level perspectives and
achieve
higher identification accuracy at the sentence level compared to a
model that
does not account for perspective-neutral sentences.

Time: 2:30 pm
Speaker: Kenji Sagae
Title: Automatic Measurement of Syntactic Development in Child Language

To
facilitate the use of syntactic information in the study of child
language acquisition, a coding scheme for grammatical relations (such
as subject, object, adjunct, etc.) in transcripts of parent-child
dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We
discuss the use of current NLP techniques to produce grammatical
relations (GRs) in this annotation scheme. By using a statistical
parser (Charniak, 2000) and memory-based learning tools for
classification (Daelemans et al., 2004), we obtain high precision and
recall of several GRs. We demonstrate the usefulness of this approach
by performing automatic measurements of syntactic development with the
Index of Productive Syntax (Scarborough, 1990), obtaining scores close to
those that child language researchers compute manually.
The Index of Productive Syntax (IPSyn) is one of the most important
measures of child language development. It provides a numerical score
for grammatical complexity in a corpus of transcribed child utterances.
IPSyn scores can be used for investigating individual or group
differences in child language acquisition, in either research or
clinical settings. Computation of IPSyn scores has traditionally been a
laborious process that requires manual identification of several
syntactic structures. In this talk I will present an NLP system that
performs automatic computation of IPSyn scores, and an evaluation of
this system using real data from child language research labs.

Time: 3:30 pm
Speaker: Pradipta Ray
Title: Motif Finding across Species

The problem of
computational identification of repeating,
roughly conserved patterns of nucleotides having biological
significance, or
"sequence motifs," is an important one. Such techniques can locate
regulatory regions of the genome, help construct
organism-specific regulatory networks, and provide a basic
understanding of
protein structure and transcription factors.
Traditional
motif finding techniques use expectation maximization or sampling to
detect
motif elements. Even though such techniques are capable of de novo
motif
detection, they tend to generate a large number of false positives.
A new approach to motif detection by Haussler et al. uses comparative
genomics and phylogenetic
information to detect motifs. Corresponding sequences in related
species are
multiply aligned using standard techniques, and then a Hidden Markov
Model is
used to functionally annotate each site based on the likelihoods of
transition
between different functional entities, and the likelihoods of each site
having
been generated from the phylogenetic tree corresponding to a functional
entity. The phylogenetic tree is modelled as a continuous-time
Markov model with 4 states.
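For reference, a continuous-time Markov model of this kind is usually specified by a rate matrix. The sketch below assumes that the four states are the nucleotides A, C, G and T and writes the rate matrix generically as Q, since the abstract does not say which substitution model is used. Under that assumption, the probability that state i is replaced by state j along a branch of length t is the (i, j) entry of:

```latex
\[
  P(t) \;=\; e^{Qt} \;=\; \sum_{k=0}^{\infty} \frac{(Qt)^{k}}{k!},
  \qquad
  q_{ij} \ge 0 \ \text{for } i \ne j,
  \qquad
  \sum_{j} q_{ij} = 0 \ \text{for each } i .
\]
```

The zero row sums make every row of P(t) a probability distribution over the four states for any t, so longer branches simply allow more substitutions to accumulate.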
While
successful, this model implicitly assumes that corresponding sites in
related
species have the same biological function, whereas there is strong
biological
evidence to the contrary in the form of motif instances not being
conserved
across species. We modify the above model to use a mixture of trees
model at
each site, depending on a site specific and species specific annotation
of the
multiply aligned sequences. We develop an algorithm for an ML framework
for such
computations, and are in the process of testing it with real biological
data
from the Drosophilae family.

Time: 4:00 pm
Speaker: Yifen Huang
Title: Inferring Ongoing Activities of Workstation Users by Clustering and User's Feedback

We are
interested in
automatically discovering the key ongoing activities of a workstation
user,
such as committees to which she belongs, research projects in which she
is
involved, etc., based on the contents of her workstation. The thesis
underlying
our research is that this collection of user activities can be
automatically inferred
from the variety of data available on most users’
workstations, including their
emails, files,
online calendar, and history of web page accesses. If such activities
could be
inferred, this knowledge about the user's activities could be used in a
variety
of ways to support the user. For example, it could be used to
cross-index
email, calendar events, files, and web accesses according to activity,
or to
produce a 'briefing folder' for each meeting on the user's calendar
(i.e., a
folder containing emails, files, etc., relevant to the activity
associated with
this meeting).
We have
developed a
three-step algorithm to induce a user’s ongoing activities based on her
email
collection and the user’s feedback on activity representations. The
first step
includes a variety of clustering algorithms and social network
analysis. The
goal is to utilize the specific properties of emails in order to
produce
unsupervised cluster labels for each email. The second step applies
various
information extraction techniques to generate an activity summary
for each
cluster, such as a set of keywords for the activity, primary senders,
extracted
names and dates, request emails, etc. The last step is to collect the
user’s
feedback on the automatically generated activity representation and then
re-train
the activity model accordingly. Most machine learning algorithms can
adapt to
feedback about labels but less work has been done on the feedback about
features. Our algorithm introduces a new hidden variable that
indicates whether a
feature is generated by a specific activity or by a general topic
unrelated to any activity, so that the feedback can be integrated into
the
model by adjusting probability weightings in the EM process.
Table 1 shows the folder alignment results of various
clustering methods; our best method achieves 50% accuracy or above.
Figure 1 shows an example of the activity representation.
We will also show the interface for collecting users’ feedback.
We are currently conducting an experiment to evaluate how well the algorithm
adapts to users’ feedback.