
Einat Minkov

LTI, CS, Carnegie Mellon University


Email Datasets   Papers   CV


I am a PhD candidate at the Language Technologies Institute, CMU. My advisor is William W. Cohen @ MLD.
I am a member of the RADAR and CALO projects, and of the PAL Learning Group.

I am interested in Information Extraction, Information Retrieval, Computational Linguistics and Semantics. I am
also interested in Machine Learning and its application to NLP problems.


In my thesis, I consider (directed and weighted) graph representations of structural data, involving text as well
as other connected objects. Examples of such heterogeneous graphs include citation networks and social networks
(including persons, events and other entities). In particular, we represent personal information as an entity-relation
graph, in which email messages, meeting entries, social network information, text and a timeline are inter-connected
via relations derived from textual and structural information residing in a personal workstation or in a corporate database.

We derive an extended measure of similarity between the graph entities using random graph walks. We use this similarity
metric as a tool for performing search across the nodes in the graph. We then investigate learning methods in this
framework in order to improve the similarity measure for predefined sets of tasks. We evaluate methods that tune the
set of graph weights defined per edge type in the graph; we also propose re-ranking as an alternative and complementary
learning method, using features that capture "global" properties of the graph walk.
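
A minimal sketch of this kind of graph-walk similarity, given only as an illustration and not as the implementation used in the thesis. It assumes a finite "lazy" walk in which, at every step, a fixed share of the probability mass stays at the current node and the rest is split over outgoing edges in proportion to a weight assigned to each edge type. The function name walk_scores, the parameters steps and stay, and the edge type names in the usage example are all hypothetical.

```python
from collections import defaultdict


def walk_scores(edges, type_weights, query, steps=3, stay=0.5):
    """Score nodes by a finite lazy random walk started at `query`.

    edges: iterable of (source, edge_type, target) triples (directed graph).
    type_weights: dict mapping edge_type -> non-negative weight (assumed form
        of the per-edge-type parameters that learning would tune).
    Returns a dict mapping node -> probability mass after `steps` steps.
    """
    # Index outgoing edges by source node, attaching each edge's type weight.
    outgoing = defaultdict(list)
    for src, etype, dst in edges:
        weight = type_weights.get(etype, 0.0)
        if weight > 0:
            outgoing[src].append((dst, weight))

    dist = {query: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            targets = outgoing.get(node)
            if not targets:
                # No usable outgoing edges: all mass remains at this node.
                nxt[node] += mass
                continue
            nxt[node] += stay * mass
            total = sum(w for _, w in targets)
            for dst, w in targets:
                nxt[dst] += (1.0 - stay) * mass * w / total
        dist = dict(nxt)
    return dist


# Toy entity-relation graph: two messages, two persons, one shared term.
toy_edges = [
    ("msg1", "sent_by", "alice"), ("alice", "sent", "msg1"),
    ("msg2", "sent_by", "bob"), ("bob", "sent", "msg2"),
    ("msg1", "has_term", "budget"), ("budget", "term_in", "msg1"),
    ("msg2", "has_term", "budget"), ("budget", "term_in", "msg2"),
]
toy_weights = {"sent_by": 1.0, "sent": 1.0, "has_term": 2.0, "term_in": 2.0}

# Rank all nodes by their similarity to the query node "alice".
scores = walk_scores(toy_edges, toy_weights, "alice", steps=4)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions a search query is just a start node (or a distribution over start nodes), and the ranked list of reachable nodes is the search result. Changing the values in toy_weights changes the ranking, which is the quantity a weight-tuning method would adjust.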

We have applied the framework of random walks and learning to various tasks in the personal information management
domain. We have shown that seemingly different tasks like person name disambiguation, threading and contextual search
can be addressed uniformly as search queries in this framework. Recently, we have applied this framework in the domain
of text processing, where the underlying graph represents a corpus as networked sentence dependency structures. Results
for the task of corpus-based coordinate term extraction exceeded the state of the art once learning was applied.

Slides of a recent talk about this research are available here.