I am interested in Information Extraction, Information Retrieval, Computational Linguistics and semantics. I am also interested in Machine Learning and its application to NLP problems.
In my thesis, I consider (directed and weighted) graph representations of structural data, involving text as well as other connected objects. Examples of such heterogeneous graphs include citation networks, social networks (including persons, events and other entities) and more. We represent personal information as an entity-relation graph, in which email messages, meeting entries, social network information, text and a timeline are inter-connected via relations derived from textual and structural information residing in a personal workstation or in a corporate database.
We derive an extended measure of similarity between the graph entities using random graph walks. We use this similarity metric as a tool for performing search across the nodes in the graph. We then investigate learning methods in this framework in order to improve the similarity measure for predefined sets of tasks. We evaluate methods that tune the set of graph weights defined per edge type in the graph; we also propose re-ranking as an alternative and complementary learning method, using features that capture "global" properties of the graph walk.
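To make the idea concrete, here is a minimal sketch of a finite-step "lazy" random walk over a typed, directed graph, in which each edge type carries a tunable weight. It illustrates the general technique only and is not the implementation used in the thesis. The function name walk_similarity, the stay_prob parameter, the default weight of 1.0 for unlisted edge types, and the toy graph are all assumptions introduced for this example.

```python
from collections import defaultdict


def walk_similarity(edges, type_weights, start, steps=3, stay_prob=0.5):
    """Return {node: probability} after `steps` lazy random-walk steps from `start`.

    edges        -- iterable of (source, edge_type, target) triples (directed)
    type_weights -- {edge_type: weight}; unlisted types default to 1.0 (assumption)
    stay_prob    -- probability of staying at the current node at each step
    """
    # Index outgoing edges by source node.
    out = defaultdict(list)
    for src, etype, dst in edges:
        out[src].append((etype, dst))

    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            outgoing = out.get(node, [])
            total = sum(type_weights.get(etype, 1.0) for etype, _ in outgoing)
            if total == 0:
                # Dead end (or all weights zero): keep the mass in place.
                nxt[node] += mass
                continue
            nxt[node] += stay_prob * mass
            for etype, dst in outgoing:
                share = type_weights.get(etype, 1.0) / total
                nxt[dst] += (1.0 - stay_prob) * mass * share
        dist = dict(nxt)
    return dist


# Hypothetical toy graph: two messages that share a term, each with a sender.
# Inverse edges are listed explicitly so the walk can move in both directions.
edges = [
    ("msg1", "sent_from", "alice"), ("alice", "sent_from_inv", "msg1"),
    ("msg2", "sent_from", "bob"), ("bob", "sent_from_inv", "msg2"),
    ("msg1", "has_term", "budget"), ("budget", "has_term_inv", "msg1"),
    ("msg2", "has_term", "budget"), ("budget", "has_term_inv", "msg2"),
]

scores = walk_similarity(edges, {"sent_from": 3.0}, start="msg1", steps=2)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch a search query is simply a start node, and the ranked list of reachable nodes is the answer. Raising the weight of an edge type (here sent_from) shifts probability mass toward nodes reached over that type, which is the kind of per-edge-type parameter that a learning method could tune.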
We have applied the framework of random walks and learning to various tasks in the personal information management domain. We have shown that seemingly different tasks like person name disambiguation, threading and contextual search can be addressed uniformly as search queries in this framework. Recently, we have applied this framework in the domain of text processing, where the underlying graph represents a corpus as networked sentence dependency structures. Results for the task of corpus-based coordinate term extraction exceeded the state of the art once learning was applied.
Slides of a recent talk about this research are available here.