Language Technologies Institute Presentation

  • Gates & Hillman Centers
  • Reddy Conference Room 4405

Linking Text with Knowledge, Data and Inference

The ultimate goal of Natural Language Understanding is to link sentences or texts with abstract constructs that are not language-bound. One may call such constructs "information", "concepts", "ontology", etc. Such "abstract" constructs have not only been given tangible and computable forms; the amount of such computable resources has also become enormous. In my talk I will address (1) how one can leverage such resources for NLU, in [Relation Expression Extraction], and (2) what roles the structure of language can play in the linking process, in [Paraphrase Recognition].

[Relation Expression Extraction] In many application domains, such as Molecular Biology, we now have large collections of non-textual data and more structured representations of information. We would like to leverage them to link expressions in language with relations that are defined independently of language. Conventional distributional semantics has tried to derive everything from text: it uses "co-occurrences" in text with arguments as the main source of information for capturing the "semantics" of relational expressions. Instead, we use "co-occurrences" as mere auxiliary clues to relate two independently constructed semantic spaces, and the original semantic spaces need not be defined by text. This framework opens up the possibility of using external, non-textual resources to define the semantic space of relational expressions in language.
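The contrast with conventional distributional semantics can be illustrated with a toy sketch. All data, expression vectors, and relation names below are hypothetical; the point is only that co-occurrence counts serve as alignment clues between two independently built spaces, rather than as the definition of meaning itself.

```python
# Toy sketch (hypothetical data): two independently constructed semantic
# spaces -- one for relational expressions in text, one for relations in an
# external, non-textual resource -- linked via co-occurrence counts that act
# only as auxiliary alignment clues.
import numpy as np

# Rows: relational expressions observed in text (illustrative vectors).
expr_space = np.array([
    [1.0, 0.2],   # "X activates Y"
    [0.9, 0.3],   # "X up-regulates Y"
    [0.1, 1.0],   # "X binds Y"
])

# Rows: relations defined in an external knowledge resource (illustrative).
rel_space = np.array([
    [2.0, 0.1],   # ACTIVATION
    [0.2, 1.5],   # BINDING
])

# Co-occurrence counts between expressions and relations; these are mere
# clues for aligning the two spaces, not the semantic representation.
cooc = np.array([
    [10, 0],
    [8, 1],
    [0, 12],
])

# Soft target: an expected relation vector for each expression.
weights = cooc / cooc.sum(axis=1, keepdims=True)
targets = weights @ rel_space

# Learn a linear map from the expression space into the relation space by
# least squares; the learned map, not raw co-occurrence, carries the link.
M, *_ = np.linalg.lstsq(expr_space, targets, rcond=None)
mapped = expr_space @ M

# Nearest external relation for each textual expression after mapping.
dists = np.linalg.norm(mapped[:, None, :] - rel_space[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
print(nearest)  # → [0 0 1]: the first two expressions link to ACTIVATION
```

Because the relation space comes from a non-textual resource, the same mapping could in principle link novel expressions (with few or no co-occurrence counts) to relations, which pure text-internal distributional methods cannot do.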

[Paraphrase Recognition] Research on paraphrase recognition has treated diverse types of paraphrases. One extreme defines "paraphrase" in the broadest sense, including logical relationships such as entailment. The other approach adopts a narrow definition and uses pairs of sentences drawn from reference translations in MT as paraphrase pairs. We first focus on paraphrases of the latter type and propose a "Compositional Approach to Paraphrase", in which the structure of language plays the major role. We then discuss how to extend the framework to include logical inference and to deal with a broader range of paraphrases.
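The compositional idea can be sketched in miniature: small, independently licensed substitutions combine to relate two whole sentences. The rules and sentences below are invented for illustration and are not from the talk; real systems would operate over syntactic structures rather than flat token sequences.

```python
# Toy sketch (hypothetical rules): compositional paraphrase recognition.
# Local paraphrase rules apply to sub-spans, and their combination relates
# two complete sentences.

# Illustrative lexical/phrasal paraphrase rules (left-hand side -> right).
RULES = {
    ("purchased",): ("bought",),
    ("a", "car"): ("an", "automobile"),
}

def paraphrases(tokens, rules, max_depth=3):
    """Enumerate token sequences reachable by composing rule applications."""
    seen = {tuple(tokens)}
    frontier = [tuple(tokens)]
    for _ in range(max_depth):
        nxt = []
        for seq in frontier:
            for lhs, rhs in rules.items():
                n = len(lhs)
                for i in range(len(seq) - n + 1):
                    if seq[i:i + n] == lhs:
                        cand = seq[:i] + rhs + seq[i + n:]
                        if cand not in seen:
                            seen.add(cand)
                            nxt.append(cand)
        frontier = nxt
    return seen

def is_paraphrase(s1, s2, rules=RULES):
    """True if s2 is reachable from s1 by composing local substitutions."""
    return tuple(s2.split()) in paraphrases(s1.split(), rules)

print(is_paraphrase("she purchased a car", "she bought an automobile"))  # → True
print(is_paraphrase("she purchased a car", "she sold a car"))            # → False
```

Extending such a framework toward the broader definition would mean adding rules (or inference steps) that are licensed logically rather than lexically, which is the direction the abstract points to.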


Before joining MSR (May 2011), Dr. Jun'ichi Tsujii was Professor of Natural Language Processing in the Department of Computer Science at the University of Tokyo, and Professor of Text Mining in the School of Computer Science at the University of Manchester, U.K. He remains a scientific advisor of the UK National Centre for Text Mining (NaCTeM). His recent research achievements include (1) recognition of events in Molecular Biology for the construction of bio-pathways, (2) construction of the gold-standard corpus (GENIA) for bio text mining, (3) deep semantic parsing based on the feature forest model, and (4) improved estimators for maximum entropy models for HPSG parsing.

He was President of ACL (Association for Computational Linguistics, 2006), President of IAMT (International Association for Machine Translation, 2002-2004), and President of AFNLP (Asian Federation of Natural Language Processing, 2007). He is a permanent member of the ICCL (International Committee on Computational Linguistics, 1992-), its Vice-Chair (2012-), and a program co-chair of Coling 2014, organized by the ICCL. He is a member of the advisory board of the Institute of Information Science, Academia Sinica, Taiwan (2011-), a member of SIG:MA (Scientific Innovation Group: Mentors and Advisors), Elsevier Inc. (2012-), etc.

Faculty Host: Teruko Mitamura

For More Information, Please Contact: