Formal knowledge graphs enable sophisticated use of vast amounts of knowledge represented in a canonical way, but are typically limited to certain types of predefined structures and predicates. Open Information Extraction (Open-IE), on the other hand, aims to represent arbitrary information occurring in natural text. Yet Open-IE extractions lack a consolidating canonical structure and are currently limited to rather simple predicate-argument tuples. In this talk, I will outline a proposal for extending the unsupervised Open-IE paradigm towards a more powerful knowledge representation scheme, which could cover knowledge that falls beyond the typical scope of traditional knowledge graphs.
First, we propose capturing complex propositions that include multiple predicates, as well as extracting implied propositions and abstracting semantically relevant information. Second, we propose imposing a structure over the set of extracted propositions via semantically relevant relationships. In particular, we first focus on the entailment relation, which consolidates and effectively canonicalizes semantically equivalent propositions and also induces a useful specific-to-general hierarchical structure. I will review initial research activities toward these goals, as well as aspects such as context-sensitive lexical inference and assessing the factuality (truth assertion) of embedded propositions. If time permits, I will illustrate the appealing potential of such graphs for text exploration.
Faculty Host: Tom Mitchell
sharonw [atsymbol] cs.cmu.edu