Semantic role labeling for eventive nominalizations
Justin Betteridge and Matthew Bilotti

Traditional semantic role labeling systems such as ASSERT (Pradhan et al., 2005) extract noun phrase arguments from sentences and link them to their target verb using semantic roles. Here is an example: a hypothetical sentence about a Biology faculty member, Professor Smith. PropBank (Kingsbury et al.) roles are shown, since those are the roles ASSERT is trained to produce.

Professor Smith earned her Ph.D. from Prestigious University.
[ARG0 Professor Smith] [TARGET earned ] [ARG1 her Ph.D.] [ARG2 from Prestigious University.]

What happens if the sentence is written this way instead?

Prestigious University announced that Professor Smith's name was among those selected for conferral of the Ph.D. in Molecular Biology.

ASSERT and similar systems would analyze this sentence as shown:

[ARG0 Prestigious University] [TARGET announced ] that [ARG1 Smith's name was among those selected for conferral of the Ph.D. in Molecular Biology.]

Many semantic role labeling systems, including ASSERT, do not provide bracketings when a form of have or be is the target verb. In addition, the fact that the Ph.D. was conferred upon Smith by Prestigious University is not labeled, because it is expressed using a nominalization and not a verb.

We propose to build a system that closes this gap in coverage for nominalizations. For the example above, our system would extract:

EVENT: conferral
ARG0: Prestigious University
ARG1: Ph.D. in Molecular Biology
ARG2: Smith

When combined with a traditional semantic role labeler, our system should improve coverage of the event frames extracted from text by adding events that are expressed as nominalizations.

Training Procedure:

We will train on a collection of chunked noun phrases, extracted from the AQUAINT corpus, that contain eventive nominals as identified in Nomlex.
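As a minimal sketch of this selection step, assuming the noun phrases have already been chunked and tokenized, the Python below keeps only the phrases that contain a nominal found in a lexicon. The names TOY_NOMLEX, EventFrame, find_nominal and select_training_phrases are hypothetical, and the four-entry dictionary merely stands in for Nomlex; it does not reflect Nomlex's actual file format or any real API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for Nomlex: maps an eventive nominal to its base verb.
# A real system would load these entries from the Nomlex lexicon itself.
TOY_NOMLEX = {
    "conferral": "confer",
    "appointment": "appoint",
    "publication": "publish",
    "organization": "organize",
}


@dataclass
class EventFrame:
    """An event frame in the EVENT / ARGn layout used in the examples."""
    event: str
    args: dict = field(default_factory=dict)


def find_nominal(tokens):
    """Return the first token listed in the toy lexicon, or None."""
    for token in tokens:
        if token.lower() in TOY_NOMLEX:
            return token.lower()
    return None


def select_training_phrases(chunked_noun_phrases):
    """Keep only the noun phrases that contain an eventive nominal."""
    selected = []
    for tokens in chunked_noun_phrases:
        nominal = find_nominal(tokens)
        if nominal is not None:
            selected.append((nominal, tokens))
    return selected


# Two chunked noun phrases from the example sentences; only the first
# contains a nominal from the toy lexicon.
phrases = [
    "conferral of the Ph.D. in Molecular Biology".split(),
    "the local newspaper".split(),
]
selected = select_training_phrases(phrases)

# The frame the proposal wants to extract for the "conferral" example.
target = EventFrame(
    event="conferral",
    args={
        "ARG0": "Prestigious University",
        "ARG1": "Ph.D. in Molecular Biology",
        "ARG2": "Smith",
    },
)
```

Matching on surface tokens is only the simplest possible filter; it says nothing yet about which part of the phrase fills which role.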
We will learn a way to identify substructure in the noun phrase. For example, in the phrase "conferral of the Ph.D. in Molecular Biology", the argument "the Ph.D. in Molecular Biology" appears as the PP-OBJ of a PP headed by "of". From this structural analysis of noun phrases, we will learn a mapping into Nomlex, and from there we can access the frame information for the corresponding verb through existing cross-references into VerbNet.

Once the model has been built:

We will package our learned model as a UIMA AnalysisEngine (annotator). It requires noun-phrase chunking as a prerequisite (which we will add), which in turn requires sentence breaking (which we already have). Our AnalysisEngine will iterate over the noun phrase annotations already in the CAS and check whether each one contains an eventive nominalization. If it does, the engine will apply the same structural analysis to the noun phrase, then use the learned Nomlex mapping to put the nominal's arguments into their correct slots in the corresponding verb frame found in VerbNet. We will add this information as annotations on the CAS and pass it along to the next AnalysisEngine in the UIMA pipeline.

Here are more examples of the kind of information the system will be able to extract:

[The appointment of Professor Smith as the chair of the Biology Department] was widely publicised in the local newspaper.
EVENT: appointment
ARG0: Professor Smith
ARG1: the chair of the Biology Department

John told Mary about [the publication of Professor Smith's latest article.]
EVENT: publication
ARG0: (unspecified; perhaps a conference or a journal publisher)
ARG1: Professor Smith's latest article

[Professor Smith's organization of the International Biological Sciences Conference] was appreciated by other members of the field.
EVENT: organization
ARG0: Professor Smith
ARG1: the International Biological Sciences Conference

Issues yet to be solved:

Issues with chunking accuracy
Is a full parse going to be necessary?
Handling prepositional phrase attachment
Generalizing from nominals present in Nomlex to other nominals that have the same type of frame
How to compile Nomlex and VerbNet down into a unified resource we can use efficiently

References:

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet project. In Proceedings of COLING/ACL '98, pages 86-90, 1998.

Daniel Gildea and Daniel Jurafsky. Automatic labeling of semantic roles. Computational Linguistics, 28(3), 2002.

Paul Kingsbury, Martha Palmer, and Mitch Marcus. Adding semantic annotation to the Penn Treebank.

Karin Kipper, Hoa Trang Dang, and Martha Palmer. Class-based construction of a verb lexicon.

Catherine Macleod, Ralph Grishman, Adam Meyers, Leslie Barrett, and Ruth Reeves. Nomlex: A lexicon of nominalizations.

Sameer Pradhan, Kadri Hacioglu, Valerie Krugler, Wayne Ward, James H. Martin, and Daniel Jurafsky. Support vector learning for semantic argument classification. Machine Learning, 60(1):11-39, 2005.

Sameer Pradhan, Honglin Sun, Wayne Ward, James H. Martin, and Daniel Jurafsky. Parsing arguments of nominalizations in English and Chinese.

Mihai Surdeanu, Sanda M. Harabagiu, John Williams, and Paul Aarseth. Using predicate argument structures for information extraction.

Robert S. Swier and Suzanne Stevenson. Exploiting a verb lexicon in automatic semantic role labelling.
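
Illustrative sketch:

The structural analysis and slot-filling steps described under "Training Procedure" and "Once the model has been built" could be prototyped roughly as below. This is a plain-Python stand-in under stated assumptions, not the proposed system: it does not use the UIMA API, the hand-written TOY_SLOT_RULES table replaces the mapping that the proposal intends to learn, and all function names are hypothetical. Everything after "of" is naively attached to the PP object.

```python
# Hand-written rules standing in for the mapping the proposal intends to LEARN.
# Slot names follow the document's examples, not an actual VerbNet frame.
TOY_SLOT_RULES = {"possessive": "ARG0", "of_object": "ARG1"}


def analyze_noun_phrase(tokens, nominal):
    """Split a tokenized noun phrase around its nominal head.

    Finds a possessive pre-modifier (tokens before the head ending in "'s")
    and the object of an "of" PP that immediately follows the head.
    """
    lowered = [t.lower() for t in tokens]
    head = lowered.index(nominal)
    analysis = {"nominal": nominal, "possessive": None, "of_object": None}

    before = tokens[:head]
    if before and before[-1].endswith("'s"):
        # Drop the trailing "'s" from the joined pre-modifier.
        analysis["possessive"] = " ".join(before)[:-2]

    after = tokens[head + 1:]
    if len(after) > 1 and after[0].lower() == "of":
        # Naive attachment: everything after "of" is the PP object.
        analysis["of_object"] = " ".join(after[1:])
    return analysis


def fill_frame(analysis, slot_rules=TOY_SLOT_RULES):
    """Place each structural position found into its argument slot."""
    frame = {"EVENT": analysis["nominal"]}
    for position, slot in slot_rules.items():
        if analysis[position] is not None:
            frame[slot] = analysis[position]
    return frame


def annotate(noun_phrases, nominals):
    """Stand-in for the annotator loop over noun phrase annotations."""
    frames = []
    for tokens in noun_phrases:
        lowered = [t.lower() for t in tokens]
        nominal = next((t for t in lowered if t in nominals), None)
        if nominal is None:
            continue  # no eventive nominalization in this phrase
        frames.append(fill_frame(analyze_noun_phrase(tokens, nominal)))
    return frames


phrases = [
    "Professor Smith's organization of the International "
    "Biological Sciences Conference".split(),
    "the local newspaper".split(),
    "conferral of the Ph.D. in Molecular Biology".split(),
]
frames = annotate(phrases, {"organization", "conferral"})
for frame in frames:
    print(frame)
```

The naive attachment rule is exactly where this sketch breaks down: for "the appointment of Professor Smith as the chair of the Biology Department" it would return the whole remainder as a single argument, which is why prepositional phrase attachment is listed among the open issues.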