Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic or semantic richness of linguistic phrases. Attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem, and do so at the cost of huge feature spaces and sparseness. Here, we explore a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and can be used as an efficient reranker. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information, such as PP attachments. I discuss some of the properties of such models and different possible model architectures in this space.
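As a rough illustration of what "syntactically untied" means in the abstract above, the following minimal Python sketch keeps a separate composition matrix and scoring vector for each pair of child categories, and adds the PCFG rule log-probability to the neural score. All names (combine, params, rule_log_prob), the vector dimension, the random weights, and the rule probabilities are hypothetical placeholders chosen for illustration; they are not the trained model or the code behind the reported results.

```python
import math
import random

DIM = 4  # toy vector size (assumption; real models use larger vectors)
rng = random.Random(0)


def random_vector(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def random_matrix(rows, cols):
    return [random_vector(cols) for _ in range(rows)]


# "Syntactically untied": one (W, v) pair per pair of child categories,
# instead of a single weight matrix shared by every node in the tree.
params = {
    ("DT", "NN"): (random_matrix(DIM, 2 * DIM), random_vector(DIM)),
    ("VBD", "NP"): (random_matrix(DIM, 2 * DIM), random_vector(DIM)),
}

# Hypothetical PCFG rule log-probabilities, keyed by (parent, left, right).
rule_log_prob = {
    ("NP", "DT", "NN"): math.log(0.5),
    ("VP", "VBD", "NP"): math.log(0.3),
}


def combine(parent_cat, left_cat, left_vec, right_cat, right_vec):
    """Compose two child vectors and score the resulting parent node."""
    W, v = params[(left_cat, right_cat)]
    children = left_vec + right_vec  # concatenation [left; right]
    # Parent vector: p = tanh(W [left; right])
    parent_vec = [
        math.tanh(sum(w * x for w, x in zip(row, children))) for row in W
    ]
    # Node score: v . p + log P(parent -> left right)
    neural_score = sum(a * b for a, b in zip(v, parent_vec))
    score = neural_score + rule_log_prob[(parent_cat, left_cat, right_cat)]
    return parent_vec, score


# Usage: build an NP from toy word vectors for a determiner and a noun.
the_vec = random_vector(DIM)
cat_vec = random_vector(DIM)
np_vec, np_score = combine("NP", "DT", the_vec, "NN", cat_vec)
```

In a reranking setup of the kind the abstract mentions, scores like np_score would be summed over the nodes of each candidate tree proposed by the base PCFG, and the highest-scoring tree kept.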
Christopher Manning is a Professor of Computer Science and Linguistics at Stanford University. He received his Ph.D. from Stanford in 1995 and held faculty positions at Carnegie Mellon University and the University of Sydney before returning to Stanford. He is a fellow of AAAI and the Association for Computational Linguistics. Manning has coauthored leading textbooks on statistical approaches to natural language processing (Manning and Schuetze, 1999) and information retrieval (Manning, Raghavan, and Schuetze, 2008). His recent work has concentrated on probabilistic approaches to NLP problems and computational semantics, including statistical parsing, robust textual inference, machine translation, large-scale joint inference for NLP, computational pragmatics, and hierarchical deep learning for NLP.
Faculty Host: Nathaniel Schneider