What is the best way to exploit linguistic information in statistical text processing models? Much recent work in NLP has focused on linguistic feature engineering.
In this thesis, we propose to use structured sparsity to answer this question. Structured sparsity provides a way to define structures in a parameter space by means of regularization. In statistical text analysis, we propose to encode linguistic information by defining linguistically motivated structures in the parameter space. Defining structures based on linguistic cues introduces a new technical challenge, since a typical corpus may contain billions of words, millions of sentences, and thousands of semantic topics, each of which can be treated as a structure in the model.
The goal of this thesis is to develop efficient learning algorithms for structured sparse models with massive numbers of (overlapping) groups and to show how structured sparsity can be used to exploit linguistic information in statistical text processing models.
First, we propose an efficient method, based on the alternating direction method of multipliers (ADMM; Hestenes, 1969; Powell, 1969), for solving optimization problems penalized by thousands to millions of overlapping group lasso terms. We then present linguistically motivated structured regularizers for text categorization and for learning word representations, and show that the resulting problems can be solved efficiently with this method.
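To make the ADMM approach concrete, the following is a minimal sketch (not the thesis implementation) of overlapping group lasso for a least-squares loss. It follows the standard ADMM recipe: one copy variable per group, so the v-update decomposes into independent block soft-thresholding steps even when groups share coordinates. All names here (`admm_overlapping_group_lasso`, `group_soft_threshold`) are illustrative, and the solver details (fixed `rho`, matrix inverse, fixed iteration count) are simplifications.

```python
import numpy as np

def group_soft_threshold(z, t):
    """Block soft-thresholding: the proximal operator of t * ||.||_2."""
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z

def admm_overlapping_group_lasso(X, y, groups, lam, rho=1.0, n_iter=300):
    """Minimize 0.5*||Xw - y||^2 + lam * sum_g ||w[g]||_2 where the index
    sets in `groups` may overlap. ADMM keeps a copy v_g of w[g] per group
    with constraint v_g = w[g], plus a scaled dual variable u_g."""
    n, d = X.shape
    w = np.zeros(d)
    v = [np.zeros(len(g)) for g in groups]
    u = [np.zeros(len(g)) for g in groups]
    # counts[j] = number of groups containing coordinate j (overlap degree)
    counts = np.zeros(d)
    for g in groups:
        counts[g] += 1
    # The w-update solves (X^T X + rho*diag(counts)) w = X^T y + rho*sum_g S_g^T (v_g + u_g);
    # a matrix inverse is fine for small d (use a cached Cholesky solve in practice).
    A_inv = np.linalg.inv(X.T @ X + rho * np.diag(counts))
    Xty = X.T @ y
    for _ in range(n_iter):
        # w-update: quadratic minimization over the shared variable
        rhs = Xty.copy()
        for g, vg, ug in zip(groups, v, u):
            rhs[g] += rho * (vg + ug)
        w = A_inv @ rhs
        # v-update: independent group soft-thresholding per copy;
        # u-update: dual ascent on the copy constraint v_g = w[g]
        for i, g in enumerate(groups):
            v[i] = group_soft_threshold(w[g] - u[i], lam / rho)
            u[i] += v[i] - w[g]
    return w
```

The key point is scalability: the per-group updates are embarrassingly parallel and each costs time linear in the group size, so the number of groups enters only through the overlap counts in the (diagonal) correction to the quadratic system.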
Future work includes learning overcomplete word representations and incorporating structure in the output space.
Noah Smith (Chair)
Francis Bach (INRIA)