The principal aim of my research is to facilitate broad-coverage, deep natural language understanding by computers. As most AI researchers know, this is a tall order that requires vast amounts of carefully encoded or learned knowledge. My advisor Tom Mitchell and I believe that the best path toward this distant goal is to combine small amounts of manually encoded knowledge with the immensity of text available on the Web using efficient, scalable, and robust machine learning algorithms.


Over the past few years, the primary goal of our Read the Web research project has been to develop a highly accurate, continuously running system for extracting knowledge from the Web. Building upon that research, my focus now is on using the knowledge extracted by our system to address traditional natural language understanding tasks such as information extraction, co-reference resolution, and semantic role labeling, but in a non-traditional way (i.e., without relying on supervised machine learning). Eventually, we also hope to use our Web-extracted knowledge to help improve even more mature technologies like syntactic parsing and part-of-speech tagging.


Previously, I worked with Teruko Mitamura and Eric Nyberg on classifying questions in terms of their expected answer type for the JAVELIN II question answering project. Before that, we worked on extending the Analyzer component of the KANTOO machine translation system to make use of lexical information in VerbNet for information extraction. Part of this work was carried out under the auspices of the HALO project.


