Justin Betteridge

PhD Student

Language Technologies Institute
Carnegie Mellon University


6605 Gates-Hillman Center

5000 Forbes Ave.
Pittsburgh, PA 15213




Research Interests/Experience


The principal aim of my research is to facilitate broad-coverage, deep natural language understanding by computers. As most AI researchers know, this is a tall order that requires vast amounts of carefully encoded or learned knowledge. My advisor Tom Mitchell and I believe that the best path toward this distant goal is to combine small amounts of manually-encoded knowledge with the immensity of text available on the Web using efficient, scalable and robust machine learning algorithms.


Over the past few years, the primary goal for our Read the Web research project has been to develop a highly accurate and continuous system for extracting knowledge from the Web. Building upon that research, my focus now is figuring out how to use the knowledge extracted by our system to address traditional natural language understanding tasks such as information extraction, co-reference resolution, and semantic role labeling, but in a non-traditional (i.e., not a supervised machine learning) way. Eventually, we also hope to use our Web-extracted knowledge to help improve even more mature technologies like syntactic parsing and part-of-speech tagging.


Previously, I worked with Teruko Mitamura and Eric Nyberg on classifying questions in terms of their expected answer type for the JAVELIN II question answering project.  Before that, we worked on extending the Analyzer component of the KANTOO machine translation system to make use of lexical information in VerbNet for the purposes of information extraction. Part of this work was carried out under the auspices of the HALO project.


As an undergraduate, I worked with Irene Langkilde-Geary (who started the BYU NLP lab) on a dependency transformation of the Penn Treebank.





Assuming Facts Are Expressed More Than Once. [pdf ]
J. Betteridge, A. Ritter and T. Mitchell. In Proceedings of the 27th International Florida Artificial Intelligence Research Society Conference (FLAIRS-27), 2014.

Toward an Architecture for Never-Ending Language Learning. [pdf ]
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr. and Tom M. Mitchell. Proceedings of the Conference on Artificial Intelligence (AAAI). 2010.


Coupled Semi-Supervised Learning for Information Extraction. [pdf ]
Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr. and Tom M. Mitchell. Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM). 2010.


Populating the Semantic Web by Macro-Reading Internet Text. [pdf ]
Tom M. Mitchell, Justin Betteridge, Andrew Carlson, Estevam Hruschka, and Richard Wang. Invited paper, Proceedings of the 8th International Semantic Web Conference (ISWC 2009). 2009.


Coupling Semi-Supervised Learning of Categories and Relations. [pdf ]
Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell. Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing. 2009.


Toward Never Ending Language Learning. [pdf ]
Justin Betteridge, Andrew Carlson, Sue Ann Hong, Estevam R. Hruschka Jr., Edith L. M. Law, Tom M. Mitchell, Sophie H. Wang. AAAI Spring Symposium on Learning by Reading and Learning to Read. 2009.


Semantic Extensions of the Ephyra QA System for TREC 2007. [pdf ]
Nico Schlaefer, Jeongwoo Ko, Justin Betteridge, Guido Sautter, Manas Pathak, Eric Nyberg. Proceedings of the Sixteenth Text REtrieval Conference (TREC), 2007.


JAVELIN III: Cross-Lingual Question Answering from Japanese and Chinese Documents. [pdf ]
Teruko Mitamura, Frank Lin, Hideki Shima, Mengqiu Wang, Jeongwoo Ko, Justin Betteridge, Matthew Bilotti, Andrew Schlaikjer and Eric Nyberg. Proceedings of NTCIR-6 Workshop, Tokyo, Japan. 2007.


A Factored Functional Dependency Transformation of the English Penn Treebank for Probabilistic Surface Generation. [pdf ]
Irene Langkilde-Geary and Justin Betteridge. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC). 2006.


Poster abstract: Capturing knowledge from domain text with controlled language.
Eric Nyberg, Teruko Mitamura, Justin Betteridge, Simon Fung, and David Svoboda. Proceedings of the Third International Conference on Knowledge Capture (KCAP). 2005.


Resume/CV  [ pdf ]