Kenji Sagae

Department of Computer Science
University of Tokyo
 
Phone: +81 3 5803 1697
Fax: +81 3 5802 8872

sagae+web@cs.cmu.edu
   


Go to my publications (updated April 2008).

Get my dependency parser for child language transcripts (CHILDES parser).

The child language transcripts annotated with grammatical relations are available directly from the CHILDES database (unzip the archive and you will find the annotated files in the Eve directory).


October 2006: I moved to the University of Tokyo. The contact information above is current.

November 2005: I successfully defended my thesis, A multi-strategy approach to parsing of grammatical relations in child language transcripts.

My thesis research focused on syntactic analysis of CHILDES data, but the main parsing issues are applicable to the general problem of parsing natural language. See my thesis summary and defense slides.

Thesis Advisors (while at CMU)

Additional thesis committee members


Research

My primary research interest is natural language processing, and much of my recent work has been on data-driven and linguistically motivated models for syntactic parsing. Topics in my current work include: interfacing shallow and deep syntactic analysis, parser ensembles, discriminative disambiguation models, parsing efficiency, and descriptive adequacy of syntactic formalisms. I have applied this research to problems ranging from child language development to bioinformatics. See my list of publications.

My research at CMU involved the identification of grammatical relations (GRs), such as subjects, objects and adjuncts, in corpora of transcribed dialogs between children and parents. Most of these transcripts came from the CHILDES Database, but I also worked with transcripts from other sources.

A summary of my GR parsing approach for CHILDES appeared in

Sagae, K., Davis, E., Lavie, A., MacWhinney, B. and Wintner, S. 2007. High-accuracy annotation and parsing of CHILDES transcripts. Proceedings of the ACL-2007 Workshop on Cognitive Aspects of Computational Language Acquisition. Prague, Czech Republic.

For an example of how syntactic analysis of child language can be used, look at

Sagae, K., Lavie, A. and MacWhinney, B. 2005. Automatic measurement of syntactic development in child language. Proceedings of the 43rd Meeting of the Association for Computational Linguistics (ACL'05). Ann Arbor, Michigan.

I have also worked on applying discriminative dependency parsing approaches (such as the one I developed for my thesis work) to syntactic analysis based on more linguistically sophisticated models (such as HPSG). For an introduction to this research, see

Sagae, K., Miyao, Y. and Tsujii, J. 2007. HPSG parsing with shallow dependency constraints. Proceedings of the 45th Meeting of the Association for Computational Linguistics (ACL'07). Prague, Czech Republic.

A different (but related) aspect of my dissertation is the combination of several parsers to improve parsing accuracy. My graph-based ensemble approach for dependency parsing was shown to be very effective in the 2007 CoNLL shared task on multilingual dependency parsing. My parser combination work was first published as

Sagae, K. and Lavie, A. 2006. Parser combination by reparsing. Proceedings of the 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics - short papers (HLT-NAACL'06). New York, NY.

Other topics I have worked on include parser evaluation, conversion among syntactic representation formalisms, machine translation evaluation, and identification of protein-protein interactions from text. See my list of publications.

Also, I now look much older than I do in the picture on this page.