Research
Broadly, I am interested in how we can use linguistics, cognition, and statistics to improve computational models of human language. Currently, I work with Alon Lavie on the AVENUE project, which is leveraging linguistically motivated syntax to improve machine translation aimed at both resource-rich and resource-poor languages. My research is focused on developing discriminant syntactic features that help the system choose better translations. This includes both phrase structure and dependency structure, and how best to model these structures statistically so that we can capture the behavior of the language pair being translated.
Previously, I worked with Lori Levin and Robert Frederking on a year-long pilot project (also part of AVENUE) investigating active learning techniques for presenting a bilingual person with examples from a linguistically structured corpus. The goal is to tap such people as an efficient and cost-effective resource for improving the quality of machine translation for languages that have few alternatives for acquiring the data needed to train modern machine translation systems.
Contact
Jonathan Clark
CMU Language Technologies Institute
5000 Forbes Avenue
Gates Hillman Complex 5407
Pittsburgh, PA 15213
Office: Gates Hillman Complex 5707
Phone: (412) 254-4566
Publications
G. Hanneman, V. Ambati, J. Clark, A. Parlikar, A. Lavie, "An Improved Statistical Transfer System for French–English Machine Translation", The Fourth Workshop on Statistical Machine Translation (WMT09) at the European Chapter of the Association for Computational Linguistics (EACL), March 2009. Athens, Greece.
J. Clark, R. Frederking, L. Levin, "Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation", The Second Workshop on Syntax and Structure in Translation (SSST) at the Association for Computational Linguistics (ACL), June 2008. Columbus, Ohio. [PDF] [Slides]
J. Clark, R. Frederking, L. Levin, "Toward Active Learning in Corpus Creation: Automatic Discovery of Language Features During Elicitation", The Sixth Language Resources and Evaluation Conference (LREC), May 2008. Marrakech, Morocco. [PDF] [Slides]
J. Clark, C. Hannon, "A Classifier System for Author Recognition Using Synonym-Based Features", Sixth Mexican International Conference on Artificial Intelligence, November 2007. Aguascalientes, Mexico. [PDF]
J. Clark, C. Hannon, "An Algorithm for Identifying Authors Using Synonyms", ENC 2007, September 2007. Morelia, Mexico.
M. Bowden, M. Olteanu, P. Suriyentrakorn, J. Clark, D. Moldovan, "LCC's PowerAnswer at QA@CLEF 2006", CLEF 2006 Working Notes, September 2006. Alicante, Spain. [PDF]
C. Hannon, J. Clark, "A Cognitive-Based Approach to Learning Integrated Language Components", The Third International Workshop on Natural Language Understanding and Cognitive Science, May 2006. Paphos, Cyprus.
Reports
J. Clark, "Treegraft: A Stochastic Transduction Chart Parser", NLP Lab Self-Defined Project Final Report, Spring 2008. [PDF] [Google Code Project page]
J. Clark, J. Gonzalez, "Coreference: Current Trends and Future Directions", Language and Statistics II Literature Review, Fall 2008. [PDF]
The Initial
With apologies to Noah A. Smith, I also feel the need to explain the pretentious middle initial on all my publications: The name Jon Clark is only slightly less common than John Smith. Other Jonathan Clarks include the 2007 CMU MBA class co-president, the songwriter, the photographer, the woodworker, the journalist, the comedian, the cameraman, the actor, the teacher, the pilot, the athlete, the golfer, the biker, the boxing champion, the lighting designer, the British artist, the sculptor, the architect, the health technologist, the computational biology professor, the personal trainer, the wellness professional, the history professor, the chief counsel for Morgan Stanley, the finance professor, the attorney, the founder of Thinstall (virtualization software), the senior VP at Sallie Mae, the real estate agent, the university president, the music professor, the post-hardcore band singer, the founder of Business Writing Solutions, the 18th century general, the basketball player, the NLP trainer (NeuroLinguistic Programming), the telecommunications consultant, the IT professional, the search marketing specialist, the computer engineering student, the polymer research engineer, the physician, another physician, another still, the surgeon, the zoology professor, the biomedical robotics professor (who, incidentally, published a paper with Jorge Cham), and the former CTO of LionBridge (large language engineering company that made this translation software... talk about hard to be unique).
Even the initial doesn't always work; the other Jonathan H. Clark is a Texas lawyer.
Courses
Spring 2009
Fall 2008
Spring 2008
Fall 2007
Personal
When I'm not knee-deep in code, I enjoy going to Pittsburgh Pirates baseball games with my fiancé Libby (while eating nachos topped with obscene amounts of jalapeños), playing drums (jazz, hand percussion, metal, it's all good stuff), and learning bits of random languages.
And of course, reading Jorge Cham's wonderful PhD comics (follow the link for more laughs):


Links
Simple, but Brilliant Java Programming Advice
Choosing a Ph.D. Program in Computer Science (Berkeley)
Advice on Applying (and whether to apply) for a Ph.D. in Computer Science (CMU)
Advice on Applying for Ph.D., Fellowships, and Other Such (Stanford)
Advice for Writing Personal Statements
A Few Favorite Applications
Remember the Milk - Advanced Todo List
Pros: Implements Getting Things Done and most of Randy Pausch's Time Management lecture
Cons: Doesn't integrate time tracking
Google Calendar - Tells me when to be places
Pros: Easy to use interface and support for sharing calendars
FindBugs - Finds bugs in Java programs