Broadly, I am interested in how we can use linguistics and statistics to improve computational models of human language. Currently, I work with Alon Lavie on statistical machine translation. I also frequently collaborate with Chris Dyer. My dissertation, which I'm currently working on, is entitled "Locally Non-Linear Learning via Feature Induction in Statistical Machine Translation".

I also spent some time building very large language models. Before that, I developed discriminant syntactic features that help the system choose better translations in both resource-rich and resource-poor languages. These features drew on phrase structure and dependency structure, and the work examined how best to model these structures statistically to capture the behavior of the language pair being translated.

Previously, I worked with Lori Levin and Robert Frederking on a year-long pilot project (also a part of AVENUE) investigating active learning techniques for presenting a bilingual person with examples from a linguistically-structured corpus. The aim is to tap such people as an efficient and cost-effective resource for improving the quality of machine translation for languages that have few alternatives for acquiring the data needed to train modern machine translation systems.


ducttape: HyperWorkflow Manager

MultEval: Easy Bootstrap Resampling and Approximate Randomization for BLEU, METEOR, and TER using Multiple Optimizer Runs


J. Clark, A. Lavie, C. Dyer "One System, Many Domains: Open-Domain Statistical Machine Translation via Feature Augmentation", Association for Machine Translation in the Americas (AMTA) October 2012. San Diego, California, USA [PDF]

Thesis Proposal: "Locally Non-Linear Learning via Feature Induction in Statistical Machine Translation", April 2012.

J. Clark, C. Dyer, A. Lavie, N. Smith "Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability", Association for Computational Linguistics (ACL) July 2011. Portland, Oregon, USA [PDF] [ACL Slides] [Software] [YouTube Presentation]

C. Dyer, J. Clark, A. Lavie, N. Smith "Unsupervised Word Alignment with Arbitrary Features", Association for Computational Linguistics (ACL) July 2011. Portland, Oregon, USA [PDF]

C. Dyer, K. Gimpel, J. Clark, N. Smith "The CMU-ARK German-English Translation System", Workshop on Statistical Machine Translation (WMT11) July 2011. Edinburgh, UK [PDF]

G. Hanneman, J. Clark, A. Lavie, "Improved Features and Grammar Selection for Syntax-Based MT", Workshop on Statistical Machine Translation (WMT10) at the Association for Computational Linguistics (ACL) July 2010. Uppsala, Sweden [PDF]

J. Clark, J. Weese, B. Ahn, A. Zollmann, Q. Gao, K. Heafield, A. Lavie, "The Machine Translation Toolpack for LoonyBin: Automated Management of Experimental Machine Translation HyperWorkflows", Prague Bulletin of Mathematical Linguistics (Presented at the Fourth Machine Translation Marathon) January 2010. Dublin, Ireland [PDF] [MT Lunch Slides] [MT Marathon Slides] [Software]

J. Clark, A. Lavie, "LoonyBin: Keeping Language Technologists Sane through Automated Management of Experimental (Hyper)Workflows", LREC 2010. Malta. [PDF] [Software]

G. Hanneman, V. Ambati, J. Clark, A. Parlikar, A. Lavie, "An Improved Statistical Transfer System for French–English Machine Translation", The Fourth Workshop on Statistical Machine Translation (WMT09) at the European Association for Computational Linguistics (EACL), March 2009. Athens, Greece. [PDF]

J. Clark, R. Frederking, L. Levin "Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation", The Second Workshop on Syntax and Structure in Translation (SSST) at the Association for Computational Linguistics (ACL), June 2008. Columbus, Ohio. [PDF] [Slides]

J. Clark, R. Frederking, L. Levin "Toward Active Learning in Corpus Creation: Automatic Discovery of Language Features During Elicitation", The Sixth Language Resources and Evaluation Conference (LREC), May 2008. Marrakech, Morocco. [PDF] [Slides]

J. Clark, C. Hannon, "A Classifier System for Author Recognition Using Synonym-Based Features", Sixth Mexican International Conference on Artificial Intelligence, November 2007. Aguascalientes, Mexico. [PDF]

J. Clark, C. Hannon, "An Algorithm for Identifying Authors Using Synonyms", ENC 2007, September 2007. Morelia, Mexico.

M. Bowden, M. Olteanu, P. Suriyentrakorn, J. Clark, D. Moldovan, "LCC's PowerAnswer at QA@CLEF 2006," CLEF 2006 Working Notes, September, 2006. Alicante, Spain. [PDF]

C. Hannon, J. Clark, "A Cognitive-Based Approach to Learning Integrated Language Components", The Third International Workshop on Natural Language Understanding and Cognitive Science, May 2006. Paphos, Cyprus.


J. Clark, "Treegraft: A Stochastic Transduction Chart Parser", NLP Lab Self-Defined Project Final Report, Spring 2008. [PDF] [Google Code Project page]

J. Clark, J. Gonzalez, "Coreference: Current Trends and Future Directions", Language and Statistics II Literature Review, Fall 2008. [PDF]

When I'm not knee-deep in code, I enjoy going to Pittsburgh Pirates baseball games with my wife Libby (while eating nachos topped with obscene amounts of jalapeños), playing drums (jazz, hand percussion, metal, it's all good stuff), and learning bits of random languages. And of course, reading Jorge Cham's wonderful PhD comics (follow the link for more laughs):


