Introduction to Statistical Machine Translation

Statistical machine translation can be reduced to ordering food at your local Chinese restaurant. Each item on the menu appears in both Chinese and English, and within a few minutes you can probably determine the Chinese characters for "beef" or "lo mein" by cross-referencing several dishes with these ingredients. So it's reasonably easy to build a dictionary of Chinese characters and their corresponding translations, and possibly even of short phrases like "in garlic sauce".
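
To make the cross-referencing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four-dish menu is invented, the Chinese side is already split into words, and the pairing uses a simple co-occurrence score (the Dice coefficient) rather than the alignment models a real system would use. The names build_dictionary and translate are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical menu: each dish is (Chinese tokens, English tokens).
# Assumes both sides are already segmented into words or short phrases.
menu = [
    (["牛肉", "捞面"], ["beef", "lo mein"]),
    (["鸡肉", "捞面"], ["chicken", "lo mein"]),
    (["牛肉", "炒饭"], ["beef", "fried rice"]),
    (["虾", "炒饭"], ["shrimp", "fried rice"]),
]

def build_dictionary(menu):
    """Pair each Chinese token with the English token it co-occurs with most consistently."""
    zh_count, en_count, pair_count = Counter(), Counter(), Counter()
    for zh_tokens, en_tokens in menu:
        zh_count.update(set(zh_tokens))
        en_count.update(set(en_tokens))
        pair_count.update(product(set(zh_tokens), set(en_tokens)))

    dictionary = {}
    for zh in zh_count:
        # Dice coefficient: 2 * (dishes with both) / (dishes with zh + dishes with en)
        def dice(en):
            return 2 * pair_count[(zh, en)] / (zh_count[zh] + en_count[en])
        dictionary[zh] = max(en_count, key=dice)
    return dictionary

def translate(tokens, dictionary):
    """Word-by-word lookup; tokens never seen on the menu are marked as unknown."""
    return " ".join(dictionary.get(t, "<unknown>") for t in tokens)

dictionary = build_dictionary(menu)
# A combination that never appears on the menu itself
special = translate(["鸡肉", "炒饭"], dictionary)
print(dictionary)
print(special)
```

The sketch only works because every word in the new dish has been seen before and the word order happens to match in both languages.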

Tomorrow, they put up a special dish on the chalkboard, in Chinese characters. You could use your bilingual dictionary to figure out exactly what the special is (as long as you have seen all the ingredients and styles before). Now extend this technique to European Parliament proceedings in French and English, and try translating speeches given by the French Prime Minister. Putting together the phrases and the words to generate a translation gets a little harder. That's the part I work on, and the listed publications are evidence of my struggle.

My research focuses on syntax-augmented machine translation. I develop methods that allow statistical machine translation systems to learn more expressive translation operations that capture complex reordering effects across languages and generalize well to new data. Under the syntax-augmented MT framework, I am particularly interested in the role of labels that constrain and generalize automatically induced grammar rules. My thesis proposal on this topic is available on request.