Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Spoken Language Systems Lab, INESC-ID
Instituto Superior Tecnico
Greetings! I am a PhD student in the Dual Degree Carnegie Mellon Portugal PhD Program, between Carnegie Mellon University and Instituto Superior Tecnico. Currently, I am working at the Language Technologies Institute at Carnegie Mellon University.
It is my privilege to be working with my advisors Alan Black (LTI), Chris Dyer (LTI) and Isabel Trancoso (INESC-ID), for whom I hold the deepest respect.
Broadly, I am interested in applying statistics and machine learning methods to Natural Language Processing tasks. Currently, my work addresses the problem of Machine Translation in Microblogs, such as Twitter and Facebook. I am interested in (1) methods to crawl large amounts of in-domain parallel data from Microblogs, (2) models that better generalize the translation process in this domain and (3) evaluation metrics that are better suited to evaluating translation quality in Microblogs.
The parallel corpora I crawled from Microblogs will be available here.
If you are interested in research on Microblogs, feel free to contact me. Also, if you are interested in crawling Sina Weibo, here's a guide I wrote showing how to set up an application.