Recently, I've been working on partial and robust parsing of unrestricted natural language, with applications to machine translation (MT). I've been looking at robust, deep analysis of English captions (subtitles) in the financial news domain. One potential application of the research is on-line automatic caption translation. This task requires high parsing efficiency in terms of both parse correctness and time. Another issue is the high level of ambiguity that many parsing approaches produce when applied to syntactically unrestricted text.
The approach is based on multi-level chart parsing, with pruning to achieve efficiency and reduce ambiguity. The parser uses a broad-coverage English grammar and lexicon with extensive subcategorization information and attachment preferences. The research is done in the context of the KANT system, but it doesn't assume any control of the input language. The coverage of the grammar is well illustrated by the on-line data accompanying Czuba et al. (1998). In a recent English-Chinese on-line caption translation feasibility study, the approach showed promising results. Currently I'm working on adding statistical information to the pruning algorithm. More can be found in Czuba (2000).
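As a rough illustration of what pruning in a chart parser can look like in general, here is a minimal Python sketch. It is not the multi-level parser described above and does not use the KANT grammar: the toy grammar, the lexicon, the numeric preference scores, the per-cell beam, and the function names `parse` and `prune` are all assumptions made up for this example. It fills a CKY-style chart bottom-up, keeps the best score per category in each cell, and optionally discards all but the `beam` highest-scoring categories in a cell.

```python
from collections import defaultdict

# Hypothetical toy grammar (binary rules only): (left, right) -> [(parent, score)].
# The scores stand in for attachment preferences; they are invented for this sketch.
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.9)],
    ("VP", "PP"): [("VP", 0.6)],   # prefer attaching the PP to the verb phrase
    ("NP", "PP"): [("NP", 0.4)],   # over attaching it to the noun phrase
    ("P", "NP"): [("PP", 1.0)],
}

# Hypothetical lexicon: word -> [(category, score)]. "shares" is ambiguous.
LEXICON = {
    "investors": [("NP", 1.0)],
    "bought": [("V", 1.0)],
    "shares": [("NP", 0.8), ("V", 0.2)],
    "on": [("P", 1.0)],
    "monday": [("NP", 1.0)],
}


def prune(cell, beam):
    """Keep only the `beam` best-scoring categories in a chart cell (in place)."""
    if beam is None or len(cell) <= beam:
        return
    keep = sorted(cell.items(), key=lambda kv: kv[1], reverse=True)[:beam]
    cell.clear()
    cell.update(keep)


def parse(words, beam=None):
    """Fill a CKY-style chart: (start, end) -> {category: best score}."""
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        for cat, score in LEXICON[word]:
            chart[i, i + 1][cat] = score
        prune(chart[i, i + 1], beam)
    # Longer spans combine two adjacent, already completed spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = chart[start, end]
            for mid in range(start + 1, end):
                for lcat, lscore in chart[start, mid].items():
                    for rcat, rscore in chart[mid, end].items():
                        for parent, rule_score in BINARY.get((lcat, rcat), []):
                            score = rule_score * lscore * rscore
                            if score > cell.get(parent, 0.0):
                                cell[parent] = score
            prune(cell, beam)
    return chart


if __name__ == "__main__":
    sentence = "investors bought shares on monday".split()
    for beam in (None, 1):
        chart = parse(sentence, beam=beam)
        entries = sum(len(cell) for cell in chart.values())
        print(beam, chart[0, len(sentence)], entries)
```

In this toy setting a beam of 1 drops the unlikely verb reading of "shares" early, so the chart holds fewer entries while the best full-sentence analysis is unchanged. A tighter beam can of course also discard a constituent that the correct analysis needs, which is the usual trade-off with this kind of pruning.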
I'm also interested in translingual (cross-lingual) information retrieval (TLIR), especially in corpus-based methods. See Czuba and Liu (1999) for the results of experiments in using SYSTRAN to derive a bilingual aligned corpus for statistical TLIR. Currently, I'm investigating ways of harvesting bilingual corpora for TLIR from the Web.
My other interests include learning for NLP and neural approaches to NLP. I'm also interested in theoretical and formal linguistics and in what's happening in the world of LFG and HPSG.