Roni Rosenfeld

Research in Human Language Technology

The Computer Age will truly arrive when computers learn to communicate with us humans on our own terms. For this to happen, we must pursue the four SILKy technologies: Speech, Image, Language, Knowledge.

Human Language Technology (HLT) is the essence of the 'L' in SILK and is crucial for 'S' and 'K'. In my research in HLT, my tools are information theory and statistics. My raw materials are huge amounts of text of various types. My end products are new modeling techniques, improved performance of real systems, and new insights into the statistical nature of human language.

Statistical Language Modeling is useful, and often crucial, to all human language technologies. These include speech recognition, machine translation, document classification and routing, information retrieval, text data mining, optical character recognition, handwriting recognition, spelling correction, and many others. In all these cases, the language model assigns a probability to each possible word sequence. It thereby acts as a knowledge source that guides the system, imposing soft constraints on what the system expects to see.
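The role of a language model as a source of soft constraints can be illustrated with a minimal sketch. It assumes a toy three-sentence training corpus and a bigram model with add-one (Laplace) smoothing. Both are chosen here purely for illustration and are not claimed to be the models used in this research. The helper names `train_bigram` and `log_prob` are likewise hypothetical. The model scores two competing recognizer hypotheses, and the one it finds more probable is preferred.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model (illustrative only)."""
    # Words that can be predicted: every training token except the start marker <s>.
    vocab_size = len([w for w in unigrams if w != "<s>"])
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing gives unseen bigrams a small nonzero probability,
        # so unlikely sequences are penalized rather than forbidden outright.
        # Words outside the training vocabulary get no special treatment in this sketch.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical toy corpus, for illustration only.
corpus = [
    "recognize speech with a language model",
    "the model can recognize speech",
    "we recognize speech every day",
]
unigrams, bigrams = train_bigram(corpus)

# Two acoustically similar hypotheses a recognizer might produce.
candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))
print(best)  # recognize speech
```

Because smoothing keeps every probability nonzero, the less likely hypothesis is penalized rather than ruled out. In a real recognizer this language-model score would be combined with acoustic or other evidence rather than used on its own.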

Modeling of human language lies at the intersection of statistics and traditional machine learning. Technically speaking, it can be viewed as a statistical estimation problem, albeit in a very sparse domain: most plausible word sequences never occur, even in very large training corpora. But its subject matter, human language, has been studied intensively in Artificial Intelligence over the past few decades, in research that has relied on computational linguistics and machine learning techniques. Consequently, our research group consists of both computer scientists and statisticians. We develop statistical frameworks for modeling various aspects of human language, implement them, and test their effect on various language technology applications. Some of the problems we have been tackling recently are: