
Reliable and Generalizable
Neural Search Engine Architectures

Jamie Callan
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

 

Project Overview

During the last five years, search engines based on neural network techniques have emerged as an alternative to traditional search engine architectures. These new neural ranking architectures use distributed text representations, which enable reasoning about how well a query term such as "airplane" matches a document term such as "jet", together with more sophisticated methods of combining evidence. They may be more effective than simpler models, especially when a massive amount of training data is available.
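The soft matching between "airplane" and "jet" described above can be illustrated with a minimal sketch. It assumes toy three-dimensional embeddings whose values are invented purely for illustration, and it uses cosine similarity as the matching score. The names `EMBEDDINGS`, `cosine`, and `soft_match` are hypothetical and are not taken from the project's software; real neural rankers learn far larger representations and combine many such signals.

```python
import math

# Toy embeddings: the values are invented for illustration only,
# not taken from any trained model.
EMBEDDINGS = {
    "airplane": [0.9, 0.1, 0.3],
    "jet": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def soft_match(query_term, doc_terms):
    """Best similarity between a query term and any document term.

    An exact-match model would score 0 unless the query term itself
    appears in the document; here related terms receive partial credit.
    """
    query_vec = EMBEDDINGS[query_term]
    return max(cosine(query_vec, EMBEDDINGS[term]) for term in doc_terms)


if __name__ == "__main__":
    print(soft_match("airplane", ["jet"]))
    print(soft_match("airplane", ["banana"]))
```

Under these assumed vectors, a document containing "jet" scores much higher for the query "airplane" than one containing "banana", even though neither document contains the query term itself.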

The proposed research develops new methods of training neural ranking architectures when a massive amount of training data is not available for the target application; integrates external knowledge resources to provide more information for making accurate ranking decisions; and applies the architecture to the task of retrieving tabular data from scientific documents. This collection of problems is chosen to increase the practicality of neural ranking architectures outside of high-traffic commercial search environments, and to investigate and exploit the strengths of neural ranking architectures at using attention mechanisms to manage evidence, soft-matching across different types of evidence, and learning sophisticated nonlinear decision models. This research furthers the development of neural ranking architectures that are generally applicable and more reliable than current systems due to their ability to integrate a broader range of evidence in a predictable manner.

 

Project Personnel

Jamie Callan, Principal Investigator
Zhuyun Dai, Research Assistant
Hafeezul Rahman, Research Assistant
HongChien Yu, Undergraduate Research Assistant
Weihan Anita Li, Undergraduate Research Assistant

 

Dissemination of Research Results

Research results are disseminated by research publications. Datasets and experimental results are disseminated through online virtual appendices to research publications. Open-source software is disseminated as part of the open-source Lemur Project.

 


This research is sponsored by National Science Foundation grant IIS-1815528. Any opinions, findings, conclusions or recommendations expressed on this Web site are those of the author(s), and do not necessarily reflect those of the sponsors.

Updated on Jul 6, 2020.
Jamie Callan