Students who want to do an independent study or IR Lab with me can either i) propose their own topic, or ii) choose a topic from the list below.
LTI Site Search: We recently deployed an LTI Site Search capability, using Lemur. It's a good start, but it could be improved in many ways. If you are interested in working on ranking algorithms, query processing, crawling algorithms, search user interfaces, search log analysis, or other aspects of local Web search, this might be an interesting project for you.
Build your Own Search Service (BOSS): Yahoo! recently deployed a new service called BOSS, which allows you to build search interfaces on top of Yahoo's search engine. I would be interested in projects that use this interface to deliver more accurate search, personalized search, or better organization and display of search results.
Web Page Structure Annotation: Modern web pages are complex. They contain advertising, navigation links, links to related content, the main content, and other material, all mixed together. Before a page can be indexed for search or used for text mining, it must be annotated with additional markup that identifies each type of material so that each type can be handled appropriately. I am interested in supervised and unsupervised techniques that can provide reliable annotation of large datasets (e.g., a hundred million web pages or more).
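To make the annotation task concrete, one simple unsupervised baseline is link density: blocks whose text is mostly anchor text tend to be navigation, while long blocks with few links tend to be main content. The Python sketch below illustrates that idea only. The names BlockCollector and annotate, the three labels, the set of block-level tags, and both thresholds are hypothetical choices made for illustration, and a heuristic this simple is a starting point rather than a reliable annotator.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as block boundaries for illustration.
BLOCK_TAGS = {"div", "p", "ul", "ol", "table", "nav", "article",
              "section", "header", "footer", "aside"}
SKIP_TAGS = {"script", "style"}  # their contents are not visible text


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts anchor-text characters."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars)
        self.link_depth = 0     # > 0 while inside an <a> element
        self.skip_depth = 0     # > 0 while inside <script> or <style>
        self.parts = []
        self.link_chars = 0

    def flush(self):
        """Close the current block and start a new one."""
        text = " ".join(self.parts)
        if text:
            self.blocks.append((text, self.link_chars))
        self.parts = []
        self.link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self.link_depth += 1
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        norm = " ".join(data.split())  # collapse runs of whitespace
        if not norm:
            return
        self.parts.append(norm)
        if self.link_depth:
            self.link_chars += len(norm)


def annotate(html, link_threshold=0.5, min_content_chars=80):
    """Label each block as 'navigation', 'content', or 'other'.

    Both thresholds are illustrative guesses, not tuned values.
    """
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()  # emit any trailing text outside a block tag

    labeled = []
    for text, link_chars in collector.blocks:
        density = link_chars / len(text)  # fraction of text inside links
        if density > link_threshold:
            label = "navigation"
        elif len(text) >= min_content_chars:
            label = "content"
        else:
            label = "other"
        labeled.append((label, text))
    return labeled
```

Calling annotate(page_html) returns a list of (label, text) pairs, one per block. A supervised approach could keep the same block segmentation and replace the threshold rules with a classifier trained on labeled blocks.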