Students who want to do an independent study or IR Lab with me can either (i) propose their own topic or (ii) choose a topic from the list below. Typically, a student must have completed Search Engines and Web Mining (11-441/11-641), Search Engines (11-442/11-642), or Information Retrieval (11-741) before doing an IR independent study or lab.
Adaptive Filtering of Microblog Text: Create a system that performs adaptive filtering of a Twitter microblog stream. The system will incrementally learn filtering profiles ("queries", "classifiers") and dissemination thresholds. There are different ways to learn dissemination thresholds, but I am most interested in approaches based on score modeling and sampling. The system could be built on top of the Indri or Lucene open-source search engines, which requires learning the search engine's API but avoids implementing a document parser. Evaluation can be done with TREC 2011-2013 Microblog Track data.
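To make the idea of a score-modeling threshold concrete, here is a minimal sketch of one possible approach, not a prescribed design. It assumes that scores of relevant documents are roughly Gaussian, that scores of non-relevant documents decay exponentially above the lowest score seen, and that the goal is to maximize a linear utility. The class name `ScoreModelThreshold` and the `credit` and `penalty` values are hypothetical, and the sampling side of the problem is not covered.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ScoreModelThreshold:
    """Hypothetical helper: picks a dissemination threshold from fitted score models."""
    credit: float = 2.0    # assumed utility gained per relevant document delivered
    penalty: float = 1.0   # assumed utility lost per non-relevant document delivered
    rel: list = field(default_factory=list)      # scores judged relevant so far
    nonrel: list = field(default_factory=list)   # scores judged non-relevant so far

    def update(self, score: float, relevant: bool) -> None:
        """Record one judged document score (the incremental feedback step)."""
        (self.rel if relevant else self.nonrel).append(score)

    def threshold(self, default: float = 0.0) -> float:
        """Return the candidate threshold with the highest expected utility."""
        if len(self.rel) < 2 or len(self.nonrel) < 2:
            return default  # too little feedback to fit either model

        # Gaussian model for relevant scores (assumption).
        mu = sum(self.rel) / len(self.rel)
        var = sum((s - mu) ** 2 for s in self.rel) / (len(self.rel) - 1)
        sigma = math.sqrt(var) or 1e-6

        # Shifted exponential model for non-relevant scores (assumption).
        lo = min(self.nonrel)
        mean_excess = sum(s - lo for s in self.nonrel) / len(self.nonrel)
        lam = 1.0 / max(mean_excess, 1e-6)

        n_rel, n_non = len(self.rel), len(self.nonrel)

        def expected_utility(t: float) -> float:
            # Probability that a relevant / non-relevant score exceeds t.
            p_rel = 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2)))
            p_non = math.exp(-lam * max(t - lo, 0.0))
            return self.credit * n_rel * p_rel - self.penalty * n_non * p_non

        # Simple grid search over the observed score range.
        hi = max(self.rel + self.nonrel)
        candidates = [lo + (hi - lo) * i / 200 for i in range(201)]
        return max(candidates, key=expected_utility)
```

In a real system the scores would come from the search engine (for example, Indri or Lucene retrieval scores), and the threshold would be recomputed as new relevance feedback arrives.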
LTI Site Search: The LTI has a site search capability, using the Indri search engine. It's a good start, but it could be improved in many ways. If you are interested in working on improved ranking algorithms, query processing, improved crawling algorithms, search user interfaces, search log analysis, or other aspects of local Web search, this might be an interesting project for you.
Web Page Structure Annotation: Modern web pages are complex. They contain advertising, navigation links, links to related content, the main content, and other material, all mixed together. Before the page can be indexed for search or used for text mining, the web page must be annotated with additional markup that identifies each type of material so that each type can be handled appropriately. I am interested in supervised and unsupervised techniques that can provide reliable annotation of large datasets (e.g., at least a hundred million web pages).
Sentiment Analysis: Track the mentions of movies or other products in social media. Use sentiment analysis to determine how people feel about the product. Examine how the measured sentiment correlates with external evidence, such as sales figures or opinion polls.
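As a rough illustration of the correlation step, the sketch below averages per-mention sentiment scores by time period and computes a Pearson correlation against an external series. It assumes that each mention already has a numeric sentiment score from some classifier and that the external figures are aligned to the same periods. The function names are hypothetical, and Pearson correlation is only one of several measures a project might use.

```python
import math
from collections import defaultdict


def mean_sentiment_by_period(mentions):
    """Average sentiment per period; `mentions` is an iterable of (period, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for period, score in mentions:
        totals[period][0] += score
        totals[period][1] += 1
    return {period: total / count for period, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def sentiment_vs_external(mentions, external):
    """Correlate mean sentiment with `external`, a dict mapping period -> figure."""
    sentiment = mean_sentiment_by_period(mentions)
    periods = sorted(set(sentiment) & set(external))  # only periods present in both
    return pearson([sentiment[p] for p in periods], [external[p] for p in periods])
```

Shifting the external series by one or more periods before calling `sentiment_vs_external` is a simple way to check whether sentiment leads or lags the external evidence.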