Internet Search Technologies, 15-505, Fall 2007




 

Syllabus and Course Schedule

 

This seminar is a practicum on how research from across computer science is used to provide internet search and related services. We look at selected works from the areas of systems, machine learning, language technologies, and human-computer interaction. The seminar meets weekly for 90 minutes, and each class consists of a lecture and an interactive discussion.

This schedule will change and be filled in as the semester progresses.

 

 

Module

Lectures, readings, online materials 

Homework


Large-scale computation and storage

 Lecture 1: 8/28/07  (Larsen) 

 

Introduction to parallel computation

 

 Lecture 2: 9/4/07  (Monson)

 

Parallel computation through MapReduce

 

Reading:

· MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004.

Reading response due at start of class.

 

HW 1 out.

 Lecture 3: 9/11/07  (Monson)

 

A file system optimized for streaming reads and appending writes.

 

Reading:

· The Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.

Reading response due at start of class.

 

HW 1 due.

 

HW 2 out.

 Lecture 4: 9/18/07  (Monson)

A distributed storage system for structured non-relational data

 

Reading:

· Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.

Reading response due at start of class.

 

 

Information retrieval

 Lecture 5: 9/25/07  (Nigam)

 

Introduction to information retrieval

 

HW 2 due.

 

HW 3 out.

 Lecture 6: 10/2/07  (Larsen)

 

Improved ranking using link structure: PageRank and Hubs & Authorities

 

Reading:

 

· Sergey Brin, Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks, 1998.

· (Optional) Jon Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46 (1999). Also appears as IBM Research Report RJ 10076, May 1997.

 

Reading response due at start of class.

 

 


Machine learning

 Lecture 7: 10/9/07

 

Supervised classification and logistic regression

 

N/A

 Lecture 8: 10/16/07 (Nigam)

 

Hierarchical agglomerative clustering, k-means clustering, canopies

 

Reading:

 

· (Optional) Andrew McCallum, Kamal Nigam and Lyle Ungar. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000.

 

 

HW 3 due.

 

HW 4 out.

 Lecture 9: 10/23/07 (Larsen)

 

More supervised classification

 

N/A


Language analysis

 Lecture 10: 10/30/07 (Fyshe)

 

Machine translation

 

N/A

 Lecture 11: 11/6/07 (Nigam)

 

Information extraction

 

HW 4 due (11/8)


User interface design

 Lecture 12: 11/13/07

Introduction to good user interface design practice

TBD

 

Lecture 13: 11/20/07

 

Extended case study of user interface design

 

TBD

TBD

 Lecture 14: 11/27/07

TBD


Corporate culture

 Lecture 15: 12/4/07 (Nigam)

Organizing companies around the practice of computer science

TBD