In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of active Web sites continues to grow (180 million at the beginning of 2014) and there are currently hundreds of billions of indexed pages. Meanwhile, there are more than two billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective under such a data and query load, suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response time, high query throughput, high availability and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed Web retrieval system and our research on all the components of such a Web search engine: crawling, indexing, and query processing.


Ricardo Baeza-Yates is VP of Yahoo! Labs for Europe and Latin America, leading the labs at Barcelona, Spain and Santiago, Chile, since 2006. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence to date).

He obtained a Ph.D. from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M. Eng. EE) and the electrical engineering degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Faculty Host: Jamie Callan

Faculty, staff, students and alumni gathered May 31 to pay tribute to Mark Stehlik, outgoing assistant dean for undergraduate education for the School of Computer Science.

A party in Stehlik's honor was held in the Collaborative Commons of the Gates and Hillman Centers. About 200 people attended.

During impromptu remarks May 31, Stehlik thanked his colleagues and his students.

Stehlik, who has taught computer science at the Pittsburgh campus since 1982 and served as assistant dean since 1988, is stepping down this summer to begin a five-year appointment as associate dean for education at Carnegie Mellon Qatar.

Tom Cortina, associate teaching professor, will assume the duties of SCS assistant dean and is working with Stehlik on the transition.

Stehlik was this year's recipient of the Doherty Award for Sustained Contributions to Excellence in Education, and is a past winner of the Herbert A. Simon Award for Teaching Excellence in Computer Science.

On May 31, David Kosbie, assistant teaching professor of computer science and this year's recipient of the Simon Award, talked about how his teaching methods were patterned after those of Stehlik.

RI graduate student Marek Michalowski is featured in the Post-Gazette for his work with robots and autistic children. Michalowski's favorite robot, Keepon, is "being used to study how children interact socially, and whether the robot might particularly be able to help children with autism". Read the Post-Gazette article and watch Keepon on YouTube.
