10-831/90-921, Harnessing the Wisdom of Crowds (Special Topics in Machine Learning and Policy)

Sample Syllabus from Spring 2012

Course Description

This course is intended for Ph.D. students in Heinz College, the Machine Learning Department, and other university departments who wish to engage in detailed exploration of a specific topic at the intersection of machine learning and public policy. Qualified master's students may also enroll with permission of the instructor; all students are expected to have some prior background in machine learning and/or artificial intelligence (10-601, 10-701, 90-866, 90-904/10-830, or a similar course). This year's course will focus on the topic of Harnessing the Wisdom of Crowds. We will investigate a variety of approaches which involve mining massive quantities of data created by many human users, from a machine learning perspective. We will consider both "active crowdsourcing", which requires providing users with incentives (financial, entertainment, altruistic, etc.) to perform desired actions, and "passive crowdsourcing", which exploits the various traces of data created by individuals' day-to-day behavioral patterns. Specific machine learning challenges include evaluating and optimally combining individuals' different types and levels of expertise, creating incentive structures which achieve desired goals, combining machine and human learning, effectively coordinating the crowd to perform structured and creative tasks, and understanding when the wisdom of crowds can fail (e.g. cascade effects). We will consider a variety of policy and management applications ranging from public health and human rights to mass collaboration, microfinance, and marketing. We will explore these challenges and opportunities in detail through lectures, discussions on current research articles and future directions, and course projects, with the goals of understanding and advancing the current state of the art.

Course Objectives

Upon completion of this course, the student will be able to:

1. Discuss selected topics and research directions in Harnessing the Wisdom of Crowds, such as evaluating and optimally combining individuals' different types and levels of expertise, and policy and management applications.

2. Present current topics in machine learning and policy, focusing on the wisdom of crowds, by synthesizing and summarizing the current state of the art, briefly reviewing specific research articles and their relevance to the topic, and facilitating discussion by posing questions, preliminary conclusions, and ideas to explore.

3. Develop a research project relevant to Harnessing the Wisdom of Crowds and produce a report describing the project's background, methods, results, and conclusions.

Class Schedule

Mondays and Wednesdays, 10:30-11:50am, Hamburg Hall 1003

Grading

Class participation: 25%
Topic presentation: 25%
Project proposal presentation (4/2): 5%
Project proposal (due 4/2): 5%
Final presentation (5/2): 10%
Final report (due 5/2): 30%

Class Participation

One major goal of this course is to have engaging and insightful group discussions about selected topics and research directions in Harnessing the Wisdom of Crowds, and thus active participation by all students in these discussions is an essential component of the course. Students are expected to attend all class meetings, to read assigned research articles in advance, and to contribute useful insights, comments, and questions to the discussions.

Topic Synthesis Presentations

Nine of the fourteen course meetings will be devoted to discussion of specific topics in Harnessing the Wisdom of Crowds. Each student is expected to give a high-quality, twenty-minute PowerPoint presentation, followed by twenty minutes of class discussion, at one of these meetings. The primary goals of each topic presentation should be to synthesize and summarize the current state of the art for the given topic, giving an overview of the major methodological approaches and open problems, and to facilitate the remainder of the discussion by posing questions, preliminary conclusions, and ideas to explore. It may be useful to go into the details of one or more papers, but your presentation should be a synthesis of the literature (not a "book report"). A set of suggested topics for these presentations has been provided in the syllabus below, but other topics can also be considered based on student interest.

To ensure that presentations will be useful and relevant for the class, each presenter is required to send the instructor a proposed set of two electronically available research articles that the class should read, at least one week prior to the presentation. The assigned reading(s) could be a review paper on the topic, or a landmark work representing a major advance on the topic; it may or may not be one of the specific articles reviewed in the presentation. The instructor will provide feedback and suggestions, and will post the articles on Blackboard so that the class can read them in advance of the presentation. The "Resources" section of Blackboard provides links to some suggested readings (feel free to use some of these in your presentations as relevant, but you should also find additional readings). You can also look through recent ML conference proceedings (KDD, ICML, AAAI, NIPS), other conference proceedings (CHI, UIST), and journals (MLJ, JMLR, JASA). For many specific topics, the instructor can suggest a few additional papers/sources to get you started, and online resources such as Citeseer and Google Scholar will also be helpful.

Grading for topic synthesis presentations will be based on: 20% advance preparation (were your suggested readings relevant and interesting, and were they provided to the instructor one week in advance?), 20% facilitating discussion (did you pose interesting questions, ideas, and topics for discussion; did you help to facilitate the discussion; and did you leave sufficient time for the discussion?), 20% synthesis (were you able to understand and convey the "big picture" of your topic, what has been done previously, and open questions, not just give details of specific papers?), 20% presentation (quality of presentation slides, quality of oral presentation), and 20% overall quality (as evaluated both by the class and the instructor).

Course Projects

All students are expected to be involved in a research project relevant to Wisdom of Crowds, to make significant progress on this research over the duration of the course, and to produce a written document describing the project's background (including a description of any previous work by the student and related work by others), methods, results, and conclusions. You are encouraged (but not required) to work in groups of 2-3 students on this project. Each group will be expected to give two brief presentations of their work to the class (at the beginning of the course, describing their proposed work, and at the end of the course, describing their completed work), and to submit a short (1-2 page) proposal, thus providing opportunities for their work to benefit from feedback both from the instructor and from the class. If desired, the course project can be part of the students' ongoing doctoral research (in which case the group's proposal should make clear which specific aspect of this work will be addressed during the course), or can be a smaller-scale project specific to the course. Note that the course project requirement can be waived for students auditing the course, but all students are expected to give a topic synthesis presentation and to be active participants in class discussions.

Grading for course projects will be based on: 20% significance of the problem, 20% novelty of the proposed approach, 20% correctness of the methodology, 20% clarity and completeness of the writeup, and 20% progress made over the course duration.



Michael Bernstein, MIT (SCS Faculty Candidate), "Crowd-Powered Systems". Talk is at 10am, in Gates-Hillman 6115; please stay afterward for group discussion if you are able.

Readings: Bernstein10, Bernstein11 (please read at least one!)

(W 3/21) Course Introduction

Introductions (be prepared to speak for two minutes each about your background and interests)
Discussion of the course syllabus (course structure, goals, topic synthesis presentations, course projects)
Brief lecture/discussion introducing Wisdom of Crowds

Reading: Doan (required)

(F 3/23) Optional guest lecture

For anyone who might be interested, Sandy Pentland from MIT will be giving a talk on "Patterns of Leadership" from noon-1:30pm in Hamburg Hall 1502. Pizza will be served. Sandy is a pioneer in passive crowdsourcing and coined the term "reality mining" (see next lecture); his talk will focus on what can be learned about patterns of communication in the workplace by outfitting employees with "sociometric badges" containing a variety of sensors.

(M 3/26) Discussion Topic 1: Overview of Passive Crowdsourcing

Some questions for discussion: What are the different ways in which we can mine the vast quantities of digital information that people produce in their daily lives? Examples include "reality mining" (using location, contact, and sensor information from cellular telephones), monitoring and aggregating Internet use (search queries, user-generated Web content, clickstream data), and many other possibilities (including electronic billboards that can gear their advertisement to the user and then monitor their reaction... scary stuff!)

Readings: Eagle (required), Kinsella (required), Banerjee (optional), Lane (optional), Pentland (optional).

(W 3/28) Discussion Topic 2: Overview of Active Crowdsourcing

Some questions for discussion: How can we incentivize users to provide useful information or to solve desired problems (e.g. financial incentives / micropayments, entertainment, altruism, "side effects")? How can Mechanical Turk be used effectively for paid crowdsourcing?

Readings: Mason (required), Polgreen (required), Bernstein (optional), Servan-Schreiber (optional).

(M 4/2) Project Proposal Presentations

Each group will give a short PowerPoint presentation on their proposed course project, leaving sufficient time for class discussion and suggestions, and will turn in a short (1-2 page) proposal.

Also on Monday 4/2: there is an optional guest lecture by John Brownstein (Harvard Medical School) on "Digital Disease Detection", to be held from noon-1:20pm in HBH 1502. Pizza will be served.

On Tuesday 4/3: another optional guest lecture by Duncan Watts (Yahoo! Research) on "Using the Web to do Social Science", to be held from noon-1:20pm in HBH 1000. Pizza will be served.

(W 4/4) Discussion Topic 3: Active Crowdsourcing using Games with a Purpose

Questions for discussion: How can we harness the "spare cycles" of the crowd to perform useful computational tasks, using entertainment as an incentive? What are the core mechanisms (e.g. input agreement, output agreement) that enable us to draw useful conclusions from these "games with a purpose"?
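
To make the output-agreement mechanism concrete, here is a minimal sketch in Python. It is an illustration only, not code from the assigned readings: two players independently type guesses for the same input, and a label is accepted once both players have entered it. The function name, the round-by-round processing, and the taboo-word handling are all assumptions made for this example.

```python
from itertools import zip_longest


def output_agreement(guesses_a, guesses_b, taboo=frozenset()):
    """Return the first label entered by both players, or None.

    guesses_a, guesses_b: guesses in the order each player typed them.
    taboo: lowercase words that may not be used as labels (hypothetical
    parameter; for example, labels already collected for this input).
    """
    seen_a, seen_b = set(), set()
    # Process one guess per player per round, so that the result depends
    # only on what both players have typed so far.
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            a = a.strip().lower()
            if a not in taboo:
                seen_a.add(a)
        if b is not None:
            b = b.strip().lower()
            if b not in taboo:
                seen_b.add(b)
        common = seen_a & seen_b
        if common:
            # If several labels match in the same round, pick one
            # deterministically (alphabetical order).
            return min(common)
    return None


if __name__ == "__main__":
    player_a = ["dog", "puppy", "grass"]
    player_b = ["animal", "grass", "dog"]
    print(output_agreement(player_a, player_b))
    print(output_agreement(player_a, player_b, taboo={"dog"}))
```

Because neither player sees the other's guesses, agreeing on a label is weak evidence that the label describes the input; the taboo list is one way to push players toward less obvious labels.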

Readings: Chiou (required), von Ahn (required), Cooper (optional), Jain (optional).

(W 4/11) Discussion Topic 4: Policy and Management Applications of Crowdsourcing

Some questions for discussion: How can crowdsourcing best be applied to diverse policy and management applications including disease surveillance, disaster response, international development, prediction markets, crowdfunding, and product marketing?

Readings: Ghose (required), Thies (required), Eagle (optional), Gupta (optional), Nagar (optional).

(M 4/16) Discussion Topic 5: Crowdsourcing Complex Tasks

Some questions for discussion: How can complex tasks be decomposed so that they can be effectively solved by the crowd (e.g. via Mechanical Turk)? How can the input of many users be coordinated to form high-quality content (e.g. Wikipedia)?

Readings: Kittur- CrowdForge (required), Calvo (required), Kittur- Wikipedia (optional), MacCormack (optional), Walter (optional).

(W 4/18) Discussion Topic 6: The Stupidity of Crowds / Smartening the Crowd

Some questions for discussion: When can the wisdom of crowds fail (e.g. cascade effects and herding)? On the other hand, under what conditions might we expect the crowd to outperform purely machine-based algorithms or a single human expert? How can algorithms such as task clustering, as well as other HCI design decisions, be used to improve the crowd's performance?

Readings: Hullman (required), Liu (required), Fleder (optional), Salganik (optional), Yang (optional)

(M 4/23) Discussion Topic 7: Crowdsourcing Creativity and Scientific Discovery

Some questions for discussion: How can we best harness the creativity of the crowd for idea and product generation, as well as policy creation? How can the crowd play an active role in facilitating scientific discovery?

Readings: Huang (required), Cooper (optional), COMPETES report (optional)

(W 4/25) Discussion Topic 8: Active Learning and User Modeling for Crowdsourcing

Some questions for discussion: How can we decide which users to ask and what questions to ask them? How can we combine noisy data from many users, e.g. via task repetition and user modeling? How can we tell who are the "experts" at which tasks?
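
As a starting point for the question of combining noisy data, here is a minimal Python sketch of task repetition with majority voting, followed by a crude form of user modeling that scores each worker by how often they agree with the consensus. It is a baseline illustration under stated assumptions, not a method taken from the assigned readings; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels_by_task):
    """Map each task to its most common label.

    labels_by_task: {task_id: {worker_id: label}}.
    Ties are broken by choosing the smallest label in sorted order
    (an arbitrary choice made for determinism).
    """
    consensus = {}
    for task, votes in labels_by_task.items():
        counts = Counter(votes.values())
        top = max(counts.values())
        consensus[task] = min(lab for lab, c in counts.items() if c == top)
    return consensus


def worker_agreement(labels_by_task, consensus):
    """Fraction of each worker's labels that match the consensus label."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for task, votes in labels_by_task.items():
        for worker, label in votes.items():
            total[worker] += 1
            agree[worker] += int(label == consensus[task])
    return {worker: agree[worker] / total[worker] for worker in total}


if __name__ == "__main__":
    # Toy data: three workers each label the same three images.
    labels = {
        "img1": {"ann": "cat", "bob": "cat", "cy": "dog"},
        "img2": {"ann": "dog", "bob": "dog", "cy": "dog"},
        "img3": {"ann": "cat", "bob": "dog", "cy": "dog"},
    }
    consensus = majority_vote(labels)
    print(consensus)
    print(worker_agreement(labels, consensus))
```

Agreement with the majority is only a proxy for expertise: it rewards workers who follow the crowd and can penalize a true expert on tasks where the majority is wrong, which is one motivation for the more careful user models this topic covers.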

Readings: Karger (required), Yan (required), Donmez (optional), Settles (optional), Welinder (optional)

(M 4/30) Discussion Topic 9: Integrating Human and Machine Learning

Some questions for discussion: How can human computation be used as a building block to improve machine learning algorithms such as ranking and filtering? How can we incorporate both ML methods and human users "in the loop" to perform tasks better than either could achieve alone?

Readings: Kamar (required), Zhang (required), Kumar (optional), Venetis (optional)

(W 5/2) Final Project Presentations; project reports due today (11:59pm)

Each group will give a short PowerPoint presentation on their course project. Please plan to speak for no more than ten minutes, and leave five minutes for class discussion.