Text-driven forecasting is an emerging collection of problems in which text documents or document collections are automatically analyzed to make specific, testable predictions about the future. Well-known examples include predictions about stock or market behavior, product sales patterns, government elections, legislative activities, or public opinion polls.
While a research community focusing on these problems has yet to form, this course is based on the following observations:
This twelve-credit seminar-project hybrid course aims to begin identifying challenge problems and testing some solutions to them.
The time and location are TBD; please contact the instructor if you are interested in participating.
The course will meet twice a week for the first month or so, operating like a seminar with discussion of two or three papers per week and brainstorming. The remainder of the semester will focus on team projects, which will account for the bulk of the grade. Each team of approximately three students will build a system that uses a text database to make testable predictions about the future.
A student wishing to audit the course will be expected to attend the course meetings, serve as an informal consultant to one of the teams, and write a short "lessons learned" paper at the end of the semester.
This course counts as a "lab" for LTI students.
|Date||Readings to discuss||Notes|
|Tu 8-25||None; introductions, administrivia, and high-level discussion about the course.|
|Th 8-27||Das and Chen, 2007: Yahoo! for Amazon: Sentiment extraction from small talk on the Web. This is the journal version of a much-cited 2001 paper.||Note that the classification techniques in this paper are very simplistic from the point of view of both machine learning and computational linguistics. Brendan's notes.|
Koppel and Shtrimberg, 2004: Good news or bad news? Let the market decide. |
Lavrenko, Schmill, Lawrie, Ogilvie, Jensen, and Allen, 2000: Mining of concurrent text and time series.
Ghose, Ipeirotis, and Sundararajan, 2007: Opinion mining using econometrics: a case study on reputation systems. |
Kogan, Levin, Routledge, Sagi, and Smith, 2009: Predicting risk from financial reports with regression.
Antweiler and Frank, 2005: Do US stock markets typically overreact to corporate news stories?|
Skim only: Antweiler and Frank, 2004: Is all that talk just noise? The information content of Internet message boards.
|Th 9-10||Danescu-Niculescu-Mizil, Kossinets, Kleinberg, and Lee, 2009: How opinions are received by online communities: A case study on Amazon.com helpfulness votes.||Mahesh's notes.|
|Tu 9-15||Monroe, Colaresi, and Quinn, 2009: Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict.||Dipanjan's notes.|
|Th 9-17||Lerman, Gilder, Dredze, and Pereira, 2008: Reading the markets: Forecasting public opinion of political candidates by news analysis.||Ramnath's notes.|
|Tu 9-22||(no meeting)|
|Th 9-24||Gentzkow and Shapiro, 2007: What drives media slant? Evidence from U.S. daily newspapers.||Neel's notes.|
|Tu 9-29||Fader, Radev, Crespin, Monroe, Quinn, and Colaresi, 2007: MavenRank: Identifying influential members of the U.S. Senate using lexical centrality.||Dipanjan's notes.|
|Th 10-1||Tausczik and Pennebaker, 2009: The psychological meaning of words: LIWC and computerized text analysis methods.|
|Tu 10-6||Project proposals|
|Th 10-8||Project selection and division into teams|
|Tu 10-13||Zhang and Skiena, 2009: Improving movie gross prediction through news analysis.|
|Tu 10-20||Dodds and Danforth, 2009: Measuring the happiness of large-scale written expression: songs, blogs, and presidents.|
|Tu 10-27||Simonoff and Sparrow, 2000: Predicting movie grosses: Winners and losers, blockbusters and sleepers.|
|Tu 11-3||Friedman, Hastie, and Tibshirani, 2009: Regularization paths for generalized linear models via coordinate descent.|
|Tu 11-10||Mishne and Glance, 2006: Predicting movie sales from blogger sentiment.|
|Tu 11-17||(no paper)|
|Tu 11-24||Liang, Jordan, and Klein, 2009: Learning semantic correspondences with less supervision.|
|Th 12-3||Final project presentations (Thursday, not Tuesday!)|