PITTSBURGH—Analyses of Twitter feeds have been used to track flu epidemics, predict stock market changes and do political polling, but now that the National Football League season is underway, the natural question is: Can Twitter help beat the spread on NFL games?
The answer, say computer scientists at Carnegie Mellon University, is yes. Or at least, it can help a little bit at certain times during the season. They will report their findings Sept. 27 at the Machine Learning and Data Mining for Sports Analytics conference in Prague, Czech Republic.
The study began as a class project by then-student Kevin Gimpel, now a research assistant professor at Toyota Technological Institute at Chicago. It ultimately encompassed three NFL seasons, 2010-2012. The researchers used automated tools to sort through a stream of tweets that averaged 42 million messages a day in 2012.
From that stream, the researchers selected messages with hashtags associated with individual NFL teams (#giants, #newyorkgiants, #nygiants, #steelers, #steelersnation, etc.) that were sent at least 12 hours after the start of the team's previous game and at least one hour before the start of its upcoming game. In 2012, more than a million such messages were identified.
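As a rough illustration of that selection step, the following Python sketch checks a tweet for team hashtags and tests whether it falls inside the time window described above. It is not the researchers' code: the hashtag-to-team mapping, the function names and the punctuation handling are assumptions made for the example; only the hashtags and the 12-hour and one-hour limits come from the description above.

```python
from datetime import datetime, timedelta

# Hypothetical mapping; only the hashtags themselves appear in the article.
TEAM_HASHTAGS = {
    "#giants": "giants",
    "#newyorkgiants": "giants",
    "#nygiants": "giants",
    "#steelers": "steelers",
    "#steelersnation": "steelers",
}


def teams_mentioned(text):
    """Return the set of teams whose hashtags appear in the tweet text."""
    teams = set()
    for token in text.lower().split():
        tag = token.rstrip(".,!?")  # assumed cleanup of trailing punctuation
        if tag in TEAM_HASHTAGS:
            teams.add(TEAM_HASHTAGS[tag])
    return teams


def in_window(sent_at, prev_kickoff, next_kickoff):
    """True if the tweet was sent at least 12 hours after the previous
    kickoff and at least one hour before the next kickoff."""
    start = prev_kickoff + timedelta(hours=12)
    end = next_kickoff - timedelta(hours=1)
    return start <= sent_at <= end


# Example with made-up kickoff times.
prev_game = datetime(2012, 9, 9, 13, 0)
next_game = datetime(2012, 9, 16, 13, 0)
tweet_time = datetime(2012, 9, 12, 18, 30)
keep = bool(teams_mentioned("Go #Steelers!")) and in_window(
    tweet_time, prev_game, next_game
)
```

A production pipeline would also need to handle time zones and ambiguous hashtags, which this sketch ignores.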
The idea, explained Christopher Dyer, assistant professor in CMU's Language Technologies Institute, was to see what might be divined from the collective wisdom or sentiments of fans, as reflected by their tweets. Could simple measures, such as the volume of tweets, or the distribution of positive and negative words in tweets, provide insight into which teams would win, which teams would beat the point spread, or provide guidance on betting on the over/under line (the total number of points scored by both teams in a game)? "It's an experiment every week," Dyer said.
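To make the "simple measures" concrete, here is a minimal Python sketch that computes tweet volume and the share of positive and negative words for one team's tweets. The word lists, the tokenizer and the function name are placeholders invented for illustration; the article does not say which lexicon or features the researchers actually used.

```python
import re

# Tiny illustrative word lists; a real study would use a full sentiment lexicon.
POSITIVE = {"win", "great", "love", "strong"}
NEGATIVE = {"lose", "bad", "hate", "injured"}


def tweet_features(tweets):
    """Compute volume and positive/negative word rates for a list of tweets."""
    words = [w for t in tweets for w in re.findall(r"[a-z']+", t.lower())]
    total = len(words)
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return {
        "volume": len(tweets),
        "positive_rate": positive / total if total else 0.0,
        "negative_rate": negative / total if total else 0.0,
    }


features = tweet_features(["We will win", "bad day"])
```

Features like these, computed per team and per week, could then be fed to a classifier that predicts the winner, the result against the spread, or the over/under outcome.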
What they found was that their analysis of tweets didn't help much when it came to predicting winners or the over/under score. But when it came to picking winners against the spread, the researchers said their method was 55 percent accurate. That doesn't offer a huge advantage, Dyer acknowledged, but it might be enough to be profitable.
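Why 55 percent might be enough can be shown with a little arithmetic. The sketch below assumes a common pricing convention for spread bets, in which a bettor risks $110 to win $100; that pricing is an assumption for illustration and is not stated by the researchers. Under it, a bettor needs to be right a little more than 52 percent of the time to break even.

```python
def break_even_rate(risk, win):
    """Fraction of bets that must win for the expected profit to be zero."""
    return risk / (risk + win)


def expected_profit_per_bet(accuracy, risk, win):
    """Average profit per bet: win `win` with probability `accuracy`,
    lose `risk` otherwise. Ties (pushes) are ignored."""
    return accuracy * win - (1 - accuracy) * risk


# Assumed pricing: risk 110 to win 100.
threshold = break_even_rate(110, 100)            # roughly 0.524
edge = expected_profit_per_bet(0.55, 110, 100)   # roughly 5.5 per bet
```

The margin is thin: a drop of a few percentage points in accuracy would erase it, which is consistent with Dyer's cautious wording.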
When setting the spread, of course, sports books don't look only at a team's performance or factors such as home field advantage and weather predictions. The goal is to attract an equal amount of betting on both teams, which minimizes the bookmaker's financial risk. A certain amount of psychology factors into the spread, Dyer noted, which may explain why gauging the sentiments of team fans might offer some betting advantages.
But the researchers also developed a deep appreciation for the performance of the sports books and for how hard it is to beat the spread. "One thing that surprised us is how hard setting the point spread is to do well," Dyer said. "And the sports books are very, very good."
There were limits to what Twitter could reveal. Dyer said the Twitter analysis didn't work well for the first four weeks of the season. And, in the last few weeks of the season, when many teams begin altering their strategies in preparation for the post-season, the Twitter analysis wasn't useful either.
Though the advantage of Twitter analysis was modest, Dyer said improvement might be possible with a more sophisticated analysis of tweet content. Also, a common difficulty in Twitter analysis is the rapidly changing nature of the social network. "It's a moving target," Dyer explained. "More people are on Twitter this year than the year before and the year before that." The number of tweets analyzed by the researchers for the 2012 season was five times greater than the number of tweets in the 2010 season.
The sports books themselves might benefit from using Twitter analysis, if they aren't already using it, Dyer said. As for the researchers, who included Noah Smith, associate professor of language technologies and machine learning, and Shiladitya Sinha, a student majoring in mathematical sciences, the interest is purely in understanding how Twitter data can be analyzed and used.
"As far as I know," Dyer said, "none of us has actually placed a bet based on our findings."
This work was supported by the National Science Foundation and Sandia National Laboratories.
The Language Technologies Institute and Machine Learning Department are part of CMU's School of Computer Science.