Gruhl et al., KDD 2005
The paper is an important early study demonstrating a link between online content (blogs) and customer behavior (such as purchases). Other research on the analysis of social media, which aims to understand user behavior and create market intelligence, simply assumes this link, so whether it actually holds is a critical question to ask.
Although the problem is fairly broad in scope, the authors tackle a narrower version of it to demonstrate conclusively that, in some cases, spikes in sales rank can be predicted from online chatter. They first show that carefully hand-crafted queries produce matching postings whose volume predicts sales rank. They then argue that these queries can be generated automatically in many cases. Finally, they show that even though sales rank motion may be difficult to predict in general, algorithmic predictors can use online postings to successfully predict spikes in sales rank.
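To make the idea of a volume-based spike signal concrete, here is a minimal sketch. It is not the predictor used in the paper, whose details are not given in this summary. It assumes a daily count of postings matching a book's query and flags days where that count jumps well above its recent history. The function name `flag_mention_spikes`, the 7-day window, and the threshold factor `k` are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

def flag_mention_spikes(mentions, window=7, k=2.0):
    """Return indices of days whose mention count exceeds the mean of the
    preceding `window` days by more than k standard deviations.

    Hypothetical illustration only; not the paper's algorithm.
    """
    spikes = []
    for day in range(window, len(mentions)):
        history = mentions[day - window:day]
        mu = mean(history)
        # Floor the deviation at 1.0 so a nearly flat history does not
        # turn tiny fluctuations into "spikes".
        sigma = max(pstdev(history), 1.0)
        if mentions[day] > mu + k * sigma:
            spikes.append(day)
    return spikes

# Made-up daily posting counts for one book; day 9 is a burst of chatter.
mentions = [3, 4, 2, 3, 5, 4, 3, 4, 3, 21, 12, 6, 4, 3]
print(flag_mention_spikes(mentions))  # [9]
```

A flagged day could then be treated as a warning that a sales rank spike may follow. Whether it does is exactly the empirical question the paper studies.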
The data used for experimentation consists of Amazon's sales rank data for 2,340 books over a period of 4 months, together with the corresponding postings in blogs, media, and web pages.
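One simple way to check whether posting volume leads sales in data of this shape is a lagged correlation between the two daily series. The sketch below is an illustration under stated assumptions, not a reproduction of the authors' analysis: the helper names `pearson` and `best_lag` are hypothetical, and the numbers are made up. Because a lower sales rank means more sales, the rank is negated first so that larger values mean better sales.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_lag(mentions, sales_signal, max_lag=5):
    """Correlate mentions[t] with sales_signal[t + lag] for each lag in
    0..max_lag and return the (lag, correlation) pair with the highest correlation."""
    results = []
    for lag in range(max_lag + 1):
        xs = mentions[:len(mentions) - lag]
        ys = sales_signal[lag:]
        results.append((lag, pearson(xs, ys)))
    return max(results, key=lambda pair: pair[1])

# Made-up series: chatter peaks on day 3, sales rank improves two days later.
mentions = [1, 1, 2, 8, 3, 1, 1, 2, 1, 1, 1, 1]
sales_rank = [50, 50, 50, 50, 40, 5, 30, 50, 50, 40, 50, 50]
sales_signal = [-r for r in sales_rank]  # lower rank = more sales

lag, r = best_lag(mentions, sales_signal)
print(lag, round(r, 3))
```

A best lag greater than zero with a high correlation suggests that mentions lead sales for that book. A best lag of zero, or a weak correlation at every lag, suggests no usable predictive signal.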
The authors rightly point out that a sales spike may not be accompanied by a corresponding increase in blog mentions, because several factors unrelated to blogs, such as marketing promotions, bulk purchases, and book releases, can also cause sales to spike.