Four-city Datasets


This dataset consists of fake reviews for 8 hotels in 4 cities (Chicago, New York, Los Angeles and Houston) collected from Amazon Mechanical Turk. The deceptive FOUR-CITIES reviews are augmented with a matching set of truthful reviews from TripAdvisor, obtained by randomly sampling 40 positive (5-star) reviews for each of the eight chosen hotels. While we cannot know for sure that the sampled reviews are truthful, previous work has suggested that rates of deception among popular hotels are likely to be low.
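The sampling step described above can be sketched roughly as follows. This is only an illustration, not the authors' actual script: the record layout (dicts with `hotel`, `stars` and `text` keys), the function name `sample_truthful` and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_truthful(reviews, per_hotel=40, stars=5, seed=0):
    """Randomly pick up to `per_hotel` reviews with the given star rating per hotel.

    `reviews` is an iterable of dicts with hypothetical keys
    'hotel', 'stars' and 'text' (the real TripAdvisor data may differ).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the reviews that match the requested rating by hotel.
    by_hotel = defaultdict(list)
    for review in reviews:
        if review["stars"] == stars:
            by_hotel[review["hotel"]].append(review)

    # Sample without replacement; take the whole pool if it is too small.
    return {
        hotel: rng.sample(pool, min(per_hotel, len(pool)))
        for hotel, pool in by_hotel.items()
    }
```

With eight hotels that each have at least 40 five-star reviews, this yields 8 x 40 = 320 truthful reviews in total.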

[ fake.zip | true.zip ]

Sources (citation)

Jiwei Li, Myle Ott and Claire Cardie. Identifying Manipulated Offerings on Review Portals. In EMNLP 2013.

Myle Ott, Yejin Choi, Claire Cardie and Jeffrey Hancock. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In ACL 2011.

Tao Wang. A Tree Model For Summarization on Twitter