Collective Opinion Fraud Detection: Identifying and Integrating Cues from Language, Behavior, and Networks

In addition to the PIs, the following graduate students work on the project.

Alex Beutel (CMU, PhD'16)
Dhivya Eswaran (CMU, PhD'20)
Andrea Kahn (UW, MS, Linguistics)
Huayi Li (UIC, PhD student)
Hannah Rashkin (UW, PhD, CSE)
Shebuti Rayana (SBU, PhD student)
Neil Shah (CMU, PhD'17)
Junting Ye (SBU, PhD student)

3. RESEARCH

3.1. Project goals

Technical Merits:

Given the critical problem of opinion fraud in online communities, how can one identify fake reviews and attribute them to the culprits responsible? By combining the PIs' expertise across the modalities in which deception leaves footprints, namely language, user behavior, and relational information, this project presents a research program that will result in much-needed solutions to this emergent, prevalent, and socially impactful problem. The ultimate goal is to create a unified detection framework that synergistically integrates multiple information sources (linguistic cues, user behavior, and network effects) to obtain the best of all worlds. The main idea is to formulate the problem as a relational inference task on composite heterogeneous networks, providing a principled, extensible approach that can blend and reinforce all of the above cues towards effective and robust detection of fraud. From a scientific point of view, the research brings together three disciplines: natural language analysis, behavioral modeling, and graph mining. The outcome is a suite of novel, principled, and scalable techniques and models that will enhance our understanding of the creation and dissemination of opinion fraud, and of misinformation in general, at large scale. The PIs will collaborate with industry partners such as Yelp, Google, and Amazon, directly solicit online fake reviews, and conduct well-designed user studies to test and validate their techniques.

Broader Impacts:

The broader impact of our work is that it will enable the development of opinion fraud and misinformation detection solutions that are critical to achieving integrity and credibility on the Web. The outcome of this research will benefit billions of Web users, governments, law enforcement agencies, multi-billion-dollar industries, and service providers. The two groups that this project will impact most directly and significantly are Web users and e-commerce site owners. The PIs will collaborate with Yelp in evaluating and integrating the techniques and tools they develop. The PIs will further reach out to other industry contacts at Amazon, Google, and TripAdvisor, and aim to disseminate research results to them through published manuscripts and through tutorials at major conferences that many industry practitioners attend, as well as by releasing publicly available open-source software for opinion fraud detection. The public will also be educated through interviews and educational articles in the popular press.

3.2. Results

GUI for Manual Inspection of Opinion Fraud: We have developed a GUI tool for manual inspection and evaluation of opinion fraud. This Work is shared under the Creative Commons Attribution-NonCommercial 4.0 International Public License. Licensees may copy, distribute, display, and perform the Work, and make derivative works based on it, only for non-commercial purposes. You may download the GUI here (zip).

3.3. Related Publications

Higher-Order Label Homogeneity and Spreading in Graphs
Dhivya Eswaran, Srijan Kumar, Christos Faloutsos
The WebConf (formerly WWW), Taipei, Taiwan, April 2020.
Fast and Accurate Anomaly Detection in Dynamic Graphs with a Two-Pronged Approach
Minji Yoon, Bryan Hooi, Kijung Shin, Christos Faloutsos
KDD ’19, August 4–8, 2019, Anchorage, AK, USA
SedanSpot: Detecting Anomalies in Edge Streams
Dhivya Eswaran, Christos Faloutsos
IEEE ICDM, Singapore, November 2018.
Social-Affiliation Networks: Patterns and the SOAR Model
Dhivya Eswaran, Reihaneh Rabbany, Artur Dubrawski, Christos Faloutsos
ECML PKDD, Dublin, Ireland, September 2018.
Beyond Anomaly Detection: LookOut for Pictorial Explanation
Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos
ECML PKDD, Dublin, Ireland, September 2018.
ZooBP: Belief Propagation for Heterogeneous Networks
Dhivya Eswaran, Stephan Gunnemann, Christos Faloutsos, Disha Makhija, Mohit Kumar
VLDB, Munich, Germany, August 2017.
The Power of Certainty: A Dirichlet-Multinomial Model for Belief Propagation
Dhivya Eswaran, Stephan Gunnemann, Christos Faloutsos
SIAM SDM, Houston, USA, April 2017.
FRAUDAR: Bounding Graph Fraud in the Face of Camouflage
Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, Christos Faloutsos
KDD 2016, San Francisco, CA, USA, Aug. 2016. (best paper award)
Temporal Opinion Spam Detection by Multivariate Indicative Signals
Junting Ye, Santhosh Kumar, Leman Akoglu
ICWSM, Cologne, Germany, May 2016.
GOTCHA! Network-based Fraud Detection for Social Security Fraud
Veronique Van Vlasselaer, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, Bart Baesens
Management Science (INFORMS), MS-14-00232.R4, 2016.
Collective Opinion Spam Detection using Active Inference
Shebuti Rayana and Leman Akoglu
SIAM SDM, Miami, Florida, May 2016.
BIRDNEST: Bayesian Inference for Ratings-Fraud Detection
Bryan Hooi, Neil Shah, Alex Beutel, Stephan Gunnemann, Leman Akoglu, Mohit Kumar, Disha Makhija, Christos Faloutsos
SIAM SDM, Miami, Florida, May 2016.
Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach.
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang.
ACM Transactions on Knowledge Discovery from Data (TKDD), 2015. (Best papers in KDD 2014, Special Issue).
Analyzing and Detecting Opinion Spam on a Large-scale Dataset via Temporal and Spatial Patterns.
Huayi Li, Zhiyuan Chen, Arjun Mukherjee, Bing Liu, Jidong Shao.
International AAAI Conference on Web and Social Media (ICWSM), Oxford, UK, May 2015.
Discovering Opinion Spammer Groups by Network Footprints.
Junting Ye and Leman Akoglu.
ECML/PKDD, Porto, Portugal, Sep. 2015.
Collective Opinion Spam Detection: Bridging Review Networks and Metadata
Shebuti Rayana and Leman Akoglu
ACM SIGKDD, Sydney, AU, Aug. 2015.
RSC: Mining and Modeling Temporal Activity in Social Media.
Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina Jr. and Christos Faloutsos.
KDD, Sydney, Australia, Aug. 2015.
ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly.
Alex Beutel, Amr Ahmed, Alexander Smola.
24th International World Wide Web Conference (WWW), Florence, Italy, May 2015.
The Web as a Jungle: Non-Linear Dynamical Systems for Co-evolving Online Activities.
Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos.
24th International World Wide Web Conference (WWW), Florence, Italy, May 2015.
Event Detection and Factuality Assessment with Non-Expert Supervision.
Kenton Lee, Yoav Artzi, Yejin Choi, Luke Zettlemoyer.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays.
Song Feng, Ritwik Banerjee, Jun Seok Kang and Yejin Choi.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
Detecting Campaign Promoters on Twitter using Markov Random Fields
Huayi Li, Arjun Mukherjee, Bing Liu, Rachel Kornfield, Sherry L. Emery
IEEE International Conference on Data Mining (ICDM'14), December 14-17, 2014 - Shenzhen, China.
Spotting Fake Reviews via Collective Positive-Unlabeled Learning
Huayi Li, Zhiyuan Chen, Bing Liu, Xiaokai Wei, Jidong Shao
IEEE International Conference on Data Mining (ICDM'14), December 14-17, 2014 - Shenzhen, China.
CatchSync: Catching Synchronized Behavior in Large Directed Graphs
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, Shiqiang Yang
KDD, 2014.
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Rishi Chandy, Christos Faloutsos
Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks
Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos
WWW, 2013.
What Yelp Fake Review Filter Might Be Doing
Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Natalie Glance
Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA.
Exploiting Burstiness in Reviews for Review Spammer Detection
Geli Fei, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh
Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA.
Review Spam Detection via Temporal Pattern Discovery
Sihong Xie, Guan Wang, Shuyang Lin, Philip S Yu
Proc. ACM KDD Conference, Beijing, China, Aug. 2012.
Syntactic Stylometry for Deception Detection.
Song Feng, Ritwik Banerjee and Yejin Choi.
Association for Computational Linguistics (ACL), 2012.
Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey Hancock.
Association for Computational Linguistics (ACL), 2011.

3.4. Tutorials/Workshops

Graph-Based User Behavior Modeling: From Prediction to Fraud Detection, Alex Beutel, Leman Akoglu, Christos Faloutsos, ACM SIGKDD Tutorial, Sydney, AU, August 2015 (3h-tutorial).
Mining and Forecasting of Big Time-series data, Yasushi Sakurai, Yasuko Matsubara, Christos Faloutsos, ACM SIGMOD, Melbourne, AU, May 2015 (3h-tutorial).
Smart Analytics for Big Time-series Data, Yasushi Sakurai, Yasuko Matsubara, and Christos Faloutsos, KDD 2017, Halifax, Nova Scotia, Canada, Aug. 13-17, 2017.
Data-Driven Approaches towards Malicious Behavior Modeling, Meng Jiang, Srijan Kumar, VS Subrahmanian, and Christos Faloutsos, KDD 2017, Halifax, Nova Scotia, Canada, Aug. 13-17, 2017.
Graph and Tensor Mining for Fun and Profit, Xin Luna Dong, Christos Faloutsos, Andrey Kan, Subhabrata Mukherjee and Jun Ma, KDD 2018, London UK, Aug. 19-23, 2018.
Fact checking: theory and practice, Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, and Prashant Shiralkar, KDD 2018, London UK, Aug. 19-23, 2018.
Forecasting Big Time Series: Old and New, Christos Faloutsos, Jan Gasthaus, Tim Januschowski and Yuyang Wang, VLDB 2018, Rio De Janeiro, Brazil, Aug. 27-31, 2018.
Forecasting Big Time Series: Theory and Practice, Christos Faloutsos, Valentin Flunkert, Jan Gasthaus, Tim Januschowski and Yuyang (Bernie) Wang, TheWebConf (aka WWW), April 2020, Taipei, Taiwan.

4. EDUCATION - dissertations

The educational contributions of the project include:

Dhivya Eswaran, Mining Anomalies using Static and Dynamic Graphs, April 2020.
Neil Shah, Anomaly Detection in Large Social Graphs. October, 2017.
Shebuti Rayana, Ensemble and Multimodal Learning for Anomaly Mining: Algorithms and Applications. August, 2017.
Huayi Li, Detecting Opinion Spam in Commercial Review Websites. August, 2016.
Alex Beutel, User Behavior Modeling with Large-Scale Graph Analysis. May, 2016. (KDD dissertation award - runner-up)
Danai Koutra, Exploring and Making Sense of Large Graphs. Dissertation, CMU, August 2015. (KDD dissertation award).

Points of Contact:

Christos Faloutsos, christos AT cs.cmu.edu, and Leman Akoglu, lakoglu AT andrew.cmu.edu

Last updated: April 2020, by Christos Faloutsos

III: Medium: Collaborative Research:
Collective Opinion Fraud Detection:
Identifying and Integrating Cues from Language, Behavior, and Networks

