III: Medium: Collaborative Research:
Collective Opinion Fraud Detection:
Identifying and Integrating Cues from Language, Behavior, and Networks

 
PI: Leman Akoglu
Co-PI: Yejin Choi
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794
Phone: 1 (631) 632 9801
Fax: 1 (631) 632 1784
Email: {leman,ychoi} AT cs.stonybrook.edu
Website: http://www.cs.stonybrook.edu/~leman

PI: Bing Liu
Department of Computer Science
University of Illinois at Chicago
Phone: 1 (312) 685 2570
Fax: 1 (312) 413 0024
Email: liub AT cs.uic.edu
Website: http://www.cs.uic.edu/~liub/

PI: Christos Faloutsos
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
Phone: 1 (412) 268 1457
Fax: 1 (412) 268 5576
Email: christos AT cs.cmu.edu
Website: www.cs.cmu.edu/~christos/

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1408924. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1. GENERAL INFORMATION

1.1. Abstract

Link to NSF abstract

1.2. Keywords

Opinion Fraud, Fraud Detection, Deception, Linguistic Patterns, Behavioral Analysis, Graph Mining.

1.3. Funding agency

National Science Foundation (NSF), Grant No. IIS-1408924.


2. PEOPLE INVOLVED

In addition to the PIs, the following graduate students work on the project.

3. RESEARCH

3.1. Project goals

Technical Merits:

Opinion fraud is a critical problem in online communities: how can one identify fake reviews and attribute them to the culprits responsible? This project combines the PIs' expertise across several modalities of deception footprints (language, user behavior, and relational information) in a research program aimed at much-needed solutions to this emergent, prevalent, and socially impactful problem. The ultimate goal is a unified detection framework that integrates multiple information sources, namely linguistic cues, user behavior, and network effects, to obtain the best of all worlds. The main idea is to formulate the problem as a relational inference task on composite heterogeneous networks, providing a principled, extensible approach that can blend and reinforce all of these cues for effective and robust fraud detection. From a scientific point of view, the research brings together three disciplines: natural language analysis, behavioral modeling, and graph mining. The outcome is a suite of novel, principled, and scalable techniques and models that will enhance our understanding of how opinion fraud, and misinformation in general, is created and disseminated at large scale. The PIs will collaborate with industry partners such as Yelp, Google, and Amazon, directly solicit online fake reviews, and conduct well-designed user studies to test and validate their techniques.

Broader Impacts:

This work will enable the development of opinion fraud and misinformation detection solutions that are critical to the integrity and credibility of the Web. The outcome of this research will benefit billions of Web users, governments, law enforcement agencies, multi-billion-dollar industries, and service providers. The two groups that this project will most directly and significantly impact are Web users and e-commerce site owners. The PIs will collaborate with Yelp to evaluate and integrate the techniques and tools they develop. The PIs will also reach out to other industry contacts at Amazon, Google, and TripAdvisor, disseminate research results through published manuscripts and tutorials at major conferences that many industry practitioners attend, and release open-source software for opinion fraud detection. The PIs will further educate the public by reaching out to the popular press for interviews and educational articles.

3.2. Results

  • GUI for Manual Inspection of Opinion Fraud: We have developed a GUI tool for manual inspection and evaluation of opinion fraud. This Work is shared under Creative Commons Attribution-NonCommercial 4.0 International Public License. Licensees may copy, distribute, display, and perform the Work and make derivative works based on the Work only for non-commercial purposes. You may download the GUI here (zip).

3.3. Related Publications

    1. Social-Affiliation Networks: Patterns and the SOAR Model
      Dhivya Eswaran, Reihaneh Rabbany, Artur Dubrawski, Christos Faloutsos
      ECML PKDD, Dublin, Ireland, September 2018.

    2. Beyond Anomaly Detection: LookOut for Pictorial Explanation
      Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos
      ECML PKDD, Dublin, Ireland, September 2018.

    3. ZooBP: Belief Propagation for Heterogeneous Networks
      Dhivya Eswaran, Stephan Gunnemann, Christos Faloutsos, Disha Makhija, Mohit Kumar
      VLDB, Munich, Germany, August 2017.  

    4. The Power of Certainty: A Dirichlet-Multinomial Model for Belief Propagation
      Dhivya Eswaran, Stephan Gunnemann, Christos Faloutsos
      SIAM SDM, Houston, USA, April 2017.  

    5. Temporal Opinion Spam Detection by Multivariate Indicative Signals
      Junting Ye, Santhosh Kumar, Leman Akoglu
      ICWSM, Cologne, Germany, May 2016.  

    6. GOTCHA! Network-based Fraud Detection for Social Security Fraud  
      Veronique Van Vlasselaer, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, Bart Baesens
      Management Science (INFORMS), MS-14-00232.R4, 2016.  

    7. Collective Opinion Spam Detection using Active Inference  
      Shebuti Rayana and Leman Akoglu
      SIAM SDM, Miami, Florida, May 2016.  

    8. BIRDNEST: Bayesian Inference for Ratings-Fraud Detection  
      Bryan Hooi, Neil Shah, Alex Beutel, Stephan Gunnemann, Leman Akoglu, Mohit Kumar, Disha Makhija, Christos Faloutsos
      SIAM SDM, Miami, Florida, May 2016.  

    9. Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach.  
      Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos and Shiqiang Yang.
      ACM Transactions on Knowledge Discovery from Data (TKDD), 2015. (Best papers in KDD 2014, Special Issue - to appear).

    10. Analyzing and Detecting Opinion Spam on a Large-scale Dataset via Temporal and Spatial Patterns.  
      Huayi Li, Zhiyuan Chen, Arjun Mukherjee, Bing Liu, Jidong Shao.
      International AAAI Conference on Web and Social Media (ICWSM), Oxford, UK, May 2015.

    11. Discovering Opinion Spammer Groups by Network Footprints.  
      Junting Ye and Leman Akoglu.
      ECML/PKDD, Porto, Portugal, Sep. 2015.

    12. Collective Opinion Spam Detection: Bridging Review Networks and Metadata  
      Shebuti Rayana and Leman Akoglu
      ACM SIGKDD, Sydney, AU, Aug. 2015.

    13. RSC: Mining and Modeling Temporal Activity in Social Media.  
      Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina Jr. and Christos Faloutsos.
      KDD, Sydney, Australia, Aug. 2015.

    14. ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly.  
      Alex Beutel, Amr Ahmed, Alexander Smola.
      24th International World Wide Web Conference (WWW), Florence, Italy, May 2015.

    15. The Web as a Jungle: Non-Linear Dynamical Systems for Co-evolving Online Activities.  
      Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos.
      24th International World Wide Web Conference (WWW), Florence, Italy, May 2015.

    16. Event Detection and Factuality Assessment with Non-Expert Supervision.  
      Kenton Lee, Yoav Artzi, Yejin Choi, Luke Zettlemoyer.
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.

    17. Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays.  
      Song Feng, Ritwik Banerjee, Jun Seok Kang and Yejin Choi.
      Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

    18. Detecting Campaign Promoters on Twitter using Markov Random Fields  
      Huayi Li, Arjun Mukherjee, Bing Liu, Rachel Kornfield, Sherry L. Emery
      IEEE International Conference on Data Mining (ICDM'14), December 14-17, 2014 - Shenzhen, China.

    19. Spotting Fake Reviews via Collective Positive-Unlabeled Learning  
      Huayi Li, Zhiyuan Chen, Bing Liu, Xiaokai Wei, Jidong Shao
      IEEE International Conference on Data Mining (ICDM'14), December 14-17, 2014 - Shenzhen, China.

    20. CatchSync: Catching Synchronized Behavior in Large Directed Graphs
      Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, Shiqiang Yang
      KDD, 2014.

    21. Opinion Fraud Detection in Online Reviews using Network Effects  
      Leman Akoglu, Rishi Chandy, Christos Faloutsos
      Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA

    22. CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks
      Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos
      WWW, 2013.

    23. What Yelp Fake Review Filter Might Be Doing
      Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Natalie Glance
      Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA.

    24. Exploiting Burstiness in Reviews for Review Spammer Detection
      Geli Fei, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh
      Proceedings of The International AAAI Conference on Weblogs and Social Media (ICWSM-2013), July 8-10, 2013, Boston, USA.

    25. Review Spam Detection via Temporal Pattern Discovery
      Sihong Xie, Guan Wang, Shuyang Lin, Philip S Yu
      Proc. ACM KDD Conference, Beijing, China, Aug. 2012.

    26. Syntactic Stylometry for Deception Detection.
      Song Feng, Ritwik Banerjee and Yejin Choi. 
      Association for Computational Linguistics (ACL), 2012. 

    27. Finding Deceptive Opinion Spam by Any Stretch of the Imagination.
      Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey Hancock. 
      Association for Computational Linguistics (ACL), 2011.

3.4. Tutorials/Workshops

  1. Graph-Based User Behavior Modeling: From Prediction to Fraud Detection, Alex Beutel, Leman Akoglu, Christos Faloutsos, ACM SIGKDD Tutorial, Sydney, AU, August 2015 (3h-tutorial).
  2. Mining and Forecasting of Big Time-series Data, Yasushi Sakurai, Yasuko Matsubara, Christos Faloutsos, ACM SIGMOD, Melbourne, AU, May 2015 (3h-tutorial).
  3. Smart Analytics for Big Time-series Data, Yasushi Sakurai, Yasuko Matsubara, and Christos Faloutsos, KDD 2017, Halifax, Nova Scotia, Canada, Aug. 13-17, 2017.
  4. Data-Driven Approaches towards Malicious Behavior Modeling, Meng Jiang, Srijan Kumar, VS Subrahmanian, and Christos Faloutsos, KDD 2017, Halifax, Nova Scotia, Canada, Aug. 13-17, 2017.
  5. Graph and Tensor Mining for Fun and Profit, Xin Luna Dong, Christos Faloutsos, Andrey Kan, Subhabrata Mukherjee, and Jun Ma, KDD 2018, London, UK, Aug. 19-23, 2018.
  6. Fact Checking: Theory and Practice, Xin Luna Dong, Christos Faloutsos, Xian Li, Subhabrata Mukherjee, and Prashant Shiralkar, KDD 2018, London, UK, Aug. 19-23, 2018.
  7. Forecasting Big Time Series: Old and New, Christos Faloutsos, Jan Gasthaus, Tim Januschowski, and Yuyang Wang, VLDB 2018, Rio De Janeiro, Brazil, Aug. 27-31, 2018.

4. EDUCATION - dissertations

The educational contributions of the project include the following dissertations:
  • Neil Shah, Anomaly Detection in Large Social Graphs. October, 2017.
  • Shebuti Rayana, Ensemble and Multimodal Learning for Anomaly Mining: Algorithms and Applications. August, 2017.
  • Huayi Li, Detecting Opinion Spam in Commercial Review Websites. August, 2016.
  • Alex Beutel, User Behavior Modeling with Large-Scale Graph Analysis. May, 2016. (KDD dissertation award - runner-up)
  • Danai Koutra, Exploring and Making Sense of Large Graphs. Dissertation, CMU, August 2015. (KDD dissertation award).

Points of Contact:

Christos Faloutsos, christos AT cs.cmu.edu, and Leman Akoglu, lakoglu AT andrew.cmu.edu
Last updated: Sept. 1, 2018, by Christos Faloutsos and Dhivya Eswaran