Minqing Hu and Bing Liu, KDD 2004

From ScribbleWiki: Analysis of Social Media


Mining and summarizing customer reviews


There is a large (and growing) number of user reviews on the web, and both consumers and manufacturers want to know what these reviews say so they can make informed decisions. This paper offers a feature-based summarization method to accomplish this.

Their approach has three steps:

1. Mining product features (only explicit features)

2. Identifying opinion sentences and their polarity (i.e., positive or negative)

2a. Find adjectives, which serve as opinion words

2b. Determine word sentiment orientation (polarity) using WordNet

2c. Determine sentence polarity

3. Summarizing the result

They explain that their feature-based summarization (FBS) differs from regular text summarization in that:

1. The output is structured, not natural-language text

2. It focuses only on the polar (opinionated) aspects of reviews, rather than just selecting a subset of sentences

The paper reviews the following neighboring fields and how FBS differs from each:

1. Subjective genre classification: topic classification that exploits subjectivity (i.e., whether text expresses opinions). Diff: no explicit polarity extraction

2. Sentiment classification: methods include using a domain-dependent lexicon with sentiment annotations, the PMI difference between seed words and target words (Turney), and supervised methods (Pang). Diff: these work at the document level rather than the sentence level, and do not use product features
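Turney's PMI-difference score can be made concrete. Since PMI(x, y) = log2(P(x, y) / (P(x) P(y))), subtracting PMI(phrase, neg) from PMI(phrase, pos) cancels the P(phrase) terms, and approximating probabilities with hit counts gives log2((hits(phrase NEAR pos) × hits(neg)) / (hits(phrase NEAR neg) × hits(pos))). The sketch below assumes the counts have already been collected; the function name, the smoothing constant, and the example counts are hypothetical.

```python
import math


def so_pmi(near_pos: float, near_neg: float, hits_pos: float, hits_neg: float,
           smoothing: float = 0.01) -> float:
    """Semantic orientation of a phrase as PMI(phrase, pos) - PMI(phrase, neg).

    near_pos / near_neg: co-occurrence counts of the phrase with the
        positive / negative seed word (e.g. "excellent" / "poor").
    hits_pos / hits_neg: total counts of the positive / negative seed word.
    smoothing: added to the co-occurrence counts to avoid log(0) and
        division by zero (an assumed value for this sketch).
    """
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


# Hypothetical counts: the phrase co-occurs with the positive seed 4x as often.
score = so_pmi(near_pos=120, near_neg=30, hits_pos=1000, hits_neg=1000)
print(f"{score:.2f}")  # a positive score suggests positive orientation
```

A positive score means the phrase leans toward the positive seed word, and a negative score means it leans toward the negative one. This document-level approach is the kind FBS contrasts itself with.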

3. Text summarization (template instantiation + passage extraction). Diff: different criteria (e.g., prominence) and different granularity (passages instead of features)

4. Terminology finding: syntactic-rule approaches are intractable, and statistical approaches miss infrequent terms. Diff: these have low precision compared to association mining, and the authors' modified version of association mining also captures infrequent features

Their system design is explained below (see also the image at the bottom of this page):

1a. Finding noun phrases after preprocessing

1b. Finding frequent features with association mining (the Apriori algorithm). The assumption is that reviewers tell different stories, but when they comment on product features they tend to use similar words, so frequently occurring nouns are likely to be features.
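Frequent-feature mining treats each review sentence as a "transaction" whose items are, roughly, the nouns from its noun phrases. Word sets that appear in enough sentences are kept. Below is a minimal, generic Apriori sketch in Python, not the authors' implementation. The toy sentences, the 0.4 support threshold, and the three-word cap on itemsets are assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support, max_size=3):
    """Level-wise Apriori: return {itemset: count} for all frequent itemsets.

    transactions: list of sets of candidate feature words, one set per sentence
    min_support: minimum fraction of transactions that must contain an itemset
    max_size: largest itemset to consider (capped at 3 words in this sketch)
    """
    min_count = min_support * len(transactions)

    # Level 1: count individual words.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_count}
    frequent = {s: counts[s] for s in current}

    k = 2
    while current and k <= max_size:
        # Join step: merge frequent (k-1)-itemsets that differ by one word.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            a | b
            for a in current
            for b in current
            if len(a | b) == k
            and all(frozenset(sub) in current for sub in combinations(a | b, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c for c, n in counts.items() if n >= min_count}
        frequent.update({c: counts[c] for c in current})
        k += 1
    return frequent


# Hypothetical toy data: nouns found in five review sentences.
sentences = [
    {"picture", "quality"},
    {"battery", "life"},
    {"picture", "quality", "camera"},
    {"battery", "life", "picture"},
    {"lens"},
]
result = apriori(sentences, min_support=0.4)
for itemset, count in sorted(result.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this data, the multi-word itemsets {picture, quality} and {battery, life} survive, while "camera" and "lens" fall below the threshold. Apriori ignores word order and adjacency, so its candidate phrases still need pruning.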

After this step, they perform "compactness" pruning to remove meaningless multi-word features (words that co-occur but are not used together as a phrase), and "redundancy" pruning to remove features that are subsets of other features.
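Both pruning steps can be sketched over tokenized sentences. The code below is an illustrative Python version, not the paper's code. A multi-word feature is kept only if its words appear close together ("compact") in enough sentences. A single-word feature that is part of a longer feature is kept only if it also occurs often enough on its own, measured by its pure support ("p-support"). The 3-word gap and the thresholds are assumed parameters, and the toy sentences and feature list are hypothetical. A real system would count only noun occurrences, using POS tags.

```python
def is_compact_in(tokens, feature, max_gap=3):
    """True if every word of `feature` occurs in `tokens` and consecutive
    words (in sentence order) are at most `max_gap` positions apart.
    Simplification: only the first occurrence of each word is used."""
    if not all(w in tokens for w in feature):
        return False
    positions = sorted(tokens.index(w) for w in feature)
    return all(b - a <= max_gap for a, b in zip(positions, positions[1:]))


def compactness_prune(features, sentences, max_gap=3, min_sentences=2):
    """Keep single words, and multi-word features compact in >= min_sentences sentences."""
    return [
        f for f in features
        if len(f) == 1
        or sum(is_compact_in(s, f, max_gap) for s in sentences) >= min_sentences
    ]


def redundancy_prune(features, sentences, min_p_support=3):
    """Drop single-word features that are subsets of a longer feature and
    rarely appear on their own (p-support below min_p_support)."""
    kept = []
    for f in features:
        supersets = [g for g in features if set(f) < set(g)]
        if len(f) > 1 or not supersets:
            kept.append(f)
            continue
        # p-support: sentences containing the word but none of its supersets.
        p_support = sum(
            1 for s in sentences
            if f[0] in s and not any(all(w in s for w in g) for g in supersets)
        )
        if p_support >= min_p_support:
            kept.append(f)
    return kept


sentences = [s.split() for s in [
    "the battery life is great",
    "battery life could be longer",
    "the picture is sharp but the battery dies fast",
    "picture looks good and battery is fine",
    "my life got easier",
]]
features = [("battery", "life"), ("picture", "battery"), ("life",), ("battery",), ("picture",)]
compact = compactness_prune(features, sentences)
final = redundancy_prune(compact, sentences, min_p_support=2)
print(compact)
print(final)
```

In this example, "picture battery" is dropped because its words are never within three tokens of each other. "life" is then dropped because it almost always appears inside "battery life". "battery" survives because it appears alone in two sentences.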

2a. An opinion sentence contains at least one opinion word and at least one feature, e.g., "The strap is horrible and gets in the way of parts of camera you need to access."
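Once features are known, spotting opinion sentences and their opinion words comes down to looking for adjectives near a feature. The following is a minimal Python sketch. It assumes the sentence has already been POS-tagged; the tags below are written by hand rather than produced by a tagger. It also assumes that "near" means within a fixed window of tokens. The window size and the helper names are hypothetical, not taken from the paper.

```python
def extract_opinions(tagged_sentence, features, window=3):
    """Map each feature in a POS-tagged sentence to the adjectives near it.

    tagged_sentence: list of (word, Penn Treebank tag) pairs
    features: set of lower-cased, single-word product features
    window: how many tokens on each side count as "near" (an assumption)
    """
    words = [w.lower() for w, _ in tagged_sentence]
    opinions = {}
    for i, word in enumerate(words):
        if word not in features:
            continue
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        adjectives = [words[j] for j in range(lo, hi)
                      if tagged_sentence[j][1].startswith("JJ")]  # JJ, JJR, JJS
        if adjectives:
            opinions[word] = adjectives
    return opinions


def is_opinion_sentence(tagged_sentence, features, window=3):
    """An opinion sentence has at least one feature with a nearby opinion word."""
    return bool(extract_opinions(tagged_sentence, features, window))


# The example sentence, tagged by hand for this sketch.
tagged = [("The", "DT"), ("strap", "NN"), ("is", "VBZ"), ("horrible", "JJ"),
          ("and", "CC"), ("gets", "VBZ"), ("in", "IN"), ("the", "DT"),
          ("way", "NN"), ("of", "IN"), ("parts", "NNS"), ("of", "IN"),
          ("camera", "NN"), ("you", "PRP"), ("need", "VBP"), ("to", "TO"),
          ("access", "VB"), (".", ".")]
print(extract_opinions(tagged, {"strap", "camera"}))
```

Here "strap" is paired with "horrible". "camera" has no adjective within the window, so it contributes no opinion word.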

2b. Orientation: McKeown's method needs a large corpus, Turney's is expensive in processing time, ... Instead, they start from a small seed set of adjectives with known orientation and expand it with WordNet synonyms (which keep the orientation) and, when necessary, antonyms (which flip it).

Infrequent features: in opinion sentences that contain no frequent feature, find the opinion word and then take the closest noun phrase as a feature. Drawback: this yields many irrelevant features, but their ranking places these low.
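The seed-expansion idea can be sketched as iterative propagation. An adjective with unknown orientation takes the orientation of a known synonym, or the opposite orientation of a known antonym. The loop repeats until a full pass labels nothing new. In the Python sketch below, small hand-written synonym and antonym dictionaries stand in for WordNet; a real implementation would query WordNet, for example through NLTK's WordNet corpus reader. The words, seeds, and function name are illustrative assumptions.

```python
def expand_orientations(adjectives, seeds, synonyms, antonyms):
    """Propagate orientations (+1 positive, -1 negative) from seed adjectives.

    adjectives: adjectives whose orientation we want
    seeds: dict word -> +1 / -1 with known orientations
    synonyms / antonyms: dict word -> list of related words (a stand-in for WordNet)
    Words that cannot be reached from the seeds are left out of the result.
    """
    known = dict(seeds)
    changed = True
    while changed:  # repeat until a full pass labels no new word
        changed = False
        for adj in adjectives:
            if adj in known:
                continue
            orientation = next((known[s] for s in synonyms.get(adj, []) if s in known), None)
            if orientation is None:  # fall back to antonyms, flipping the sign
                orientation = next((-known[a] for a in antonyms.get(adj, []) if a in known), None)
            if orientation is not None:
                known[adj] = orientation
                changed = True
    return known


seeds = {"good": 1, "bad": -1}
synonyms = {"great": ["good"], "superb": ["great"], "fine": ["good"],
            "terrible": ["bad"], "awful": ["terrible"]}
antonyms = {"lousy": ["fine"]}
adjectives = ["superb", "great", "awful", "terrible", "lousy", "fine", "blue"]
result = expand_orientations(adjectives, seeds, synonyms, antonyms)
print(result)
```

Labels spread in passes, so "superb" is labeled only after "great" is. "lousy" gets a negative label through its antonym "fine". "blue" is not connected to any seed and stays unlabeled.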

2c. Sentence orientation: sum the positive and negative orientations of the opinion words close to the feature, taking negation words (e.g., "not") and contrast words (e.g., "but") into account.
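A simplified version of the sentence-level decision is sketched below. Each opinion word contributes its orientation, and its sign is flipped when a negation word appears shortly before it. The sign of the total gives the sentence orientation. The negation list, the two-word window, and the lexicon are assumptions for illustration. Contrast words such as "but" and the paper's handling of ties are not modeled here; a tie simply returns 0.

```python
NEGATIONS = {"not", "no", "never"}  # assumed, deliberately small negation list


def sentence_orientation(tokens, lexicon, negation_window=2):
    """Return +1 (positive), -1 (negative) or 0 (tie) for a tokenized sentence.

    lexicon: dict opinion word -> +1 / -1 (e.g. from seed expansion)
    negation_window: how many preceding tokens are checked for a negation word
    """
    score = 0
    for i, token in enumerate(tokens):
        word = token.lower()
        if word not in lexicon:
            continue
        value = lexicon[word]
        preceding = (t.lower() for t in tokens[max(0, i - negation_window):i])
        if any(t in NEGATIONS for t in preceding):
            value = -value  # "not bad" counts as positive, "not good" as negative
        score += value
    return (score > 0) - (score < 0)  # sign of the total


lexicon = {"good": 1, "great": 1, "bad": -1, "horrible": -1}
print(sentence_orientation("the lens is not bad and the zoom is great".split(), lexicon))
print(sentence_orientation("the battery is not good".split(), lexicon))
```

The first sentence comes out positive ("not bad" plus "great"), and the second comes out negative.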

3. Summary generation: for each feature, separate the positive and negative opinion sentences and give the count of each; features are ranked by frequency.
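The structured summary can be sketched as grouping opinion sentences by feature and polarity. The Python sketch below assumes each opinion sentence has already been assigned a feature and an orientation. Ranking by the number of opinion sentences per feature is a stand-in for the paper's frequency-based ranking. The input sentences are hypothetical, and the text layout is only one possible rendering of the counts.

```python
from collections import defaultdict


def build_summary(opinions):
    """Group opinion sentences by feature and polarity, ranked by frequency.

    opinions: iterable of (feature, orientation, sentence), orientation +1 / -1
    Returns a list of (feature, {"positive": [...], "negative": [...]}),
    most frequently mentioned feature first.
    """
    groups = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, orientation, sentence in opinions:
        if orientation == 0:
            continue  # ties / neutral sentences are left out of the summary
        key = "positive" if orientation > 0 else "negative"
        groups[feature][key].append(sentence)
    return sorted(groups.items(),
                  key=lambda item: len(item[1]["positive"]) + len(item[1]["negative"]),
                  reverse=True)


def format_summary(ranked):
    """Render one block per feature with its positive and negative counts."""
    lines = []
    for feature, group in ranked:
        lines.append(f"Feature: {feature}")
        lines.append(f"  Positive: {len(group['positive'])}")
        lines.append(f"  Negative: {len(group['negative'])}")
    return "\n".join(lines)


opinions = [
    ("picture", 1, "The pictures are sharp."),
    ("battery", -1, "The battery dies quickly."),
    ("picture", 1, "Great colors in every picture."),
    ("picture", -1, "Pictures are grainy indoors."),
    ("battery", 1, "Battery life is fine for a day."),
    ("strap", -1, "The strap is horrible."),
]
ranked = build_summary(opinions)
print(format_summary(ranked))
```

The grouped sentences are kept alongside the counts, so a reader can drill down from "Negative: 1" to the actual review sentence.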

The experiments were run on 7 product reviews (in 5 product types), manually annotated by the authors. The results show how each step improves performance. The system reaches 80% precision and 72% recall on feature extraction, 69% precision and 64% recall on opinion sentence extraction, and 84% accuracy on determining sentence orientation.

  • BibTeX
author = {Minqing Hu and Bing Liu},
title = {Mining and summarizing customer reviews},
booktitle = {KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining},
year = {2004},
isbn = {1-58113-888-1},
pages = {168--177},
location = {Seattle, WA, USA},
doi = {http://doi.acm.org/10.1145/1014052.1014073},
publisher = {ACM},
address = {New York, NY, USA},


Annotated by Mehrbod
