Esuli et al, CIKM 2005

From ScribbleWiki: Analysis of Social Media

Determining the semantic orientation of terms through gloss analysis

Citeseer Page

This paper describes how to classify the polarity of words using their dictionary definitions (glosses) and the word relations in a thesaurus. The authors give an overview of previous work and evaluate on the datasets used in each of these studies:

  • Hatzivassiloglou and McKeown (HM): used conjunctions (and, but, etc.) between adjectives. The dataset was adjectives extracted from the Wall Street Journal.
  • Turney and Littman (TL) - see Turney, ACL 2001: used the difference in PMI between the word and seed sets of positive and negative words. The dataset was taken from the General Inquirer lexicon.
  • Kamps et al. (KA): used path distances in a graph built from WordNet lexical relations between terms (instead of PMI). The dataset was a subset of TL (because some terms have no lexical relations linking them in the graph).
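
As an illustration of the graph-based idea in the KA bullet, here is a minimal sketch that scores a term by comparing its shortest-path distance to "good" and to "bad". The toy graph, the function names, and the normalization by the distance between the two reference words are assumptions made for this example; it does not use WordNet and is not the authors' code.

```python
from collections import deque

# Hypothetical toy synonymy graph (undirected); a real system would build it from WordNet.
EDGES = [("good", "nice"), ("nice", "pleasant"), ("pleasant", "fair"),
         ("fair", "poor"), ("poor", "bad")]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)

def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns None when no path exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def orientation(graph, term, pos="good", neg="bad"):
    """Positive score: closer to `pos`; negative score: closer to `neg`."""
    d_pos = shortest_path_length(graph, term, pos)
    d_neg = shortest_path_length(graph, term, neg)
    d_ref = shortest_path_length(graph, pos, neg)
    if d_pos is None or d_neg is None or d_ref is None:
        return None  # term is not connected, so it cannot be scored
    return (d_neg - d_pos) / d_ref

print(orientation(GRAPH, "nice"))      # closer to "good"
print(orientation(GRAPH, "poor"))      # closer to "bad"
print(orientation(GRAPH, "isolated"))  # not in the graph
```

Terms that are not connected to the reference words get no score, which is why such a method can only be evaluated on a subset of a larger word list.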

Their process has four steps:

1. Create a seed set of positive words and a seed set of negative words.
2. Use lexical relations (e.g., synonymy) in a thesaurus to extend the sets (assumption: words related in meaning are also related in polarity).
3. Collect the glosses of all the words in each set (assumption: words with similar glosses have similar polarity).
4. Train a binary classifier on the result of step 3.
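
Steps 1 to 3 can be sketched as follows. Everything here is illustrative: `TOY_LEXICON` is a hypothetical hand-made thesaurus standing in for WordNet, synonyms are assumed to keep the polarity of the source word while antonyms flip it, words that end up in both sets are dropped, and each word contributes one labeled gloss.

```python
# Hypothetical toy thesaurus: word -> (synonyms, antonyms, gloss).
TOY_LEXICON = {
    "good": (["nice"], ["bad"], "having desirable or positive qualities"),
    "nice": (["good", "pleasant"], ["nasty"], "pleasant or pleasing in nature"),
    "pleasant": (["nice"], ["unpleasant"], "affording pleasure or enjoyment"),
    "bad": (["nasty"], ["good"], "having undesirable or negative qualities"),
    "nasty": (["bad"], ["nice"], "offensive or even malicious"),
    "unpleasant": ([], ["pleasant"], "disagreeable to the senses or mind"),
}

def expand(pos_seeds, neg_seeds, lexicon, iterations):
    """Step 2: grow both seed sets by following synonym and antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for same, other, source in ((new_pos, new_neg, pos), (new_neg, new_pos, neg)):
            for word in source:
                synonyms, antonyms, _ = lexicon[word]
                same.update(s for s in synonyms if s in lexicon)   # keep polarity
                other.update(a for a in antonyms if a in lexicon)  # flip polarity
        # Assumption of this sketch: drop words that landed in both sets.
        ambiguous = new_pos & new_neg
        pos, neg = new_pos - ambiguous, new_neg - ambiguous
    return pos, neg

def labeled_glosses(pos, neg, lexicon):
    """Step 3: pair each word's gloss with the label of the set it belongs to."""
    data = [(lexicon[w][2], "Positive") for w in sorted(pos)]
    data += [(lexicon[w][2], "Negative") for w in sorted(neg)]
    return data

# Step 1: one seed per class.
pos, neg = expand({"good"}, {"bad"}, TOY_LEXICON, iterations=3)
training = labeled_glosses(pos, neg, TOY_LEXICON)
print(sorted(pos), sorted(neg), len(training))
```

The resulting list of labeled glosses is what a text classifier would be trained on in step 4.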

WordNet is used for step 2 (synonymy, antonymy, hypernymy, etc.) and step 3. For step 4, they used cosine-normalized TF-IDF word vectors with Naive Bayes (NB), linear-kernel SVM, and PrTFIDF classifiers.
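
To make the classification step more concrete, here is a small pure-Python sketch of TF-IDF weighting with cosine (unit-length) normalization, followed by a multinomial Naive Bayes classifier over gloss tokens. Several things are assumptions of this example and not details from the paper: the smoothed IDF formula is just one common variant, the Naive Bayes model is trained on raw token counts with add-one smoothing instead of TF-IDF weights, and the toy glosses are invented. The SVM and PrTFIDF learners are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one cosine-normalized TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

def train_nb(docs, labels):
    """Multinomial Naive Bayes: collect class priors and per-class token counts."""
    vocab = {tok for doc in docs for tok in doc}
    prior = Counter(labels)
    counts = {label: Counter() for label in prior}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    return vocab, prior, counts

def predict_nb(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    vocab, prior, counts = model
    total_docs = sum(prior.values())
    best_label, best_score = None, -math.inf
    for label in prior:
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(prior[label] / total_docs)
        for tok in doc:
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training glosses, tokenized on whitespace.
glosses = ["having desirable or positive qualities", "pleasant and enjoyable",
           "having undesirable or negative qualities", "unpleasant and harmful"]
labels = ["Positive", "Positive", "Negative", "Negative"]
docs = [g.split() for g in glosses]

vectors = tfidf_vectors(docs)
model = train_nb(docs, labels)
print(predict_nb(model, "enjoyable positive".split()))
print(predict_nb(model, "harmful".split()))
```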

In all variants, the term and its gloss were included. The four term representations that were tested differ in whether or not the sample sentence was included and whether or not negation in the text was taken into account. Tested with NB and without seed expansion, using all of the information gave the best result (68% on the KA set).
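
The four representations can be pictured as two on/off switches. The sketch below is only an illustration: the `NOT_` prefix, the small negator list, and the rule that a negator affects just the next token are assumptions of this example, not the paper's exact procedure.

```python
NEGATORS = {"not", "no", "never"}  # assumed, deliberately small list

def mark_negation(tokens):
    """Drop each negator and prefix the token that follows it with NOT_."""
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out

def represent(term, gloss, examples, use_examples, use_negation):
    """Build the token list for one term under one of the four variants."""
    tokens = [term] + gloss.lower().split()  # term and gloss are always included
    if use_examples:
        for sentence in examples:
            tokens += sentence.lower().split()
    return mark_negation(tokens) if use_negation else tokens

# Hypothetical dictionary entry.
entry = ("dishonest", "not honest or truthful", ["a dishonest answer"])
for use_examples in (False, True):
    for use_negation in (False, True):
        print(use_examples, use_negation, represent(*entry, use_examples, use_negation))
```

With negation handling switched on, "not honest" becomes a single feature that differs from "honest", so the gloss of "dishonest" no longer looks like evidence for the positive class.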

For the seed set expansion, they compared the results of including and excluding certain lexical relations, and of restricting the expansion to adjectives or not. The best result came from using synonyms and antonyms, restricted to adjectives.

Finally, they report the final accuracy with each classifier (results are very similar across the three classifiers): HM 87% (SVM), TL 83% (PrTFIDF), KA 88% (SVM). The improvement on TL over the previous result on this set is not significant, but their method is much less time-consuming, as it involves neither LSA nor querying the web.

Using the Merriam-Webster online dictionary instead of WordNet, their results were HM 83%, TL 79%, and KA 85%, which shows that they can avoid using WordNet with little loss of accuracy.

The main author is also part of the SentiWordNet project, which aims to automatically annotate WordNet with word polarity.

  • Bibtex:
 @misc{ esuli05determining,
 author = "A. Esuli and F. Sebastiani",
 title = "Determining the semantic orientation of terms through gloss analysis",
 text = "Andrea Esuli and Fabrizio Sebastiani. 2005. Determining the semantic orientation
   of terms through gloss analysis. In Proceedings of CIKM-05, 14th ACM International
   Conference on Information and Knowledge Management, pages 617--624, Bremen,
   DE.",
 year = "2005",
 url = "citeseer.ist.psu.edu/esuli05determining.html" }