Wiebe et al, Computational Linguistics 2005
From ScribbleWiki: Analysis of Social Media
Annotating Expressions of Opinions and Emotions in Language
This paper describes a detailed annotation scheme that identifies key components and properties of subjectivity, and then reports a corpus annotation study to determine whether humans can reliably apply the scheme and detect subjective phenomena in text. The paper focuses on annotating expressions below the level of the sentence. The rich scheme captures many components of subjective expressions, namely the sources and targets of opinions, as well as attributes of the subjective expression itself, such as intensity and polarity (attitude type).
The main motivation for annotating at the finer-grained level of expressions rather than sentences is that a single sentence may contain more than one opinion expression. In addition, applications such as question answering (QA) and information extraction systems could then pinpoint the clauses that contain opinions.
Annotation Scheme: Definitions
The whole subjectivity scheme of this paper is based on the core concept of a private state, defined as an internal mental state that is not open to objective observation or verification by the outside world. Private states thus cover emotions, goals, opinions, beliefs, thoughts, evaluations and judgments. However, direct mentions of a private state, or expressions of it (through speech, writing or actions) by the private state holder, give an observer an idea of that private state. The authors create an annotation scheme to capture the expression of private states and their functional components: the experiencer (source) and the target of the private state.
A private state can be expressed in text in any of the following ways:
- Explicit mentions of private states: the experiencer’s private state is mentioned directly.
- E.g. “The US fears a spill-over”. In this sentence, the word “fears” directly mentions the private state of the US.
- Speech events expressing private states: when a person speaks, the speech event may reveal his or her private state.
- E.g. “The report is full of absurdities,” said Xirao-Nima. Here “said” marks the speech event in which Xirao-Nima expresses his private state (we learn about his private state towards the report from his calling it “full of absurdities”).
- Expressive subjective elements: private states sometimes become apparent to the reader (or observer) through the choice of words or the style of writing. Such expressions are called expressive subjective elements. E.g. the words “full of absurdities” in the previous example reveal Xirao-Nima’s private state towards the report.
- Private state actions: sometimes actions such as “booing” indicate the private state of the person performing them.
Annotation Scheme: Private States
The annotations take the form of frames. Frames are defined for expressive subjective elements, direct subjective expressions (mentions of private states and speech events that reveal private states) and objective speech events (speech events that do not reveal a private state). Some important components of the subjective frames are:
- Text anchor (the point or span of text that denotes the mention or expression of the private state)
- Source (the holder of the private state)
- Target (the topic of the private state)
- Attitude type (polarity)
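The frame components above can be sketched as a small Python data structure. This is a minimal illustration, not the authors' annotation format: the class and field names (`FrameType`, `Frame`, `anchor`, `attitude_type`), the string labels, the polarity value, and the character-offset representation of text anchors are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class FrameType(Enum):
    # Illustrative (hypothetical) labels for the three frame types described above.
    DIRECT_SUBJECTIVE = "direct-subjective"
    EXPRESSIVE_SUBJECTIVE = "expressive-subjective"
    OBJECTIVE_SPEECH_EVENT = "objective-speech-event"


@dataclass
class Frame:
    frame_type: FrameType
    anchor: Tuple[int, int]              # text anchor as a half-open character span [start, end)
    source: str                          # holder of the private state
    target: Optional[str] = None         # topic of the private state, if annotated
    attitude_type: Optional[str] = None  # polarity label (hypothetical label set)

    def anchor_text(self, text: str) -> str:
        """Return the span of text that this frame is anchored to."""
        start, end = self.anchor
        return text[start:end]


text = "The US fears a spill-over"
fears = Frame(FrameType.DIRECT_SUBJECTIVE, (7, 12), source="US",
              target="spill-over", attitude_type="negative")
print(fears.anchor_text(text))  # fears
```

Storing anchors as offsets rather than copied strings keeps each frame tied to a specific occurrence in the text, which matters when the same word appears more than once in a sentence.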
Annotation Scheme: Nested Sources
At the top level, the source of every private state is the speaker or writer of the text. In practice, however, private states are often filtered through the eyes of other sources. For example, in the sentence “The US fears a spill-over”, said Xirao-Nima, the word “fears” tells the reader about the private state of the US. However, this is the private state of the US only according to Xirao-Nima, and Xirao-Nima’s speech event is in turn reported to the reader by the writer of the news article. The source is therefore nested at multiple levels (the writer, then Xirao-Nima, then the US), which the authors capture with nested source annotations.
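One way to picture a nested source is as a chain built by walking up from a private state to each source that reports it, ending at the writer. The sketch below uses a hypothetical `SourceNode` class; it only illustrates the idea and is not the authors' representation.

```python
from typing import List, Optional


class SourceNode:
    """Hypothetical node: a source, plus the node whose speech event reports it (None = top level)."""

    def __init__(self, source: str, reported_by: Optional["SourceNode"] = None):
        self.source = source
        self.reported_by = reported_by

    def nested_source(self) -> List[str]:
        """Walk up the reporting chain and return it outermost source first."""
        chain = []
        node: Optional[SourceNode] = self
        while node is not None:
            chain.append(node.source)
            node = node.reported_by
        chain.reverse()
        return chain


writer = SourceNode("writer")
xirao_nima = SourceNode("Xirao-Nima", reported_by=writer)
us = SourceNode("US", reported_by=xirao_nima)
print(us.nested_source())  # ['writer', 'Xirao-Nima', 'US']
```

Because each node only points at its reporter, the same `xirao_nima` node can be shared by every private state Xirao-Nima attributes to others in the article.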
Agreement Study: Text Anchors
The authors performed an agreement study over their scheme. Because the scheme did not specify rules for the exact boundaries of text anchors, the authors used an agreement metric, agr, to measure inter-annotator agreement. agr is similar to recall: it measures the annotations of one annotator with respect to those of the other, treating the other as the reference. The average agr value for direct subjective frames and objective speech events (0.8) was higher than that for expressive subjective elements (0.72).
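A minimal sketch of an agr-style measure, assuming that text anchors are character spans and that two anchors match whenever they overlap (a natural choice when boundary rules are left unspecified). The helper names `overlaps` and `mean_agr`, and averaging the two directions into a single score, are assumptions for illustration; the paper's exact matching criterion may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character span [start, end)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def agr(a: List[Span], b: List[Span]) -> float:
    """agr(a||b): fraction of annotator a's anchors that overlap some anchor of annotator b.

    Like recall, b is treated as the reference. Returns 0.0 if a made no annotations
    (an arbitrary convention for this sketch).
    """
    if not a:
        return 0.0
    matched = sum(1 for x in a if any(overlaps(x, y) for y in b))
    return matched / len(a)


def mean_agr(a: List[Span], b: List[Span]) -> float:
    """Average of the two directions, agr(a||b) and agr(b||a)."""
    return (agr(a, b) + agr(b, a)) / 2


a_spans = [(0, 5), (10, 20), (30, 40)]
b_spans = [(3, 8), (35, 36)]
print(round(agr(a_spans, b_spans), 3))       # 0.667
print(agr(b_spans, a_spans))                 # 1.0
print(round(mean_agr(a_spans, b_spans), 3))  # 0.833
```

The asymmetry is the point of the metric: annotator b found only two of a's three anchors, while every anchor b marked was also found by a.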
The authors also computed the pairwise kappa statistic at the sentence level. A sentence was considered subjective if the annotator created one or more direct subjective frames in it. The sentence-level kappa was 0.77.
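The sentence-level comparison can be sketched with the standard two-annotator form of kappa (Cohen's kappa), assuming each sentence is reduced to a binary subjective/objective label using the rule above. The helper names, the frame-type strings, and the toy annotations are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def sentence_label(frame_types: Sequence[str]) -> bool:
    """A sentence is subjective if it has at least one direct subjective frame."""
    return "direct-subjective" in frame_types


def cohen_kappa(x: Sequence[bool], y: Sequence[bool]) -> float:
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    cx, cy = Counter(x), Counter(y)
    # Chance agreement: product of the two annotators' marginal rates, summed over labels.
    expected = sum((cx[k] / n) * (cy[k] / n) for k in set(cx) | set(cy))
    if expected == 1.0:
        # Both annotators used the same single label everywhere; kappa is
        # undefined, so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Frame types each (toy) annotator created in four sentences.
annotator_a: List[List[str]] = [
    ["direct-subjective"],
    ["direct-subjective", "expressive-subjective"],
    ["objective-speech-event"],
    [],
]
annotator_b: List[List[str]] = [
    ["direct-subjective"],
    ["expressive-subjective"],
    [],
    ["objective-speech-event"],
]
labels_a = [sentence_label(s) for s in annotator_a]
labels_b = [sentence_label(s) for s in annotator_b]
print(cohen_kappa(labels_a, labels_b))  # 0.5
```

Here the annotators agree on 3 of 4 sentences (0.75), but their marginals predict 0.5 agreement by chance, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.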
Observations and Analyses
The authors present a number of observations and analyses from their annotation experience.
- The annotators reported that editorials were the most difficult to annotate, compared with “hot topic” news articles and objective articles.
- Stronger expressions of opinions and emotions tend to be more negative in the corpus.
- The stronger the expression of subjectivity (intensity), the clearer its polarity.
- An analysis of the distribution of individual words in subjective expressions showed that 38% of the words are ambiguous, i.e. they occur both in subjective expressions and in objective usages.
- The words in subjective expressions are distributed across a variety of syntactic categories.
- Over 44% of the sentences in the corpus contain a mixture of subjective and objective frames. This justifies the authors’ motivation for making finer-grained distinctions at the sub-sentential level.
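The two corpus proportions above (ambiguous words and mixed sentences) could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: tokens are pre-labelled as inside or outside a subjective expression, a word counts as ambiguous if it is seen in both settings, the ratio is taken over word types seen inside subjective expressions, and a sentence is "mixed" if it has at least one subjective frame and at least one objective speech event frame. The function names, frame-type strings, and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, bool]  # (lowercased word, True if inside a subjective expression)

SUBJECTIVE = {"direct-subjective", "expressive-subjective"}
OBJECTIVE = {"objective-speech-event"}


def ambiguous_word_ratio(sentences: List[List[Token]]) -> float:
    """Share of word types seen in subjective expressions that are also seen outside them."""
    seen: Dict[str, Set[bool]] = {}
    for sent in sentences:
        for word, subjective in sent:
            seen.setdefault(word, set()).add(subjective)
    subjective_words = [w for w, flags in seen.items() if True in flags]
    if not subjective_words:
        return 0.0
    ambiguous = [w for w in subjective_words if False in seen[w]]
    return len(ambiguous) / len(subjective_words)


def mixed_sentence_ratio(frame_types_per_sentence: List[List[str]]) -> float:
    """Share of sentences containing both a subjective frame and an objective frame."""
    if not frame_types_per_sentence:
        return 0.0
    mixed = sum(
        1 for types in frame_types_per_sentence
        if SUBJECTIVE & set(types) and OBJECTIVE & set(types)
    )
    return mixed / len(frame_types_per_sentence)


tokens = [
    [("the", False), ("report", False), ("is", False),
     ("full", True), ("of", True), ("absurdities", True)],
    [("the", False), ("glass", False), ("is", False), ("full", False)],
]
frames = [
    ["direct-subjective", "objective-speech-event"],
    ["objective-speech-event"],
    ["expressive-subjective"],
    ["direct-subjective", "expressive-subjective", "objective-speech-event"],
]
print(round(ambiguous_word_ratio(tokens), 3))  # 0.333
print(mixed_sentence_ratio(frames))            # 0.5
```

In the toy data only “full” appears both inside and outside a subjective expression, so one of the three subjective word types is ambiguous.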