Seki et al, NTCIR 2006
From ScribbleWiki: Analysis of Social Media
Overview of Opinion Analysis Pilot Task at NTCIR-6
Summary
This paper presents an overview of the Opinion Analysis Pilot Task held at NTCIR-6 (2006-2007), a task-based evaluation in the style of TREC.
In the task, systems were analyzed and evaluated on the following four aspects.
Given a sentence,
- Does it express an opinion? - binary classification of opinionated sentence
- Is it positive, negative or neutral statement? - polarity analysis
- Who expresses the opinion? - opinion holder extraction
- Is it relevant to the document set topic? - binary classification of relevance between topic and sentence
Test collection
| Language | Corpus | Topics | Documents | Sentences | Opinionated (lenient / strict) | Relevant (lenient / strict) |
|---|---|---|---|---|---|---|
| Chinese | 1998-1999 United Daily News, China Times, etc. | 32 | 843 | 11,907 | 62% / 25% | 39% / 16% |
| Japanese | 1998-1999 Yomiuri and Mainichi | 30 | 490 | 15,279 | 29% / 22% | 64% / 49% |
| English | 1998-1999 Mainichi Daily News, Korea Times, etc. | 28 | 439 | 8,528 | 30% / 7% | 69% / 37% |
The opinionated and relevant percentages are computed over all sentences under both the lenient and strict standards, which are defined by the degree of inter-annotator agreement.
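The two standards can be sketched as follows. This is a minimal illustration, assuming (as commonly described for this task, though not spelled out here) that the lenient standard counts a sentence when a majority of annotators label it positive, while the strict standard requires unanimous agreement:

```python
def gold_labels(annotations):
    """Derive lenient and strict gold standards from per-sentence annotator votes.

    annotations: list of per-sentence vote tuples, e.g. [(1, 1, 0), ...],
    where 1 means an annotator marked the sentence (e.g. as opinionated).
    Assumption: lenient = majority of annotators agree; strict = all agree.
    """
    lenient, strict = [], []
    for votes in annotations:
        yes = sum(votes)
        lenient.append(1 if yes * 2 > len(votes) else 0)   # majority vote
        strict.append(1 if yes == len(votes) else 0)       # unanimous vote
    return lenient, strict

# Three sentences judged by three annotators each:
print(gold_labels([(1, 1, 0), (1, 1, 1), (0, 0, 1)]))
# → ([1, 1, 0], [0, 1, 0])
```

Because the strict standard accepts fewer sentences, the strict percentages in the table above are always at most the lenient ones.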
Evaluation Metrics
Precision, recall and F-measure over opinionated sentences, relevant sentences and polarity; opinion holders were evaluated semi-automatically (also with P, R and F).
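As a quick reference, the metrics above reduce to the standard sentence-level computation. The helper below is a hypothetical sketch (not the organizers' evaluation script) for a binary subtask such as opinionated-sentence detection:

```python
def precision_recall_f1(gold, predicted):
    """Compute P/R/F1 for a binary sentence-level task.

    gold, predicted: parallel lists of 0/1 labels, one per sentence.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
# → (0.5, 0.5, 0.5)
```

Since the task provides two gold standards, each system score can be reported twice: once against the lenient labels and once against the strict ones.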
Observations
The Chinese, Japanese and English subtasks had 5, 3 and 6 participants, respectively. Performance varies greatly across subtasks. Comparing the results of systems that participated in multiple subtasks, there appears to be a strong relationship between annotation quality (measured by inter-annotator agreement) and system performance.
Next task in NTCIR-7
MOAT (Multilingual Opinion Analysis Task) at NTCIR-7 will start soon. Read the Call For Participation for more details. The registration deadline is Dec 27, 2007.