Specification Marks Method

The underlying hypothesis of this knowledge-based method is that the more similar two words are, the more information is shared by two of their concepts. Here, the information shared by several concepts is indicated by the most specific concept that subsumes them all in the taxonomy.
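The notion of "most specific concept that subsumes them" can be illustrated with a minimal Python sketch. It assumes a hypothetical, hand-built taxonomy with single inheritance; real WordNet nouns may have several hypernyms, and the node names below are illustrative rather than actual WordNet synsets. The helper names `root_path` and `most_specific_subsumer` are likewise hypothetical.

```python
# Hypothetical toy IS-A taxonomy (child -> parent); not real WordNet data.
# Single inheritance is assumed, so every concept has exactly one root path.
PARENT = {
    "entity": None,
    "object": "entity",
    "living_thing": "object",
    "organism": "living_thing",
    "flora": "organism",
    "woody_plant": "flora",
    "tree": "woody_plant",
    "perennial": "flora",
    "natural_object": "object",
    "plant_part": "natural_object",
    "leaf": "plant_part",
}


def root_path(concept):
    """Hypernym chain from the root down to `concept` (inclusive)."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path[::-1]


def most_specific_subsumer(concepts):
    """Deepest concept that is an ancestor-or-self of every input concept."""
    common = None
    for level in zip(*(root_path(c) for c in concepts)):
        if len(set(level)) != 1:  # root paths diverge at this depth
            break
        common = level[0]
    return common


print(most_specific_subsumer(["tree", "perennial"]))  # flora
print(most_specific_subsumer(["tree", "leaf"]))       # object
```

In this toy taxonomy, tree and perennial meet at flora, a more specific concept than object, where tree and leaf meet; under the hypothesis above, tree and perennial therefore share more information.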

The input for this WSD module is a group of nouns $W=\{w_{1}, w_{2}, \ldots, w_{n}\}$ in a context. Each word $w_{i}$ is looked up in WordNet, where it has an associated set of possible senses $S_{i}=\{S_{i1}, S_{i2}, \ldots, S_{ik_{i}}\}$, and each sense has a set of concepts in the IS-A taxonomy (hypernymy/hyponymy relations). First, the method obtains the concept common to all the senses of the words that form the context. This concept is marked by the initial specification mark (ISM). If the initial specification mark does not resolve the ambiguity of a word, the method descends through the WordNet hierarchy one level at a time, assigning new specification marks. For each specification mark, it counts the number of context words that have a sense within its subhierarchy. The sense that falls under the specification mark containing the highest number of words is chosen as the sense of the word in the given context.

Figure 1 illustrates how the word plant, which has four senses, is disambiguated in a context that also contains the words tree, perennial, and leaf. The initial specification mark does not resolve the lexical ambiguity, since the word plant appears with different senses in two subhierarchies. The specification mark identified by {plant#2, flora#2}, however, contains the highest number of context words (three), so it is chosen and plant is resolved to sense 2. The words tree and perennial are also disambiguated, both receiving sense 1. The word leaf does not appear in the subhierarchy of the specification mark {plant#2, flora#2}, so it is not disambiguated. Such words fall outside the scope of the disambiguation algorithm and are left to be processed by a complementary set of heuristics (see section 3.1.2).
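The procedure described above can be sketched in Python on a toy version of the Figure 1 example. This is a simplified illustration, not necessarily the authors' exact procedure, and it rests on several assumptions not found in the source: a hypothetical hand-built taxonomy with single inheritance whose node names only loosely echo Figure 1 (they are not real WordNet synsets, and the glosses for plant#3 and plant#4 are invented); a greedy descent that always moves to the child mark covering the most context words; and a stopping rule that halts when no child covers at least two words or when two children tie. All function names are hypothetical.

```python
# Hypothetical toy taxonomy loosely modelled on Figure 1 (child -> parent).
# Node names are illustrative, NOT real WordNet synsets; single inheritance assumed.
PARENT = {
    "entity": None,
    "object": "entity",
    "abstraction": "entity",
    "artifact": "object",
    "structure": "artifact",
    "industrial_plant": "structure",  # plant#1
    "sheet": "artifact",              # leaf#2
    "living_thing": "object",
    "organism": "living_thing",
    "flora": "organism",              # stands for {plant#2, flora#2}
    "woody_plant": "flora",
    "tree": "woody_plant",            # tree#1
    "perennial": "flora",             # perennial#1
    "person": "organism",
    "actor": "person",                # plant#4 (invented gloss)
    "natural_object": "object",
    "plant_part": "natural_object",
    "leaf": "plant_part",             # leaf#1
    "trickery": "abstraction",        # plant#3 (invented gloss)
    "diagram": "abstraction",         # tree#2
}

# Context words W, each mapping sense labels to the concept that realises them.
SENSES = {
    "plant": {"plant#1": "industrial_plant", "plant#2": "flora",
              "plant#3": "trickery", "plant#4": "actor"},
    "tree": {"tree#1": "tree", "tree#2": "diagram"},
    "perennial": {"perennial#1": "perennial"},
    "leaf": {"leaf#1": "leaf", "leaf#2": "sheet"},
}

CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)


def root_path(concept):
    """Hypernym chain from the root down to `concept` (inclusive)."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path[::-1]


def words_under(mark, senses):
    """Context words with at least one sense inside the subhierarchy of `mark`."""
    return {w for w, s in senses.items()
            if any(mark in root_path(c) for c in s.values())}


def initial_mark(senses):
    """ISM: deepest concept shared by the root paths of all senses of all words."""
    paths = [root_path(c) for s in senses.values() for c in s.values()]
    ism = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        ism = level[0]
    return ism


def specification_marks(senses):
    mark = initial_mark(senses)
    # Greedy descent: move to the child mark that covers the most context words.
    while True:
        counts = {c: len(words_under(c, senses)) for c in CHILDREN.get(mark, [])}
        if not counts:
            break
        best = max(counts.values())
        winners = [c for c, n in counts.items() if n == best]
        # Stop when no child covers at least two words, or when children tie.
        if best < 2 or len(winners) > 1:
            break
        mark = winners[0]
    resolved = {}
    for word, s in senses.items():
        inside = [label for label, c in s.items() if mark in root_path(c)]
        # Only a word with exactly one sense under the final mark is resolved;
        # None marks words left for complementary heuristics.
        resolved[word] = inside[0] if len(inside) == 1 else None
    return mark, resolved


print(specification_marks(SENSES))
```

On this toy data the ISM is entity, and the descent passes through object, living_thing and organism before ending at flora (standing in for {plant#2, flora#2}), printing `('flora', {'plant': 'plant#2', 'tree': 'tree#1', 'perennial': 'perennial#1', 'leaf': None})`. This mirrors the Figure 1 walkthrough: plant, tree and perennial are resolved, while leaf is left unresolved. Halting on ties rather than picking one arbitrarily is a design choice of this sketch: it leaves genuinely ambiguous words for the complementary heuristics instead of guessing.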

Figure 1: Specification Marks
\includegraphics[width=14cm, clip]{wordnet.eps}