The set of features defined for the training of the system is described in Figure 9 and is based on the features described by Ng1996 and Escudero2000ecml. These features represent words, collocations, and POS tags in the local context. Both ``collapsed'' and ``non-collapsed'' functions are used.
Actually, each item in Figure 9 groups several sets
of features. The majority of them depend on the nearest words
(e.g., one type comprises all possible features defined by the words
occurring in each sample at several positions relative to the ambiguous
word). Types denoted by capital letters are based on the
``collapsed'' function form; that is, these features simply
recognize an attribute belonging to the training data.
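As an illustration of features defined by the nearest words, the following minimal sketch collects the word found at each offset around the ambiguous word. The function name `positional_word_features`, the feature names, and the window of three words on each side are assumptions made for this example only; they are not taken from Figure 9.

```python
def positional_word_features(tokens, target_index, offsets=(-3, -2, -1, 1, 2, 3)):
    """Return {feature_name: word} for each offset that falls inside the sample.

    The default offsets are an illustrative assumption, not the paper's setting.
    """
    features = {}
    for offset in offsets:
        position = target_index + offset
        # Skip offsets that fall before the start or past the end of the sample.
        if 0 <= position < len(tokens):
            features[f"w{offset:+d}"] = tokens[position].lower()
    return features


tokens = "The bank raised its interest rate again".split()
features = positional_word_features(tokens, tokens.index("interest"))
```

Here each sample yields one feature per available position, so a sample near a sentence boundary simply produces fewer features.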
Keyword features (m) are inspired by the work of Ng1996.
Nouns are filtered using frequency information for nouns
co-occurring with a particular sense. For example, given
a set of 100 examples of interest#4, if the
noun bank is found 10 times or more at any position, then a
feature is defined for that noun.
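The frequency filter can be sketched as follows. This is a hedged illustration rather than the original implementation: the function name `select_keyword_nouns` and the toy data are hypothetical, and the sketch assumes that a noun is counted once per example, which the text does not specify.

```python
from collections import Counter


def select_keyword_nouns(examples_nouns, min_count=10):
    """Return the nouns that occur in at least `min_count` examples of one sense.

    `examples_nouns` holds one list of nouns per training example.
    Assumption: a noun is counted once per example, at any position.
    """
    counts = Counter()
    for nouns in examples_nouns:
        counts.update(set(nouns))
    return {noun for noun, count in counts.items() if count >= min_count}


# Toy data: 100 examples of a single sense (hypothetical counts).
examples = [["bank", "rate"]] * 12 + [["loan"]] * 9 + [["rate"]] * 79
keywords = select_keyword_nouns(examples)
```

With a threshold of 10, the noun seen in 12 examples produces a keyword feature, while the noun seen in only 9 examples does not.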
Moreover, new features have also been defined using other
grammatical properties: relationship features, which refer to
the grammatical relationship of the ambiguous word
(subject, object, complement, ...), and
two kinds of dependency features, which extract the word related to the
ambiguous one through the dependency parse tree.
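A minimal sketch of how such grammatical features might be read off a dependency parse is given below. Everything here is an assumption for illustration: the `Token` representation, the relation labels, and the choice to report both the head word and the dependent words of the ambiguous word are not taken from the source, and no particular parser is implied.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    head: int       # index of the head token, -1 for the root
    relation: str   # grammatical relation to the head (hypothetical labels)


def grammatical_features(tokens, target_index):
    """Collect the relation of the target word and the words linked to it."""
    target = tokens[target_index]
    features = {"relation": target.relation}
    if target.head >= 0:
        features["head_word"] = tokens[target.head].word.lower()
    # Words that depend directly on the target in the parse tree.
    features["dependent_words"] = sorted(
        t.word.lower() for t in tokens if t.head == target_index
    )
    return features


# Hand-built parse of "The bank raised its interest" (illustrative only).
sentence = [
    Token("The", 1, "det"),
    Token("bank", 2, "subject"),
    Token("raised", -1, "root"),
    Token("its", 4, "det"),
    Token("interest", 2, "object"),
]
features = grammatical_features(sentence, 4)
```

In this toy parse the ambiguous word is an object, its head is the verb, and it has a single dependent.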