Integrating a Corpus-based WSD System into a Knowledge-based WSD System

This experiment was designed to evaluate whether integrating a corpus-based system into a knowledge-based one helps improve word-sense disambiguation of nouns.

In this setting, ME can help SM by labelling some of the nouns in the context of the target word, that is, by reducing the number of possible senses of those context nouns. In effect, this reduces the search space of the SM method, so the sense chosen for the target word is the one most closely related to the noun senses labelled by ME.
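The search-space reduction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_senses`, the sense labels, and the data structures are all assumptions made for clarity.

```python
# Hypothetical sketch: ME fixes the sense of some context nouns, so SM only
# has to consider one sense for them instead of every WordNet sense.

def candidate_senses(context_nouns, all_senses, me_fixed):
    """Return the candidate-sense lists SM must score for each context noun.

    context_nouns: nouns appearing in the context of the target word
    all_senses:    noun -> list of all its possible senses (e.g. from WordNet)
    me_fixed:      noun -> single sense assigned by a high-precision ME classifier
    """
    return {
        noun: [me_fixed[noun]] if noun in me_fixed else all_senses[noun]
        for noun in context_nouns
    }

# Toy example: ME has fixed "church" to its second sense; "service" keeps
# its full sense list for SM to resolve.
all_senses = {"church": ["church#1", "church#2", "church#3"],
              "service": ["service#1", "service#2"]}
fixed = {"church": "church#2"}
print(candidate_senses(["church", "service"], all_senses, fixed))
# → {'church': ['church#2'], 'service': ['service#1', 'service#2']}
```

The design point is simply that each noun fixed by ME collapses one factor of the combinatorial space of sense assignments that SM's heuristics must explore.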

In this case, we used the nouns from the English lexical-sample task of SENSEVAL-2. ME helps SM by labelling some words in the context of the target word. These context words were sense-tagged using the SemCor collection as the learning corpus. We performed three-fold cross-validation for all nouns with 10 or more occurrences and selected those nouns that ME disambiguated with high precision, that is, with an accuracy of 90% or more. The classifiers for these nouns were then used to disambiguate the test data. The total number of different noun classifiers activated for each target word across the test corpus is shown in Table 15.
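The classifier-selection step above (keep only nouns with at least 10 occurrences whose three-fold cross-validation accuracy on SemCor reaches 90%) can be sketched as below. The helper name `select_reliable_nouns` and the toy accuracy figures are illustrative assumptions, not values or code from the experiment.

```python
# Illustrative sketch of the filtering criterion: a noun classifier is kept
# only if the noun is frequent enough (>= 10 SemCor occurrences) and its
# mean three-fold cross-validation accuracy is at least 0.90.

def select_reliable_nouns(cv_accuracy, occurrences, min_occ=10, threshold=0.90):
    """cv_accuracy: noun -> mean 3-fold CV accuracy on SemCor
    occurrences: noun -> number of SemCor occurrences
    Returns the sorted list of nouns whose classifiers are retained."""
    return sorted(
        noun for noun, acc in cv_accuracy.items()
        if occurrences[noun] >= min_occ and acc >= threshold
    )

# Toy data (made up): "day" fails the accuracy cut, "art" the frequency cut.
acc = {"church": 0.93, "day": 0.71, "dyke": 0.95, "art": 0.92}
occ = {"church": 50, "day": 136, "dyke": 15, "art": 8}
print(select_reliable_nouns(acc, occ))  # → ['church', 'dyke']
```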

Next, SM was applied with all its heuristics to disambiguate the target words of the test data, but with the advantage of already knowing the senses of some of the nouns that formed the context of these target words.


Table 15: Precision and Recall Results Using SM to Disambiguate Words, With and Without Fixing of Noun Sense
                                Without fixed senses    With fixed senses
Target word  Noun classifiers   Precision   Recall      Precision   Recall
art 63 0.475 0.475 0.524 0.524
authority 80 0.137 0.123 0.144 0.135
bar 104 0.222 0.203 0.232 0.220
bum 37 0.421 0.216 0.421 0.216
chair 59 0.206 0.190 0.316 0.301
channel 32 0.500 0.343 0.521 0.375
child 59 0.500 0.200 0.518 0.233
church 50 0.509 0.509 0.540 0.529
circuit 49 0.356 0.346 0.369 0.360
day 136 0.038 0.035 0.054 0.049
detention 22 0.454 0.454 0.476 0.454
dyke 15 0.933 0.933 0.933 0.933
facility 14 0.875 0.875 1 1
fatigue 38 0.236 0.230 0.297 0.282
feeling 48 0.306 0.300 0.346 0.340
grip 38 0.184 0.179 0.216 0.205
hearth 29 0.321 0.310 0.321 0.310
holiday 23 0.818 0.346 0.833 0.384
lady 40 0.375 0.136 0.615 0.363
material 58 0.343 0.338 0.359 0.353
mouth 51 0.094 0.094 0.132 0.132
nation 25 0.269 0.269 0.307 0.307
nature 37 0.263 0.263 0.289 0.289
post 41 0.312 0.306 0.354 0.346
restraint 31 0.200 0.193 0.206 0.193
sense 37 0.260 0.240 0.282 0.260
spade 17 0.823 0.823 0.941 0.941
stress 37 0.228 0.216 0.257 0.243
yew 24 0.480 0.480 0.541 0.520
Total 1294 0.300 0.267 0.336 0.303


Table 15 shows the precision and recall obtained when SM is applied with and without first applying ME, that is, with and without fixing the senses of the nouns that form the context. A small but consistent improvement was obtained across the complete test set (3.56 points of precision and 3.61 points of recall). Although the improvement is modest, this experiment empirically demonstrates that a corpus-based method such as maximum entropy can be integrated to help a knowledge-based system such as the specification-marks method.
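The reported gains are absolute percentage-point differences between the two Total rows of Table 15. A one-line check from the rounded totals (the reported 3.56 and 3.61 presumably come from unrounded scores, so the rounded table values give approximately 3.6 points for both):

```python
# Absolute improvement in percentage points between the "without" and
# "with fixed senses" totals of Table 15 (values as printed in the table).

def improvement_points(without, with_fixed):
    return round((with_fixed - without) * 100, 2)

print(improvement_points(0.300, 0.336))  # precision: → 3.6
print(improvement_points(0.267, 0.303))  # recall:    → 3.6
```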