Integrating a Corpus-based WSD System into a Knowledge-based WSD System

This experiment was designed to evaluate whether integrating a corpus-based system into a knowledge-based one helps improve word-sense disambiguation of nouns.

In this setting, ME can help SM by labelling some of the nouns in the context of the target word, that is, by reducing the number of possible senses of those context nouns. In effect, this reduces the search space of the SM method, so the sense chosen for the target word is the one most closely related to the noun senses labelled by ME.
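The search-space reduction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_senses`, the sense labels, and the data structures are all assumptions made for clarity.

```python
# Hypothetical sketch: ME fixes the sense of some context nouns, so SM only
# has to consider one sense for them instead of every WordNet sense.

def candidate_senses(context_nouns, all_senses, me_fixed):
    """Return the candidate-sense lists SM must score for each context noun.

    context_nouns: nouns appearing in the context of the target word
    all_senses:    noun -> list of all its possible senses (e.g. from WordNet)
    me_fixed:      noun -> single sense assigned by a high-precision ME classifier
    """
    return {
        noun: [me_fixed[noun]] if noun in me_fixed else all_senses[noun]
        for noun in context_nouns
    }

# Toy example: ME has fixed "church" to its second sense; "service" keeps
# its full sense list for SM to resolve.
all_senses = {"church": ["church#1", "church#2", "church#3"],
              "service": ["service#1", "service#2"]}
fixed = {"church": "church#2"}
print(candidate_senses(["church", "service"], all_senses, fixed))
# → {'church': ['church#2'], 'service': ['service#1', 'service#2']}
```

The design point is simply that each noun fixed by ME collapses one factor of the combinatorial space of sense assignments that SM's heuristics must explore.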

In this case, we used the nouns from the English lexical-sample task of SENSEVAL-2. ME helps SM by labelling some words in the context of the target word. These context words were sense-tagged using the SemCor collection as the learning corpus. We performed three-fold cross-validation for all nouns with 10 or more occurrences and selected those nouns that ME disambiguated with high precision, that is, with an accuracy of 90% or more. The classifiers for these nouns were then used to disambiguate the test data. The total number of different noun classifiers activated for each target word across the test corpus is shown in Table 15.
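The classifier-selection step above (keep only nouns with at least 10 occurrences whose three-fold cross-validation accuracy on SemCor reaches 90%) can be sketched as below. The helper name `select_reliable_nouns` and the toy accuracy figures are illustrative assumptions, not values or code from the experiment.

```python
# Illustrative sketch of the filtering criterion: a noun classifier is kept
# only if the noun is frequent enough (>= 10 SemCor occurrences) and its
# mean three-fold cross-validation accuracy is at least 0.90.

def select_reliable_nouns(cv_accuracy, occurrences, min_occ=10, threshold=0.90):
    """cv_accuracy: noun -> mean 3-fold CV accuracy on SemCor
    occurrences: noun -> number of SemCor occurrences
    Returns the sorted list of nouns whose classifiers are retained."""
    return sorted(
        noun for noun, acc in cv_accuracy.items()
        if occurrences[noun] >= min_occ and acc >= threshold
    )

# Toy data (made up): "day" fails the accuracy cut, "art" the frequency cut.
acc = {"church": 0.93, "day": 0.71, "dyke": 0.95, "art": 0.92}
occ = {"church": 50, "day": 136, "dyke": 15, "art": 8}
print(select_reliable_nouns(acc, occ))  # → ['church', 'dyke']
```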

Next, SM was applied with all its heuristics to disambiguate the target words of the test data, but with the advantage of already knowing the senses of some of the nouns that formed the context of these target words.


Table 15: Precision and Recall Results Using SM to Disambiguate Words, With and Without Fixing of Noun Sense
                                Without fixed senses    With fixed senses
Target word  Noun classifiers   Precision   Recall      Precision   Recall
art 63 0.475 0.475 0.524 0.524
authority 80 0.137 0.123 0.144 0.135
bar 104 0.222 0.203 0.232 0.220
bum 37 0.421 0.216 0.421 0.216
chair 59 0.206 0.190 0.316 0.301
channel 32 0.500 0.343 0.521 0.375
child 59 0.500 0.200 0.518 0.233
church 50 0.509 0.509 0.540 0.529
circuit 49 0.356 0.346 0.369 0.360
day 136 0.038 0.035 0.054 0.049
detention 22 0.454 0.454 0.476 0.454
dyke 15 0.933 0.933 0.933 0.933
facility 14 0.875 0.875 1 1
fatigue 38 0.236 0.230 0.297 0.282
feeling 48 0.306 0.300 0.346 0.340
grip 38 0.184 0.179 0.216 0.205
hearth 29 0.321 0.310 0.321 0.310
holiday 23 0.818 0.346 0.833 0.384
lady 40 0.375 0.136 0.615 0.363
material 58 0.343 0.338 0.359 0.353
mouth 51 0.094 0.094 0.132 0.132
nation 25 0.269 0.269 0.307 0.307
nature 37 0.263 0.263 0.289 0.289
post 41 0.312 0.306 0.354 0.346
restraint 31 0.200 0.193 0.206 0.193
sense 37 0.260 0.240 0.282 0.260
spade 17 0.823 0.823 0.941 0.941
stress 37 0.228 0.216 0.257 0.243
yew 24 0.480 0.480 0.541 0.520
Total 1294 0.300 0.267 0.336 0.303


Table 15 shows the precision and recall obtained when SM is applied with and without first applying ME, that is, with and without fixing the senses of the nouns that form the context. A small but consistent improvement was obtained across the complete test set (3.56 points of precision and 3.61 points of recall). Although the improvement is modest, this experiment empirically demonstrates that a corpus-based method such as maximum entropy can be integrated to help a knowledge-based system such as the specification-marks method.
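The reported gains are absolute percentage-point differences between the two Total rows of Table 15. A one-line check from the rounded totals (the reported 3.56 and 3.61 presumably come from unrounded scores, so the rounded table values give approximately 3.6 points for both):

```python
# Absolute improvement in percentage points between the "without" and
# "with fixed senses" totals of Table 15 (values as printed in the table).

def improvement_points(without, with_fixed):
    return round((with_fixed - without) * 100, 2)

print(improvement_points(0.300, 0.336))  # precision: → 3.6
print(improvement_points(0.267, 0.303))  # recall:    → 3.6
```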