Benchmarks

We evaluated SupWSD on the evaluation framework of Raganato et al. (2017), which includes five test sets from the Senseval/SemEval series and two training corpora of different sizes, i.e. SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015). As sense inventory, we used WordNet 3.0 (Miller et al., 1990) for all open-class parts of speech. Our system achieves state-of-the-art performance in terms of F-Measure, sometimes outperforming its competitors by a considerable margin.

F-Measure (%) per test set:

System          Training corpus   Senseval-2  Senseval-3  SemEval-07  SemEval-13  SemEval-15
SupWSD          SemCor            71.3        68.8        60.2        65.8        70.0
SupWSD emb      SemCor            72.7        70.6        63.1        66.8        71.8
SupWSD-s emb    SemCor            72.2        70.3        63.3        66.1        71.6
IMS             SemCor            70.9        69.3        61.3        65.3        69.5
IMS emb         SemCor            71.0        69.3        60.9        67.3        71.3
IMS-s emb       SemCor            72.2        70.4        62.6        65.9        71.5
Context2Vec     SemCor            71.8        69.1        61.3        65.6        71.9
MFS             SemCor            65.6        66.0        54.5        63.8        67.1
SupWSD          SemCor + OMSTI    72.6        68.9        59.6        64.9        69.5
SupWSD emb      SemCor + OMSTI    73.8        70.8        64.2        67.2        71.5
SupWSD-s emb    SemCor + OMSTI    73.1        70.5        62.2        66.4        70.9
IMS             SemCor + OMSTI    72.8        69.2        60.0        65.0        69.3
IMS emb         SemCor + OMSTI    70.8        68.9        58.5        66.3        69.7
IMS-s emb       SemCor + OMSTI    73.3        69.6        61.1        66.7        70.4
Context2Vec     SemCor + OMSTI    72.3        68.2        61.5        67.2        71.7
MFS             SemCor + OMSTI    66.5        60.4        52.3        62.6        64.2
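The F-Measure reported above is the standard WSD score: precision over the instances a system attempts, recall over all gold instances, combined harmonically. The following is a minimal sketch of that computation; the instance-id and sense-key strings are hypothetical placeholders, and this is an illustration of the metric rather than the official Raganato et al. (2017) Scorer.

```python
def wsd_scores(gold, predicted):
    """Compute precision, recall and F1 for WSD.

    gold:      dict mapping instance id -> gold sense key
    predicted: dict mapping instance id -> predicted sense key
               (a system may leave some instances unanswered)
    """
    # An answer is correct when it matches the gold sense for that instance.
    correct = sum(1 for inst, sense in predicted.items() if gold.get(inst) == sense)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


# Toy example with made-up WordNet-style sense keys:
gold = {"d000.s000.t000": "long%3:00:02::",
        "d000.s001.t000": "art%1:06:00::",
        "d000.s002.t000": "run%2:38:00::"}
predicted = {"d000.s000.t000": "long%3:00:02::",   # correct
             "d000.s001.t000": "art%1:09:00::"}    # wrong; third instance skipped
p, r, f1 = wsd_scores(gold, predicted)
print(p, r, f1)  # precision 0.5, recall 1/3, F1 0.4
```

When a system answers every instance, precision equals recall and F1 reduces to plain accuracy, which is the usual situation for the systems in the table above.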

Speed Comparison

We additionally carried out an experimental evaluation of SupWSD's execution time. We relied on a test corpus of 1M words with more than 250K target instances to disambiguate, and ran both systems with SemCor and OMSTI as training sets. The results show a considerable speed gain for SupWSD: it is around 3 times faster than the other systems when trained on SemCor, and almost 6 times faster when trained on OMSTI.
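The speedup factors follow directly from the measured time reductions: cutting a running time by r% means the system is 1 / (1 − r/100) times faster. A quick sketch of that arithmetic (the percentages are the ones measured above; the function name is ours):

```python
def speedup_factor(time_reduction_pct):
    """Convert a percentage reduction in running time to a speedup factor.

    E.g. a 66% reduction means the task takes 34% of the original time,
    i.e. it runs 1 / 0.34 ~= 2.9x faster.
    """
    return 1.0 / (1.0 - time_reduction_pct / 100.0)


print(round(speedup_factor(66), 2))  # SemCor training:  2.94  (~3x faster)
print(round(speedup_factor(83), 2))  # OMSTI training:   5.88  (~6x faster)
print(round(speedup_factor(80), 2))  # Testing:          5.0
```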


[Bar chart: execution time of SupWSD vs. other systems. Time reduction achieved by SupWSD — training on SemCor: −66%; training on OMSTI: −83%; testing: −80%.]