Computing semantic similarity using large static corpora

András Dobó, János Csirik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Measuring semantic similarity of words is of crucial importance in Natural Language Processing. Although there are many different approaches for this task, there is still room for improvement. In contrast to many other methods that use web search engines or large lexical databases, we developed methods that rely solely on large static corpora. These methods create a binary or numerical feature vector for each word, using statistical information obtained from the corpora. The vectors contain features based on context words or grammatical relations extracted from the corpora, weighted with diverse weighting schemes. After the feature vectors are created, word similarity is calculated using various vector similarity measures. Besides the individual methods, their combinations were also tested. Evaluated on both the Miller-Charles dataset and the TOEFL synonym questions, the methods achieve results competitive with recent methods.
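The pipeline outlined in the abstract (context-based feature vectors, a weighting scheme, then a vector similarity measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the symmetric context window, the positive pointwise mutual information (PPMI) weighting, the cosine measure, and all function names and the toy corpus are assumptions chosen for illustration. The abstract only says that diverse weighting schemes and various similarity measures were used.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count context words within +/- `window` tokens of each word.

    `sentences` is an iterable of token lists (a hypothetical, already
    tokenized corpus). Returns {word: Counter({context_word: count})}.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def ppmi_weight(vectors):
    """Reweight raw counts with PPMI (an assumed weighting scheme)."""
    total = sum(sum(ctx.values()) for ctx in vectors.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in vectors.items()}
    context_totals = Counter()
    for ctx in vectors.values():
        context_totals.update(ctx)
    weighted = {}
    for w, ctx in vectors.items():
        weighted[w] = {}
        for c, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # keep only positive associations
                weighted[w][c] = pmi
    return weighted


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
corpus = [
    ["the", "car", "drives", "on", "the", "road"],
    ["the", "automobile", "drives", "on", "the", "street"],
    ["the", "bird", "sings", "in", "the", "tree"],
]
weighted = ppmi_weight(cooccurrence_vectors(corpus, window=2))
print(cosine(weighted["car"], weighted["automobile"]))
print(cosine(weighted["car"], weighted["bird"]))
```

In a TOEFL-style synonym question, the same `cosine` function could be applied to the target word and each candidate, picking the candidate with the highest score. Swapping in a different weighting or similarity function only requires replacing `ppmi_weight` or `cosine`.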

Original language: English
Title of host publication: SOFSEM 2013
Subtitle of host publication: Theory and Practice of Computer Science - 39th International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings
Pages: 491-502
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-35843-2_42
Publication status: Published - Jan 24 2013
Event: 39th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2013 - Spindleruv Mlyn, Czech Republic
Duration: Jan 26 2013 to Jan 31 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7741 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 39th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2013
Country: Czech Republic
City: Spindleruv Mlyn
Period: 1/26/13 to 1/31/13

Keywords

  • co-occurrence statistics
  • semantic similarity
  • static corpora

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Dobó, A., & Csirik, J. (2013). Computing semantic similarity using large static corpora. In SOFSEM 2013: Theory and Practice of Computer Science - 39th International Conference on Current Trends in Theory and Practice of Computer Science, Proceedings (pp. 491-502). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7741 LNCS). https://doi.org/10.1007/978-3-642-35843-2_42