A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models

András Dobó, János Csirik

Research output: Contribution to journalArticle

1 Citation (Scopus)


Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models computing these measures can have many different parameters, such as different weighting schemes, vector similarity measures, feature transformation functions and dimensionality reduction techniques. Despite their importance there is no truly comprehensive study simultaneously evaluating the numerous parameters of such models, while also considering the interaction of these parameters with each other. We would like to address this gap with our systematic study. Taking the necessary distributional information extracted from the chosen dataset as already granted, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing altogether 10 parameters simultaneously, we try to find the best combination of parameter settings, with a large number of settings examined in case of some of the parameters. Beside evaluating the conventionally used settings for the parameters, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations of settings in general use, thus achieving state-of-the-art results.

Original languageEnglish
JournalJournal of Quantitative Linguistics
Publication statusPublished - Jan 1 2019


ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this