A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models

András Dobó, J. Csirik

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models computing these measures can have many different parameters, such as different weighting schemes, vector similarity measures, feature transformation functions and dimensionality reduction techniques. Despite their importance, there is no truly comprehensive study that simultaneously evaluates the numerous parameters of such models while also considering their interaction with each other. We address this gap with our systematic study. Taking the necessary distributional information extracted from the chosen dataset as given, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing altogether 10 parameters simultaneously, we try to find the best combination of parameter settings, examining a large number of settings for some of the parameters. Besides evaluating the conventionally used settings for the parameters, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations of settings in general use, thus achieving state-of-the-art results.
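As an illustration of the kind of pipeline the abstract refers to, the sketch below builds feature vectors from a toy word-by-context co-occurrence matrix with one common weighting scheme (PPMI) and compares them with one common vector similarity measure (cosine). These particular choices, and the toy data, are assumptions made for the example; they are not necessarily the settings the study examines or finds best.

import numpy as np

def ppmi_matrix(counts):
    """Positive pointwise mutual information weighting of a
    word-by-context co-occurrence count matrix (one common weighting scheme)."""
    total = counts.sum()
    word_totals = counts.sum(axis=1, keepdims=True)
    context_totals = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (word_totals * context_totals))
    # Zero counts give -inf/NaN; clamping negatives to 0 turns PMI into PPMI.
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def cosine(u, v):
    """Cosine similarity between two feature vectors (one common similarity measure)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy co-occurrence counts (rows: target words, columns: context features).
counts = np.array([[4.0, 1.0, 0.0],
                   [3.0, 2.0, 0.0],
                   [0.0, 1.0, 5.0]])
vectors = ppmi_matrix(counts)
print(cosine(vectors[0], vectors[1]))  # similar distributions -> higher score
print(cosine(vectors[0], vectors[2]))  # dissimilar distributions -> lower score

In a full model of the kind studied in the paper, each such component (weighting, similarity measure, feature transformation, dimensionality reduction) is one parameter whose settings can be varied and combined.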

Original language: English
Journal: Journal of Quantitative Linguistics
DOI: 10.1080/09296174.2019.1570897
Publication status: Published - Jan 1 2019

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

@article{4a1dfc4d3a4f4ff4b9003e39736de0b6,
  title = "A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models",
  author = "Andr{\'a}s Dob{\'o} and J. Csirik",
  year = "2019",
  month = "1",
  day = "1",
  doi = "10.1080/09296174.2019.1570897",
  language = "English",
  journal = "Journal of Quantitative Linguistics",
  issn = "0929-6174",
  publisher = "Routledge",
}
