Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages

András Dobó, J. Csirik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages. We address this gap with a systematic study that searches for the best combination of parameter settings in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then compares our findings across these languages. During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in the case of some of them. As a result, we were able to find combinations of parameter settings that significantly outperform conventional combinations and achieve state-of-the-art results.
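To make the parameter families named in the abstract more concrete, here is a minimal Python sketch; it is not the authors' implementation. It counts word-by-context co-occurrences in a toy corpus, reweights them with positive pointwise mutual information (PPMI, one widely used weighting scheme), and compares the resulting sparse vectors with two similarity measures, cosine and a weighted Jaccard measure. The function names, the symmetric window of 2 tokens and the toy sentences are illustrative assumptions; dimensionality reduction (for example truncated SVD) is left out to keep the example dependency-free.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helper names: a toy illustration, not the method from the paper.


def cooccurrence_counts(sentences, window=2):
    """Count context words within +/- `window` tokens of each target word."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def ppmi(counts):
    """Weighting scheme: positive pointwise mutual information.

    PMI(w, c) = log(P(w, c) / (P(w) * P(c))); values <= 0 become 0,
    i.e. they are simply omitted from the sparse vector.
    """
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    weighted = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log((n * total) / (word_totals[w] * context_totals[c]))
            if pmi > 0:
                vec[c] = pmi
        weighted[w] = vec
    return weighted


def cosine(u, v):
    """Similarity measure 1: cosine of the angle between two sparse vectors."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_jaccard(u, v):
    """Similarity measure 2: sum of minima over sum of maxima (non-negative weights)."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


# Toy corpus (an assumption for illustration only).
SENTENCES = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

if __name__ == "__main__":
    vectors = ppmi(cooccurrence_counts(SENTENCES, window=2))
    for a, b in [("cat", "dog"), ("cat", "mat")]:
        print(a, b,
              round(cosine(vectors[a], vectors[b]), 3),
              round(weighted_jaccard(vectors[a], vectors[b]), 3))
```

On this toy corpus *cat* and *dog* occur in identical contexts, so their PPMI vectors coincide and both measures return 1 (up to floating-point rounding), while *cat* and *mat* score strictly between 0 and 1. Swapping `ppmi` for another weighting function, or `cosine` for another measure, is the kind of parameter variation the study evaluates at scale.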

Original language: English
Title of host publication: Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings
Editors: Ilias Maglogiannis, Elias Pimenidis, John MacIntyre, Lazaros Iliadis
Publisher: Springer New York LLC
Pages: 487-499
Number of pages: 13
ISBN (Print): 9783030198220
DOIs: https://doi.org/10.1007/978-3-030-19823-7_41
Publication status: Published - Jan 1 2019
Event: 15th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2019 - Hersonissos, Greece
Duration: May 24 2019 to May 26 2019

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 559
ISSN (Print): 1868-4238

Conference

Conference: 15th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2019
Country: Greece
City: Hersonissos
Period: 5/24/19 to 5/26/19

Fingerprint

Language
Similarity measure
Weighting
Natural language processing
Dimensionality reduction
Semantic similarity

Keywords

  • Best combination of parameter settings
  • Comparison of findings across languages
  • Distributional semantic models
  • English, Spanish and Hungarian
  • Semantic similarity and relatedness

ASJC Scopus subject areas

  • Information Systems and Management

Cite this

Dobó, A., & Csirik, J. (2019). Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. In I. Maglogiannis, E. Pimenidis, J. MacIntyre, & L. Iliadis (Eds.), Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings (pp. 487-499). (IFIP Advances in Information and Communication Technology; Vol. 559). Springer New York LLC. https://doi.org/10.1007/978-3-030-19823-7_41

Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. / Dobó, András; Csirik, J.

Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings. ed. / Ilias Maglogiannis; Elias Pimenidis; John MacIntyre; Lazaros Iliadis. Springer New York LLC, 2019. p. 487-499 (IFIP Advances in Information and Communication Technology; Vol. 559).

Dobó, A & Csirik, J 2019, Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. in I Maglogiannis, E Pimenidis, J MacIntyre & L Iliadis (eds), Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings. IFIP Advances in Information and Communication Technology, vol. 559, Springer New York LLC, pp. 487-499, 15th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2019, Hersonissos, Greece, 5/24/19. https://doi.org/10.1007/978-3-030-19823-7_41

Dobó A, Csirik J. Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. In Maglogiannis I, Pimenidis E, MacIntyre J, Iliadis L, editors, Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings. Springer New York LLC. 2019. p. 487-499. (IFIP Advances in Information and Communication Technology). https://doi.org/10.1007/978-3-030-19823-7_41

Dobó, András ; Csirik, J. / Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages. Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings. editor / Ilias Maglogiannis ; Elias Pimenidis ; John MacIntyre ; Lazaros Iliadis. Springer New York LLC, 2019. pp. 487-499 (IFIP Advances in Information and Communication Technology).

@inproceedings{5909ad24907a43cfaf9c05233c3fef4f,
title = "Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages",
abstract = "Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages. We would like to address this gap with our systematic study by searching for the best combination of parameter settings in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then comparing our findings across these languages. During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in case of some of them. As a result of this we were able to find such combinations of parameter settings that significantly outperform conventional settings combinations and achieve state-of-the-art results.",
keywords = "Best combination of parameter settings, Comparison of findings across languages, Distributional semantic models, English, Spanish and Hungarian, Semantic similarity and relatedness",
author = "Andr{\'a}s Dob{\'o} and J. Csirik",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-19823-7_41",
language = "English",
isbn = "9783030198220",
series = "IFIP Advances in Information and Communication Technology",
volume = "559",
publisher = "Springer New York LLC",
pages = "487--499",
editor = "Ilias Maglogiannis and Elias Pimenidis and John MacIntyre and Lazaros Iliadis",
booktitle = "Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings",

}

TY  - GEN

T1  - Comparison of the Best Parameter Settings in the Creation and Comparison of Feature Vectors in Distributional Semantic Models Across Multiple Languages

AU  - Dobó, András

AU  - Csirik, J.

PY  - 2019/1/1

Y1  - 2019/1/1

N2  - Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages. We would like to address this gap with our systematic study by searching for the best combination of parameter settings in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then comparing our findings across these languages. During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in case of some of them. As a result of this we were able to find such combinations of parameter settings that significantly outperform conventional settings combinations and achieve state-of-the-art results.

AB  - Measuring the semantic similarity and relatedness of words is important for many natural language processing tasks. Although distributional semantic models designed for this task have many different parameters, such as vector similarity measures, weighting schemes and dimensionality reduction techniques, there is no truly comprehensive study simultaneously evaluating these parameters while also analysing the differences in the findings for multiple languages. We would like to address this gap with our systematic study by searching for the best combination of parameter settings in the creation and comparison of feature vectors in distributional semantic models for English, Spanish and Hungarian separately, and then comparing our findings across these languages. During our extensive analysis we test a large number of possible settings for all parameters, with more than a thousand novel variants in case of some of them. As a result of this we were able to find such combinations of parameter settings that significantly outperform conventional settings combinations and achieve state-of-the-art results.

KW  - Best combination of parameter settings

KW  - Comparison of findings across languages

KW  - Distributional semantic models

KW  - English, Spanish and Hungarian

KW  - Semantic similarity and relatedness

UR  - http://www.scopus.com/inward/record.url?scp=85065912945&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85065912945&partnerID=8YFLogxK

U2  - 10.1007/978-3-030-19823-7_41

DO  - 10.1007/978-3-030-19823-7_41

M3  - Conference contribution

SN  - 9783030198220

T3  - IFIP Advances in Information and Communication Technology

SP  - 487

EP  - 499

BT  - Artificial Intelligence Applications and Innovations - 15th IFIP WG 12.5 International Conference, AIAI 2019, Proceedings

A2  - Maglogiannis, Ilias

A2  - Pimenidis, Elias

A2  - MacIntyre, John

A2  - Iliadis, Lazaros

PB  - Springer New York LLC

ER  -