Hungarian word sense disambiguated corpus

Veronika Vincze, György Szarvas, Attila Almási, Dóra Szauter, Róbert Ormándi, Richárd Farkas, Csaba Hatvani, J. Csirik

Research output: Conference contribution

Abstract

To create the first Hungarian WSD corpus, 39 word forms suitable for word sense disambiguation were selected as samples. Among other criteria, each word form had to be frequent in Hungarian language usage, as measured by its frequency in the Hungarian National Corpus (HNC) (Váradi, 2000), and had to have more than one sense that is frequent in usage. The corpus texts were selected from the HNC and its Heti Világgazdaság (HVG) subcorpus. As a result, each sample has a relevant context (the whole HVG article), and information on lemmas, POS tags and automatic tokenization is also available.
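
The two selection criteria named in the abstract (a frequent word form, and more than one frequent sense) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how "frequent" was quantified, so the thresholds MIN_WORD_FORM_FREQ and MIN_SENSE_SHARE, the Candidate type and the placeholder word forms are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract does not state the actual cut-off values.
MIN_WORD_FORM_FREQ = 1000  # minimum corpus frequency of the word form
MIN_SENSE_SHARE = 0.10     # a sense counts as "frequent" at or above this share of occurrences


@dataclass
class Candidate:
    word_form: str
    corpus_freq: int              # frequency of the word form, e.g. as counted in a reference corpus
    sense_counts: dict[str, int]  # hypothetical per-sense occurrence counts in a sample


def frequent_senses(c: Candidate) -> list[str]:
    """Return the senses whose share of the sampled occurrences reaches MIN_SENSE_SHARE."""
    total = sum(c.sense_counts.values())
    if total == 0:
        return []
    return [s for s, n in c.sense_counts.items() if n / total >= MIN_SENSE_SHARE]


def is_suitable(c: Candidate) -> bool:
    """Frequent word form AND more than one frequent sense."""
    return c.corpus_freq >= MIN_WORD_FORM_FREQ and len(frequent_senses(c)) > 1


def select(candidates: list[Candidate]) -> list[str]:
    """Keep only the word forms that satisfy both criteria."""
    return [c.word_form for c in candidates if is_suitable(c)]


if __name__ == "__main__":
    candidates = [
        Candidate("form_a", 5000, {"s1": 60, "s2": 40}),  # frequent, two frequent senses
        Candidate("form_b", 5000, {"s1": 97, "s2": 3}),   # second sense too rare
        Candidate("form_c", 200, {"s1": 50, "s2": 50}),   # word form too rare
    ]
    print(select(candidates))  # prints ['form_a']
```

Expressing the sense criterion as a relative share rather than an absolute count is only one possible reading of "frequent in usage"; an absolute minimum per sense would fit the description equally well.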

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 3344-3349
Number of pages: 6
ISBN (Electronic): 2951740840, 9782951740846
Publication status: Published - Jan 1, 2008
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: May 28, 2008 to May 30, 2008

ASJC Scopus subject areas

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education

Citation

    Vincze, V., Szarvas, G., Almási, A., Szauter, D., Ormándi, R., Farkas, R., Hatvani, C., & Csirik, J. (2008). Hungarian word sense disambiguated corpus. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp. 3344-3349). European Language Resources Association (ELRA).