Hungarian dependency treebank

Veronika Vincze, Dóra Szauter, Attila Almási, György Móra, Zoltán Alexin, J. Csirik

Research output: Conference contribution

37 Citations (Scopus)

Abstract

Herein, we present the process of developing the first Hungarian Dependency Treebank. First, we briefly review the dependency grammars we considered important in developing our treebank. Second, we mention existing dependency corpora for other languages. Third, we present the steps of converting the Szeged Treebank into dependency-tree format: starting from the original phrase-structure treebank, we produced dependency trees by automatic conversion, then checked and corrected them, thereby creating the first manually annotated dependency corpus for Hungarian. We also discuss in detail the two major problem areas: coordination, and predicative nouns and adjectives. Fourth, we give statistics on the treebank: so far, we have completed the annotation of business news, newspaper articles, legal texts and texts on informatics, and we plan to convert the entire corpus into dependency-tree format. Finally, we outline possible applications of the resource: the database may be used in, among other areas, information extraction and machine translation.
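The abstract does not spell out how the automatic conversion works. One common way to turn phrase-structure trees into dependency trees is head-rule percolation: each phrase selects a head child, and the lexical heads of its other children become dependents of that head. The sketch below assumes this technique; the `Node` class, the `HEAD_RULES` table and its labels, and the example sentence are hypothetical illustrations, not the rules actually used for the Szeged Treebank.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # phrase label (e.g. "NP") or POS tag on leaves
    children: list[Node] = field(default_factory=list)
    word: str | None = None                      # set only on leaves
    index: int = -1                              # token position, set only on leaves

# Hypothetical head rules, for illustration only: for each phrase label,
# the child labels to prefer as head, in priority order.
HEAD_RULES = {
    "CP": ["V", "VP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "ADJP": ["ADJ"],
}

def leaf(tag: str, word: str, i: int) -> Node:
    return Node(tag, word=word, index=i)

def head_child(node: Node) -> Node:
    """Pick the head child via HEAD_RULES; fall back to the last child."""
    for preferred in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == preferred:
                return child
    return node.children[-1]

def convert(node: Node, arcs: list[tuple[int, int]]) -> int:
    """Return the token index of node's lexical head, appending (head, dependent) arcs."""
    if node.word is not None:
        return node.index
    head = head_child(node)
    head_idx = convert(head, arcs)
    for child in node.children:
        if child is not head:
            arcs.append((head_idx, convert(child, arcs)))
    return head_idx

# "A fiú almát eszik" ("The boy is eating an apple"), with a simplified, hypothetical tree.
tree = Node("CP", [
    Node("NP", [leaf("Det", "A", 0), leaf("N", "fiú", 1)]),
    Node("NP", [leaf("N", "almát", 2)]),
    leaf("V", "eszik", 3),
])
arcs: list[tuple[int, int]] = []
root = convert(tree, arcs)
print(root, arcs)  # 3 [(1, 0), (3, 1), (3, 2)]
```

Simple head rules like these have no good answer for coordination, where no single conjunct is an obvious head; this is one of the two problem areas the abstract names, and a reason automatically converted trees still need manual checking.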

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 1855-1862
Number of pages: 8
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - Jan 1 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: May 17 2010 - May 23 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 5/17/10 - 5/23/10

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics


Citation

Vincze, V., Szauter, D., Almási, A., Móra, G., Alexin, Z., & Csirik, J. (2010). Hungarian dependency treebank. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 1855-1862). European Language Resources Association (ELRA).