Application of different learning methods to Hungarian part-of-speech tagging

Tamás Horváth, Zoltán Alexin, Tibor Gyimóthy, Stefan Wrobel

Research output: Conference contribution

15 Citations (Scopus)

Abstract

From the point of view of computational linguistics, Hungarian is a difficult language due to its complex grammar and rich morphology. This means that even a common task such as part-of-speech tagging presents a new challenge for learning when applied to Hungarian, especially since the language has a fairly free word order. In this paper we therefore present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as propositional and ILP representatives, respectively, and added experiments with our own methods AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are promising approaches to POS tagging for Hungarian as well. The paper also includes experiments with several different cascade connections of the taggers.

Original language: English
Title of host publication: Inductive Logic Programming - 9th International Workshop, ILP 1999, Proceedings
Editors: Peter Flach, Saso Dzeroski
Publisher: Springer Verlag
Pages: 128-139
Number of pages: 12
ISBN (Print): 3540661093, 9783540661092
Publication status: Published - Jan 1 1999
Event: 9th International Workshop on Inductive Logic Programming, ILP 1999 - Bled, Slovenia
Duration: Jun 24 1999 to Jun 27 1999

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1634
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International Workshop on Inductive Logic Programming, ILP 1999
Country: Slovenia
City: Bled
Period: 6/24/99 to 6/27/99


ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Horváth, T., Alexin, Z., Gyimóthy, T., & Wrobel, S. (1999). Application of different learning methods to Hungarian part-of-speech tagging. In P. Flach, & S. Dzeroski (Eds.), Inductive Logic Programming - 9th International Workshop, ILP 1999, Proceedings (pp. 128-139). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1634). Springer Verlag.