Application of different learning methods to Hungarian part-of-speech tagging

Tamás Horváth, Zoltán Alexin, T. Gyimóthy, Stefan Wrobel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

From the point of view of computational linguistics, Hungarian is a difficult language because of its complex grammar and rich morphology. Even a common task such as part-of-speech (POS) tagging therefore presents a new challenge for learning when applied to Hungarian, especially since the language has fairly free word order. In this paper we present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as propositional and ILP representatives, and added experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are a promising approach to POS tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 128-139
Number of pages: 12
Volume: 1634
ISBN (Print): 3540661093, 9783540661092
Publication status: Published - 1999
Event: 9th International Workshop on Inductive Logic Programming, ILP 1999 - Bled, Slovenia
Duration: Jun 24 1999 to Jun 27 1999

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1634
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International Workshop on Inductive Logic Programming, ILP 1999
Country: Slovenia
City: Bled
Period: 6/24/99 to 6/27/99

Fingerprint

Inductive logic programming (ILP)
Tagging
Cascade connections
Computational linguistics
Attribute Grammars
Learning algorithms
Experiments
Grammar
Cascade
Learning
Experimental Results
Language

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Horváth, T., Alexin, Z., Gyimóthy, T., & Wrobel, S. (1999). Application of different learning methods to Hungarian part-of-speech tagging. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1634, pp. 128-139). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1634). Springer Verlag.

Application of different learning methods to Hungarian part-of-speech tagging. / Horváth, Tamás; Alexin, Zoltán; Gyimóthy, T.; Wrobel, Stefan.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1634. Springer Verlag, 1999. p. 128-139 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1634).

Horváth, T, Alexin, Z, Gyimóthy, T & Wrobel, S 1999, Application of different learning methods to Hungarian part-of-speech tagging. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 1634, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1634, Springer Verlag, pp. 128-139, 9th International Workshop on Inductive Logic Programming, ILP 1999, Bled, Slovenia, 6/24/99.
Horváth T, Alexin Z, Gyimóthy T, Wrobel S. Application of different learning methods to Hungarian part-of-speech tagging. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1634. Springer Verlag. 1999. p. 128-139. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Horváth, Tamás ; Alexin, Zoltán ; Gyimóthy, T. ; Wrobel, Stefan. / Application of different learning methods to Hungarian part-of-speech tagging. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 1634. Springer Verlag, 1999. pp. 128-139 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{3e83087bdc5943afaaeb76190c8c9abe,
title = "Application of different learning methods to Hungarian part-of-speech tagging",
abstract = "From the point of view of computational linguistics, Hungarian is a difficult language because of its complex grammar and rich morphology. Even a common task such as part-of-speech (POS) tagging therefore presents a new challenge for learning when applied to Hungarian, especially since the language has fairly free word order. In this paper we present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as propositional and ILP representatives, and added experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are a promising approach to POS tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.",
author = "Tam{\'a}s Horv{\'a}th and Zolt{\'a}n Alexin and T. Gyim{\'o}thy and Stefan Wrobel",
year = "1999",
language = "English",
isbn = "3540661093",
volume = "1634",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "128--139",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Application of different learning methods to Hungarian part-of-speech tagging

AU - Horváth, Tamás

AU - Alexin, Zoltán

AU - Gyimóthy, T.

AU - Wrobel, Stefan

PY - 1999

Y1 - 1999

N2 - From the point of view of computational linguistics, Hungarian is a difficult language because of its complex grammar and rich morphology. Even a common task such as part-of-speech (POS) tagging therefore presents a new challenge for learning when applied to Hungarian, especially since the language has fairly free word order. In this paper we present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We selected the popular C4.5 and Progol systems as propositional and ILP representatives, and added experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are a promising approach to POS tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.

UR - http://www.scopus.com/inward/record.url?scp=84864361764&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864361764&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84864361764

SN - 3540661093

SN - 9783540661092

VL - 1634

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 128

EP - 139

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -