POS tagging of Hungarian with combined statistical and rule-based methods

András Kuba, András Hócza, János Csirik

Research output: Contribution to journal › Conference article

12 Citations (Scopus)


In this paper we survey the key results achieved so far in Hungarian POS tagging. The most successful approaches have been selected and re-evaluated on a manually annotated corpus of 1.2 million words. Tests were performed in single-domain, multiple-domain and cross-domain settings. We investigate the possibility of further improving the selected POS tagging methods by combining them. Our aim is to build a POS tagger that achieves good results on a fine-grained tag set of more than 1000 tags. Results show that rule-based methods, including Transformation Based Learning, can be used as effectively as statistical methods for Hungarian POS tagging. Combining methods does increase tagging accuracy, producing significantly better results than those published earlier. We also show that the optimal combination differs between domain-specific and general-purpose taggers.
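
The abstract does not state how the taggers are combined. Purely as an illustration, one common way to combine several taggers is per-token majority voting, with ties resolved in favour of the tagger judged most reliable (for example, the most accurate one on held-out data). The sketch below assumes that scheme; the names `combine_by_vote` and `accuracy`, the priority ordering, and the coarse tags in the usage example are hypothetical, and the paper's actual combination method may differ.

```python
from collections import Counter
from typing import List, Sequence


def combine_by_vote(
    tagger_outputs: Sequence[Sequence[str]],
    priority: Sequence[int],
) -> List[str]:
    """Combine per-token tags from several taggers by majority vote (assumed scheme).

    tagger_outputs[i][j] is the tag that tagger i assigned to token j.
    priority lists every tagger index, from most to least trusted; it breaks ties.
    """
    n_taggers = len(tagger_outputs)
    if sorted(priority) != list(range(n_taggers)):
        raise ValueError("priority must be a permutation of the tagger indices")
    n_tokens = len(tagger_outputs[0])
    if any(len(out) != n_tokens for out in tagger_outputs):
        raise ValueError("all taggers must tag the same token sequence")

    combined = []
    for j in range(n_tokens):
        # Count how many taggers proposed each tag for token j.
        votes = Counter(out[j] for out in tagger_outputs)
        best = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == best}
        # Among the most-voted tags, take the one proposed by the most trusted tagger.
        for i in priority:
            if tagger_outputs[i][j] in tied:
                combined.append(tagger_outputs[i][j])
                break
    return combined


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical outputs of three taggers on a four-token sentence.
    taggers = [
        ["N", "V", "ADJ", "N"],   # tagger 0
        ["N", "V", "N", "N"],     # tagger 1
        ["N", "ADV", "ADJ", "V"], # tagger 2
    ]
    combined = combine_by_vote(taggers, priority=[1, 0, 2])
    print(combined)  # ['N', 'V', 'ADJ', 'N']
    print(accuracy(combined, ["N", "V", "ADJ", "N"]))  # 1.0
```

Because the abstract reports that the best combination differs between domain-specific and general-purpose taggers, a scheme like this would presumably need its priority ordering (or the choice of member taggers) tuned separately for each setting.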

Original language: English
Pages (from-to): 113-120
Number of pages: 8
Journal: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Publication status: Published, Dec 1 2004
Event: 7th International Conference TSD 2004: Text, Speech and Dialogue, Brno, Czech Republic
Duration: Sep 8 2004 to Sep 11 2004

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
