A hierarchical online classifier for patent categorization

Domonkos Tikk, György Biró, Attila Törcsvári

Research output: Chapter in Book/Report/Conference proceeding › Chapter

14 Citations (Scopus)

Abstract

Patent categorization (PC) is a typical application area of text categorization (TC). TC can be applied in different scenarios in the work of patent offices, depending on the stage at which categorization is needed. This is a challenging field for TC algorithms: applications must deal simultaneously with a large number of categories (on the order of 1,000-10,000) organized in a hierarchy and with a large number of long documents with huge vocabularies at training time, and they are required to work quickly and accurately in on-the-fly categorization. In this chapter we present a hierarchical online classifier, called HITEC, which meets the above requirements. The novelty of the method lies in the taxonomy-dependent architecture of the classifier, the applied weight updating scheme, and the relaxed category selection method. We evaluate the method on two large English patent application databases, the WIPO-alpha and the Espace A/B corpora. We also compare the presented method to other TC algorithms on these collections and show that it outperforms them significantly.
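
The abstract names three ingredients (a taxonomy-driven architecture, an online weight update, and relaxed category selection) without giving their details. The sketch below is only a generic illustration of that family of techniques, not HITEC itself: it assumes a toy taxonomy, a mistake-driven perceptron-style update with a margin, and a top-down descent that keeps every child whose score is within `slack` of the best sibling. All names (`TAXONOMY`, `train_step`, `classify`, `slack`, `max_keep`) and all numeric values are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: parent -> children. Leaves have no entry.
TAXONOMY = {"root": ["A", "B"], "A": ["A01", "A02"], "B": ["B01"]}


def dot(doc, w):
    """Sparse dot product between a document (term -> tf) and a weight vector."""
    return sum(tf * w.get(term, 0.0) for term, tf in doc.items())


def train_step(doc, path, weights, lr=0.1, margin=0.1):
    """One online update along the true path (assumed perceptron-style rule).

    At each level, if the true category does not beat its best-scoring
    sibling by at least `margin`, move its weights toward the document and
    the sibling's weights away from it.
    """
    parent = "root"
    for target in path:
        siblings = [c for c in TAXONOMY[parent] if c != target]
        if siblings:
            rival = max(siblings, key=lambda c: dot(doc, weights[c]))
            if dot(doc, weights[target]) - dot(doc, weights[rival]) < margin:
                for term, tf in doc.items():
                    weights[target][term] = weights[target].get(term, 0.0) + lr * tf
                    weights[rival][term] = weights[rival].get(term, 0.0) - lr * tf
        parent = target


def classify(doc, weights, slack=0.05, max_keep=2):
    """Top-down descent with relaxed selection.

    Instead of following only the best child, keep up to `max_keep`
    children whose score is within `slack` of the best one. Returns
    (leaf, accumulated path score) pairs, best first.
    """
    frontier, leaves = [("root", 0.0)], []
    while frontier:
        node, score = frontier.pop()
        children = TAXONOMY.get(node, [])
        if not children:
            leaves.append((node, score))
            continue
        scored = sorted(((dot(doc, weights[c]), c) for c in children), reverse=True)
        best = scored[0][0]
        kept = [(c, score + s) for s, c in scored if s >= best - slack][:max_keep]
        frontier.extend(kept)
    return sorted(leaves, key=lambda item: -item[1])


# Two made-up training documents with their true taxonomy paths.
engine_doc = {"engine": 2, "piston": 1}
farm_doc = {"seed": 1, "plough": 2}
weights = defaultdict(dict)
train_step(engine_doc, ["B", "B01"], weights)
train_step(farm_doc, ["A", "A01"], weights)

print(classify(farm_doc, weights))
print(classify(engine_doc, weights, slack=2.0))  # wide slack keeps more branches
```

A small `slack` behaves like greedy descent, while a larger value lets the search recover from a wrong decision near the top of the hierarchy at the cost of scoring more categories.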

Original language: English
Title of host publication: Emerging Technologies of Text Mining
Subtitle of host publication: Techniques and Applications
Publisher: IGI Global
Pages: 244-267
Number of pages: 24
ISBN (Print): 9781599043739
DOI: https://doi.org/10.4018/978-1-59904-373-9.ch012
Publication status: Published - Dec 1 2007

ASJC Scopus subject areas

  • Social Sciences (all)

  • Cite this

    Tikk, D., Biró, G., & Törcsvári, A. (2007). A hierarchical online classifier for patent categorization. In Emerging Technologies of Text Mining: Techniques and Applications (pp. 244-267). IGI Global. https://doi.org/10.4018/978-1-59904-373-9.ch012