Explaining unintelligible words by means of their context

Balázs Pinter, Gyula Vörös, Zoltán Szabó, András Lorincz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Explaining unintelligible words is a practical problem for text obtained by optical character recognition, text from the Web (e.g., because of misspellings), and similar sources. Approaches to wikification, that is, enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing methods for wikification assume that the text is correct, so they are not capable of wikifying erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: as the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds or even thousands of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The increased number of candidate articles makes the difficulty of spuriously similar contexts (two contexts that are similar but belong to different articles) more severe. We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.
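The abstract does not spell out the formulation, so the following is only an illustrative sketch of how structured sparsity might be combined with context similarity, not the authors' implementation. It assumes a group lasso objective (one common form of structured sparsity), a toy dictionary `D` whose columns are normalized context vectors, and two hypothetical candidate articles labeled `"animal"` and `"car"`. The context of the unknown word is coded over all stored contexts, with the penalty encouraging whole groups (articles) to switch off together, and the article whose group carries the most weight is selected.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm (group lasso prox)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_sparse_code(x, D, groups, lam=0.1, n_iter=500):
    """Minimize 0.5*||x - D a||^2 + lam * sum_g ||a_g||_2 by proximal gradient.

    Assumes the index lists in `groups` partition the columns of D.
    """
    a = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))    # gradient step on the squared error
        for idx in groups.values():
            a[idx] = group_soft_threshold(z[idx], step * lam)
    return a

def disambiguate(x, D, groups, lam=0.1):
    """Return the label whose coefficient group has the largest norm, plus all scores."""
    a = group_sparse_code(x, D, groups, lam)
    scores = {label: float(np.linalg.norm(a[idx])) for label, idx in groups.items()}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical): rows are 4 vocabulary words, columns are stored contexts.
D = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.5]])
D = D / np.linalg.norm(D, axis=0)             # normalize each context vector
groups = {"animal": [0, 1], "car": [2, 3]}    # columns grouped by candidate article

# Context of the unintelligible word: mostly "animal" words, a little noise.
x = np.array([1.0, 0.8, 0.0, 0.1])
x = x / np.linalg.norm(x)

best, scores = disambiguate(x, D, groups)
```

In this toy setup the weak overlap with the `"car"` contexts falls below the group threshold, so that whole group is zeroed out rather than contributing a small spurious match. This is the kind of behavior the abstract attributes to structured sparsity, although the paper's actual objective, features, and solver may differ.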

Original language: English
Title of host publication: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods
Pages: 382-387
Number of pages: 6
Publication status: Published - May 27 2013
Event: 2nd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2013 - Barcelona, Spain
Duration: Feb 15 2013 to Feb 18 2013

Publication series

Name: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods

Other

Other: 2nd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2013
Country: Spain
City: Barcelona
Period: 2/15/13 to 2/18/13

Keywords

  • Link disambiguation
  • Natural language processing
  • Structured sparse coding
  • Unintelligible words
  • Wikification

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Pinter, B., Vörös, G., Szabó, Z., & Lorincz, A. (2013). Explaining unintelligible words by means of their context. In ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (pp. 382-387). (ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods).