Explaining unintelligible words by means of their context

Balázs Pinter, Gyula Vörös, Zoltán Szabó, A. Lőrincz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Explaining unintelligible words is a practical problem for text obtained by optical character recognition, from the Web (e.g., because of misspellings), etc. Approaches to wikification, to enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing methods for wikification assume that the text is correct, so they are not capable of wikifying erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: as the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds, maybe thousands of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The increased number of candidate articles makes the difficulty of spuriously similar contexts (when two contexts are similar but belong to different articles) more severe. We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.
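The idea of combining the distributional hypothesis with structured sparsity can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bag-of-words context vectors, uses the group lasso (one common form of structured sparsity) solved by proximal gradient descent, and every name in it (`group_sparse_code`, `disambiguate`, the toy vocabulary and the two candidate articles) is hypothetical. Each candidate article owns a group of example contexts, the unknown word's context is reconstructed from as few groups as possible, and the article whose group carries the most weight is selected.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole group toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_sparse_code(x, D, groups, lam=0.1, n_iter=500):
    """Approximately minimise 0.5*||x - D a||^2 + lam * sum_g ||a_g||_2
    with proximal gradient descent (ISTA). Groups must not overlap."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))    # gradient step on the squared error
        a_new = np.zeros_like(a)
        for idx in groups.values():           # group-wise shrinkage
            a_new[idx] = group_soft_threshold(z[idx], step * lam)
        a = a_new
    return a

def disambiguate(x, D, groups, lam=0.1):
    """Pick the candidate article whose coefficient group has the largest norm."""
    a = group_sparse_code(x, D, groups, lam)
    scores = {label: float(np.linalg.norm(a[idx])) for label, idx in groups.items()}
    return max(scores, key=scores.get), scores

# Toy data (assumed): rows are the words cat, jungle, prey, engine, road, speed.
# Columns are example contexts in which each candidate article was linked.
D = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
], dtype=float)
D /= np.linalg.norm(D, axis=0)                # unit-norm columns
groups = {"animal": [0, 1], "car": [2, 3]}    # columns owned by each article

# Context of the unintelligible word: it appears near "cat" and "jungle".
x = np.array([1, 1, 0, 0, 0, 0], dtype=float)
x /= np.linalg.norm(x)

label, scores = disambiguate(x, D, groups)
print(label, scores)
```

Penalising whole groups rather than individual coefficients is what discourages a single spuriously similar context from a wrong article from being selected on its own: a group has to explain the context well as a whole to survive the shrinkage.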

Original language: English
Title of host publication: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods
Pages: 382-387
Number of pages: 6
Publication status: Published - 2013
Event: 2nd International Conference on Pattern Recognition Applications and Methods, ICPRAM 2013 - Barcelona, Spain
Duration: Feb 15 2013 to Feb 18 2013

Fingerprint

Optical character recognition

Keywords

  • Link disambiguation
  • Natural language processing
  • Structured sparse coding
  • Unintelligible words
  • Wikification

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Pinter, B., Vörös, G., Szabó, Z., & Lőrincz, A. (2013). Explaining unintelligible words by means of their context. In ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods (pp. 382-387)

@inproceedings{e743dffc99ec4fa691469efb582a4d2f,
title = "Explaining unintelligible words by means of their context",
abstract = "Explaining unintelligible words is a practical problem for text obtained by optical character recognition, from the Web (e.g., because of misspellings), etc. Approaches to wikification, to enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing methods for wikification assume that the text is correct, so they are not capable of wikifying erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: as the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds, maybe thousands of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The increased number of candidate articles makes the difficulty of spuriously similar contexts (when two contexts are similar but belong to different articles) more severe. We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.",
keywords = "Link disambiguation, Natural language processing, Structured sparse coding, Unintelligible words, Wikification",
author = "Bal{\'a}zs Pinter and Gyula V{\"o}r{\"o}s and Zolt{\'a}n Szab{\'o} and A. L{\H{o}}rincz",
year = "2013",
language = "English",
isbn = "9789898565419",
pages = "382--387",
booktitle = "ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods",

}

TY - GEN

T1 - Explaining unintelligible words by means of their context

AU - Pinter, Balázs

AU - Vörös, Gyula

AU - Szabó, Zoltán

AU - Lőrincz, A.

PY - 2013

Y1 - 2013

N2 - Explaining unintelligible words is a practical problem for text obtained by optical character recognition, from the Web (e.g., because of misspellings), etc. Approaches to wikification, to enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing methods for wikification assume that the text is correct, so they are not capable of wikifying erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: as the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds, maybe thousands of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The increased number of candidate articles makes the difficulty of spuriously similar contexts (when two contexts are similar but belong to different articles) more severe. We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.

KW - Link disambiguation

KW - Natural language processing

KW - Structured sparse coding

KW - Unintelligible words

KW - Wikification

UR - http://www.scopus.com/inward/record.url?scp=84877961450&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877961450&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84877961450

SN - 9789898565419

SP - 382

EP - 387

BT - ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods

ER -