Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding

Balázs Pintér, Gyula Vörös, Zoltán Szabó, A. Lőrincz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

We extend the scope of Wikification to novel words by relaxing two premises of Wikification: (i) we wikify without using the surface form of the word (ii) to a mixture of Wikipedia senses instead of a single sense. We identify two types of “novel” words: words where the connection between their surface form and their meaning is broken (e.g., a misspelled word), and words where there is no meaning to connect to—the meaning itself is also novel. We propose a method capable of wikifying both types of novel words while also dealing with the inherently large-scale disambiguation problem. We show that the method can disambiguate between up to 1,000 Wikipedia senses, and it can explain words with novel meaning as a mixture of other, possibly related senses. This mixture representation compares favorably to the widely used bag of words representation.
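As a rough illustration of what "a mixture of Wikipedia senses" could look like computationally, the sketch below codes a context vector over a dictionary whose columns are grouped by sense, then reads off per-sense weights. It is not taken from the paper. It assumes a group-lasso penalty as one common form of structured sparse coding, a dictionary `D` of example context vectors, and random toy data; the function names (`group_sparse_code`, `sense_mixture`) and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def group_soft_threshold(a, groups, t):
    """Proximal operator of t * sum_g ||a_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(a)
    for idx in groups:
        norm = np.linalg.norm(a[idx])
        if norm > t:
            out[idx] = (1.0 - t / norm) * a[idx]
    return out

def group_sparse_code(x, D, groups, lam=0.05, n_iter=500):
    """Minimize 0.5*||x - D a||^2 + lam * sum_g ||a_g||_2 by proximal gradient.

    Assumption: each group of columns in D holds contexts of one sense.
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = group_soft_threshold(a - step * grad, groups, step * lam)
    return a

def sense_mixture(a, groups):
    """Turn group norms of the code into normalized per-sense weights."""
    w = np.array([np.linalg.norm(a[idx]) for idx in groups])
    total = w.sum()
    return w / total if total > 0 else w

# Toy data (assumed): 3 senses, 2 example contexts each, 20-dim features.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 6))
D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary columns
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

# Context of an unseen word, built only from contexts of sense 0.
x = 0.7 * D[:, 0] + 0.3 * D[:, 1]

a = group_sparse_code(x, D, groups)
mixture = sense_mixture(a, groups)
print(mixture)
```

In this toy setup the surface form of the word is never used: only its context vector `x` enters the computation, and the group penalty pushes whole senses toward zero weight at once rather than individual dictionary columns.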

Original language: English
Title of host publication: Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers
Publisher: Springer Verlag
Pages: 241-255
Number of pages: 15
Volume: 318
ISBN (Print): 9783319126098
DOIs: https://doi.org/10.1007/978-3-319-12610-4_15
Publication status: Published - 2015
Event: 2nd International Conference on Pattern Recognition, ICPRAM 2013 - Barcelona, Spain
Duration: Feb 15 2013 - Feb 18 2013

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 318
ISSN (Print): 2194-5357

Other

Other: 2nd International Conference on Pattern Recognition, ICPRAM 2013
Country: Spain
City: Barcelona
Period: 2/15/13 - 2/18/13

Keywords

  • Interpreting novel words
  • Link disambiguation
  • Natural language processing
  • Structured sparse coding
  • Wikification

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Pintér, B., Vörös, G., Szabó, Z., & Lőrincz, A. (2015). Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding. In Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers (Vol. 318, pp. 241-255). (Advances in Intelligent Systems and Computing; Vol. 318). Springer Verlag. https://doi.org/10.1007/978-3-319-12610-4_15

@inproceedings{b382f994356b40e29209a51d53b37d13,
title = "Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding",
abstract = "We extend the scope of Wikification to novel words by relaxing two premises of Wikification: (i) we wikify without using the surface form of the word (ii) to a mixture of Wikipedia senses instead of a single sense. We identify two types of “novel” words: words where the connection between their surface form and their meaning is broken (e.g., a misspelled word), and words where there is no meaning to connect to—the meaning itself is also novel. We propose a method capable of wikifying both types of novel words while also dealing with the inherently large-scale disambiguation problem. We show that the method can disambiguate between up to 1,000 Wikipedia senses, and it can explain words with novel meaning as a mixture of other, possibly related senses. This mixture representation compares favorably to the widely used bag of words representation.",
keywords = "Interpreting novel words, Link disambiguation, Natural language processing, Structured sparse coding, Wikification",
author = "Bal{\'a}zs Pint{\'e}r and Gyula V{\"o}r{\"o}s and Zolt{\'a}n Szab{\'o} and A. L{\H{o}}rincz",
year = "2015",
doi = "10.1007/978-3-319-12610-4_15",
language = "English",
isbn = "9783319126098",
volume = "318",
series = "Advances in Intelligent Systems and Computing",
publisher = "Springer Verlag",
pages = "241--255",
booktitle = "Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers",

}

TY - GEN

T1 - Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding

AU - Pintér, Balázs

AU - Vörös, Gyula

AU - Szabó, Zoltán

AU - Lőrincz, A.

PY - 2015

Y1 - 2015

N2 - We extend the scope of Wikification to novel words by relaxing two premises of Wikification: (i) we wikify without using the surface form of the word (ii) to a mixture of Wikipedia senses instead of a single sense. We identify two types of “novel” words: words where the connection between their surface form and their meaning is broken (e.g., a misspelled word), and words where there is no meaning to connect to—the meaning itself is also novel. We propose a method capable of wikifying both types of novel words while also dealing with the inherently large-scale disambiguation problem. We show that the method can disambiguate between up to 1,000 Wikipedia senses, and it can explain words with novel meaning as a mixture of other, possibly related senses. This mixture representation compares favorably to the widely used bag of words representation.

KW - Interpreting novel words

KW - Link disambiguation

KW - Natural language processing

KW - Structured sparse coding

KW - Wikification

UR - http://www.scopus.com/inward/record.url?scp=84914145564&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84914145564&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-12610-4_15

DO - 10.1007/978-3-319-12610-4_15

M3 - Conference contribution

AN - SCOPUS:84914145564

SN - 9783319126098

VL - 318

T3 - Advances in Intelligent Systems and Computing

SP - 241

EP - 255

BT - Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers

PB - Springer Verlag

ER -